DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Practice Questions
181 real DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK exam questions with expert-verified answers and explanations. Page 1 of 4.
- Question #1: Manipulating Data with Spark DataFrames
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000?
Spark DataFrame API, Filtering DataFrames, PySpark Syntax, Column Expressions
- Question #2: Perform DataFrame Transformations
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 OR the value...
DataFrame Filtering, PySpark SQL, Column Expressions, Logical Operators
- Question #3: Working with Spark DataFrames and Transformations
Which of the following code blocks returns a new DataFrame from DataFrame storesDF where column storeId is of the type string?
DataFrame Transformations, Type Casting, PySpark API, withColumn
- Question #4: Data Manipulation with Spark DataFrames
Which of the following code blocks returns a new DataFrame with a new column employeesPerSqft that is the quotient of column numberOfEmployees and column sqft, both of which are fr...
DataFrame Transformations, Adding Columns, Spark SQL Functions, Column Operations
- Question #5: Performing DataFrame Transformations
The code block shown below should return a new DataFrame from DataFrame storesDF where column modality is the constant string "PHYSICAL". Assume DataFrame storesDF is the only defi...
Spark DataFrames, DataFrame Transformations, withColumn, Literal Functions
- Question #6: Perform DataFrame Column Transformations
Which of the following code blocks returns a DataFrame where column storeCategory from DataFrame storesDF is split at the underscore character into column storeValueCategory and co...
DataFrame Transformations, String Functions, Column Operations, PySpark
- Question #7: Transforming DataFrames
Which of the following code blocks returns a new DataFrame where column productCategories only has one word per row, resulting in a DataFrame with many more rows than DataFrame sto...
Spark DataFrame API, explode function, withColumn, Data Transformation
- Question #8: Transforming Data with DataFrames
Which of the following code blocks returns a new DataFrame with column storeDescription where the pattern "Description: " has been removed from the beginning of column storeDescrip...
Spark DataFrame API, Column operations, String manipulation, Regular expressions
- Question #9: Transforming DataFrames
Which of the following code blocks returns a new DataFrame where column division from DataFrame storesDF has been replaced and renamed to column state and column managerName from D...
Spark DataFrame API, Column Operations, DataFrame Transformations, Renaming Columns
- Question #10: Data Transformation with Spark DataFrames
The code block shown contains an error. The code block is intended to return a new DataFrame where column sqft from DataFrame storesDF has had its missing values replaced with the...
PySpark, DataFrame Operations, Missing Data Handling, na.fill
- Question #11: DataFrame Transformations
Which of the following operations fails to return a DataFrame with no duplicate rows?
PySpark DataFrame API, Data Deduplication, DataFrame Transformations, Method Parameters
- Question #12: Efficiently use Spark SQL and DataFrame API for data transformations and analysis.
Which of the following code blocks will most quickly return an approximation for the number of distinct values in column division in DataFrame storesDF?
Spark SQL Functions, DataFrame Aggregations, Performance Optimization, Distinct Count Approximation
- Question #13: Data Transformation and Aggregation
The code block shown below contains an error. The code block is intended to return a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean. Identify...
Spark DataFrame API, Aggregation Functions, Data Transformation, Error Identification
- Question #14: Working with Spark DataFrames
Which of the following operations can be used to return the number of rows in a DataFrame?
Spark DataFrame, DataFrame operations, Count rows, API methods
- Question #15: Spark DataFrame Transformations
Which of the following operations returns a GroupedData object?
Spark DataFrame, Grouping Operations, groupBy() method, Data Aggregation
- Question #16: Performing Basic DataFrame Operations
Which of the following code blocks returns a collection of summary statistics for all columns in DataFrame storesDF?
PySpark DataFrame, Descriptive Statistics, Data Exploration, DataFrame API
- Question #17: Perform DataFrame transformations
Which of the following code blocks fails to return a DataFrame reverse sorted alphabetically based on column division?
Spark DataFrames, DataFrame Sorting, orderBy and sort, Ascending and Descending
- Question #18: Performing DataFrame Transformations
Which of the following code blocks returns a 15 percent sample of rows from DataFrame storesDF without replacement?
DataFrame sampling, Spark DataFrame API, Data transformations
- Question #19: Performing Spark DataFrame Actions
Which of the following code blocks returns all the rows from DataFrame storesDF?
Spark DataFrame, DataFrame Actions, Data Retrieval, PySpark API
- Question #20: Working with Spark DataFrames
Which of the following code blocks applies the function assessPerformance() to each row of DataFrame storesDF?
Spark DataFrames, DataFrame Actions, Python List Comprehensions, Row-wise Operations
- Question #21: Spark DataFrame Operations
The code block shown below contains an error. The code block is intended to print the schema of DataFrame storesDF. Identify the error. Code block: storesDF.printSchema
Spark DataFrame, Schema Operations, API Syntax, Method Invocation
- Question #22: User-Defined Functions (UDFs)
The code block shown below should create and register a SQL UDF named "ASSESS_PERFORMANCE" using the Python function assessPerformance() and apply it to column customerSatisfaction...
PySpark UDFs, Spark SQL, UDF Registration, User-Defined Functions
- Question #23: Implementing User-Defined Functions (UDFs) in PySpark
The code block shown below contains an error. The code block is intended to create a Python UDF assessPerformanceUDF() using the integer-returning Python function assessPerformance...
PySpark UDFs, DataFrame API, Spark Data Types, Error Identification
- Question #24: Working with DataFrames and Spark SQL
The code block shown below contains an error. The code block is intended to use SQL to return a new DataFrame containing column storeId and column managerName from a table created...
Spark SQL, DataFrame API, SparkSession, Temporary Views
- Question #25: Working with Spark DataFrames
The code block shown below should create a single-column DataFrame from Python list years which is made up of integers. Choose the response that correctly fills in the numbered bla...
PySpark, DataFrame creation, SparkSession, Schema definition
- Question #26: Spark DataFrame Persistence
The code block shown below contains an error. The code block is intended to cache DataFrame storesDF only in Spark's memory and then return the number of rows in the cached DataFra...
Spark Caching, Storage Levels, DataFrame Persistence, Spark API
- Question #27: Spark DataFrame Transformations and Performance
Which of the following operations can be used to return a new DataFrame from DataFrame storesDF without inducing a shuffle?
Spark DataFrame API, Shuffle, Partitioning, Performance Optimization
- Question #28: Spark DataFrame Partitioning and Shuffling
The code block shown below contains an error. The code block is intended to return a new 12-partition DataFrame from the 8-partition DataFrame storesDF by inducing a shuffle. Iden...
Spark DataFrame, Partitioning, Shuffle, coalesce vs repartition
- Question #29: Optimizing Spark Applications
Which of the following Spark properties is used to configure whether DataFrame partitions that do not meet a minimum size threshold are automatically coalesced into larger partitio...
Adaptive Query Execution (AQE), Spark Performance Tuning, Shuffle Optimization, Spark Configuration
- Question #30: Data Manipulation with Spark SQL and DataFrames
The code block shown below contains an error. The code block is intended to return a DataFrame containing a column openDateString, a string representation of Java's SimpleDateForma...
Spark SQL Functions, Date/Time Conversion, DataFrame API
- Question #31: Perform Data Transformations using Spark DataFrame API
Which of the following code blocks returns a DataFrame containing a column dayOfYear, an integer representation of the day of the year from column openDate from DataFrame storesDF?...
Spark DataFrame API, Date and Time Functions, Column Transformations, Epoch Time Handling
- Question #32: Spark DataFrame Transformations
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of an inner join between DataFrame storesDF and DataFrame employe...
Spark DataFrames, Join operations, PySpark API, Error identification
- Question #33: Performing Data Transformations with Spark DataFrames
Which of the following operations can perform an outer join on two DataFrames?
Spark DataFrames, Join Operations, Outer Join, DataFrame API
- Question #34: Performing Join Operations on Spark DataFrames
Which of the following pairs of arguments cannot be used in DataFrame.join() to perform an inner join on two DataFrames, named and aliased with "a" and "b" respectively, to specify...
Spark DataFrames, Join Operations, PySpark API, Column Expressions
- Question #35: Performance Tuning and Optimization
The code block below contains a logical error resulting in inefficiency. The code block is intended to efficiently perform a broadcast join of DataFrame storesDF and the much large...
Spark SQL, Broadcast Join, Performance Optimization, DataFrame Operations
- Question #36: Spark DataFrame Transformations
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a cross join between DataFrame storesDF and DataFrame emplo...
Spark DataFrame, Joins, Cross Join, Syntax
- Question #37: Performing Data Transformations with Spark DataFrames
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a position-wise union between DataFrame storesDF and DataFr...
Spark DataFrames, DataFrame operations, unionByName, Data manipulation
- Question #38: Performing Data Output Operations
Which of the following code blocks writes DataFrame storesDF to file path filePath as JSON?
DataFrame Operations, Writing Data, JSON Format, Spark I/O
- Question #39: Writing DataFrames to Storage
In what order should the below lines of code be run in order to write DataFrame storesDF to file path filePath as parquet and partition by values in column division? Lines of code:...
Spark DataFrame API, Data Writing, Parquet Format, Data Partitioning
- Question #40: Data Ingestion and Output with Apache Spark DataFrames
The code block shown below contains an error. The code block is intended to read a parquet file at the file path filePath into a DataFrame. Identify the error. Code block: spark.read.load(...
Spark DataFrames, Data Ingestion, Reading Parquet, Spark API Usage
- Question #41: Data Ingestion
In what order should the below lines of code be run in order to read a JSON file at the file path filePath into a DataFrame with the specified schema schema? Lines of code: 1. .jso...
Spark DataFrames, Reading Data, JSON Data Source, Schema Specification
- Question #42: Data Persistence and Caching
Which of the following storage levels should be used to store as much data as possible in memory on two cluster nodes while storing any data that does not fit in memory on disk to...
Spark Storage Levels, RDD Persistence, Data Caching, Fault Tolerance
- Question #43: Optimizing Spark Application Performance
Which of the following Spark properties is used to configure the maximum size of an automatically broadcasted DataFrame when performing a join?
Spark SQL Configuration, Join Optimization, Broadcast Join
- Question #44: Optimizing Spark Applications
Which of the following Spark properties is used to configure whether skewed partitions are automatically detected and subdivided into smaller partitions when joining two DataFrames...
Spark Configuration, Adaptive Query Execution, Performance Tuning, Skewed Joins
- Question #45: Spark Structured APIs
Which of the following statements about the Spark DataFrame is true?
Spark DataFrame, Structured API, Core Concepts, Data Structures
- Question #46: Working with Spark DataFrames and Transformations
Which of the following operations can be used to return a new DataFrame from DataFrame storesDF without columns that are specified by name?
DataFrame Operations, Column Manipulation, Spark API, Data Transformation
- Question #47: Performing Data Transformations with Spark DataFrames
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000?
Spark DataFrames, Filtering Data, Column Expressions, DataFrame Transformations
- Question #48: Performing Data Transformations with Spark DataFrames
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 OR the value...
PySpark DataFrames, DataFrame Filtering, Logical Operators, Column Expressions
- Question #49: DataFrame Transformations
The code block shown below should return a new DataFrame from DataFrame storesDF where column storeId is of the type string. Choose the response that correctly fills in the numbere...
Spark DataFrame API, Column Transformation, Data Type Conversion, PySpark
- Question #50: Spark DataFrame Transformations
Which of the following code blocks returns a new DataFrame from DataFrame storesDF where column modality is the constant string "PHYSICAL"? Assume DataFrame storesDF is the only de...
Spark DataFrames, withColumn, Literal expressions, Data Transformation
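
Two tasks recur across the questions above: keeping only the rows of storesDF where sqft is at most 25,000, and returning storesDF with storeId as a string. The following is a minimal PySpark sketch of one way each could be written. It is not taken from the exam's answer options. The column names sqft and storeId come from the question text; the function names small_stores and store_id_as_string and the parameter max_sqft are hypothetical, chosen only for illustration.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Imported for type hints only, so the module loads without PySpark.
    from pyspark.sql import DataFrame


def small_stores(stores_df: "DataFrame", max_sqft: int = 25000) -> "DataFrame":
    """Keep only rows whose sqft is less than or equal to max_sqft."""
    # DataFrame.filter accepts a SQL expression string as its condition.
    return stores_df.filter(f"sqft <= {max_sqft}")


def store_id_as_string(stores_df: "DataFrame") -> "DataFrame":
    """Return a new DataFrame with column storeId cast to string."""
    # withColumn with an existing column name replaces that column.
    return stores_df.withColumn("storeId", stores_df["storeId"].cast("string"))
```

With an active SparkSession and a DataFrame storesDF that has these columns, small_stores(storesDF) and store_id_as_string(storesDF) would each return a new DataFrame and leave storesDF unchanged. The same filter can also be expressed with a Column expression instead of a SQL string.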