DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Practice Questions
181 real DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK exam questions with expert-verified answers and explanations. Page 3 of 4.
- Question #101: Spark Execution Model
Which of the following identifies multiple narrow operations that are executed in sequence?
Spark Execution Model, Spark Stages, Narrow Transformations, Spark DAG
- Question #102: Spark Architecture and Deployment
Spark's execution/deployment mode determines where the driver and executors are physically located when a Spark application is run. Which of the following Spark execution/deploymen...
Spark Deployment Modes, Spark Architecture, Execution Modes
- Question #103: Spark Application Architecture and Execution
Which of the following will cause a Spark job to fail?
Spark Architecture, Driver Node, Fault Tolerance, Job Execution
- Question #104: Spark Core Concepts
Which of the following best describes the similarities and differences between the MEMORY_ONLY storage level and the MEMORY_AND_DISK storage level?
Spark Storage Levels, RDD Persistence, Spark Caching, Performance Optimization
- Question #105: Spark SQL Query Optimization
Which of the following Spark properties is used to configure whether DataFrames found to be below a certain size threshold at runtime will be automatically broadcasted?
Spark SQL, Broadcast Join, Performance Tuning, Spark Configuration
- Question #106: Spark DataFrame API
The code block shown below contains an error. The code block is intended to return a new DataFrame from DataFrame storesDF where column storeId is of the type string. Identify the...
PySpark Data Types, DataFrame Transformations, Type Casting, PySpark API Usage
- Question #107: Perform DataFrame Transformations
Which of the following code blocks returns a new DataFrame where column division is the first two characters of column division in DataFrame storesDF?
DataFrame Transformations, String Functions, PySpark, Column Operations
- Question #108: Spark DataFrame API
The code block shown below should return a new DataFrame where column division from DataFrame storesDF has been renamed to column state and column managerName from DataFrame stores...
PySpark DataFrames, Column Renaming, DataFrame Transformations
- Question #109: Manipulate and clean data using Spark DataFrames
The code block shown below should return a new DataFrame where rows in DataFrame storesDF with missing values in every column have been dropped. Choose the response that correctly...
PySpark DataFrames, Missing Data Handling, Data Cleaning, DataFrameNaFunctions
- Question #110: Spark DataFrame API Operations
The code block shown below should return a collection of summary statistics for column sqft in DataFrame storesDF. Choose the response that correctly fills in the numbered blanks w...
PySpark DataFrame, Descriptive Statistics, Spark API, describe method
- Question #111: Performing Data Transformations using Spark DataFrames
Which of the following code blocks returns a 15 percent sample of rows from DataFrame storesDF without replacement?
Spark DataFrame, Sampling, PySpark API, Data Transformation
- Question #112: Accessing and Manipulating DataFrame Data
Which of the following code blocks extracts the value for column sqft from the first row of DataFrame storesDF?
Spark DataFrame, Row object, Data extraction, PySpark API
- Question #113: Data Manipulation with Spark SQL
Which of the following code blocks uses SQL to return a new DataFrame containing column storeId and column managerName from a table created from DataFrame storesDF?
Spark SQL, DataFrame Operations, Temporary Views
- Question #114: Optimizing Spark Applications
The code block shown below should adjust the number of partitions used in wide transformations like join() to 32. Choose the response that correctly fills in the numbered blanks wi...
Spark Configuration, Shuffle Partitions, Performance Tuning, Wide Transformations
- Question #115: Data Transformation and Manipulation using Spark DataFrames
Which of the following code blocks returns a DataFrame containing a column openDateString, a string representation of Java's SimpleDateFormat? Note that column openDate is of type...
Spark DataFrames, Date/Time Functions, from_unixtime, Column Transformations
- Question #116: Performing DataFrame Transformations
Which of the following code blocks returns a new DataFrame that is the result of a cross join between DataFrame storesDF and DataFrame employeesDF?
Spark DataFrame, Cross Join, DataFrame Transformations, PySpark/Scala Spark API
- Question #117: Transforming Data with Spark DataFrames
Which of the following code blocks returns a new DataFrame that is the result of a position-wise union between DataFrame storesDF and DataFrame acquiredStoresDF?
Spark DataFrame, Data Transformation, Union Operation, DataFrame API
- Question #118: Data Loading and Persistence
In what order should the lines of code below be run to read a Parquet file at the file path filePath into a DataFrame? Lines of code: 1. storesDF 2. .load(filePath, source = "p...
SparkSession, DataFrameReader, Reading Parquet, Data Loading
- Question #119: Loading Data into DataFrames
The code block shown below should read a CSV at the file path filePath into a DataFrame with the specified schema schema. Choose the response that correctly fills in the numbered b...
Spark DataFrame API, Data Ingestion, CSV Data Source, Schema Definition
- Question #120: Spark Architecture and Core Concepts
Which of the following describes the Spark driver?
Spark Driver, Spark Architecture, Spark Application Components
- Question #121: Spark Cluster Architecture and Execution Model
Which of the following describes the relationship between nodes and executors?
Spark Architecture, Executors, Nodes, Cluster Components
- Question #122: Spark Application Execution and Optimization
Which of the following will occur if there are more slots than there are tasks?
Spark Execution Model, Resource Allocation, Task Scheduling, Spark Job Efficiency
- Question #123: Spark Architecture and Execution
Which of the following is the most granular level of the Spark execution hierarchy?
Spark Execution Model, Spark Architecture, Task (Spark)
- Question #124: Spark Architecture and Execution
Which of the following statements about Spark jobs is incorrect?
Spark Execution Model, Spark Jobs, Spark Monitoring, Stages and Tasks
- Question #125: Spark Execution and Performance Optimization
Which of the following operations is most likely to result in a shuffle?
Spark Shuffle, Wide Transformations, DataFrame Operations, Spark Performance
- Question #126: Spark Performance Tuning
The default value of spark.sql.shuffle.partitions is 200. Which of the following describes what that means?
Spark Configuration, Spark SQL Shuffle, DataFrame Partitions
- Question #127: Spark Core Concepts and Architecture
Which of the following is the most complete description of lazy evaluation?
Lazy Evaluation, Spark Architecture, Execution Model, Transformations and Actions
- Question #128: Working with Apache Spark DataFrames
Which of the following code blocks applies the function assessPerformance() to each row of DataFrame storesDF?
Spark DataFrame Actions, Scala Collection Methods, Lambda Expressions, Driver Program Operations
- Question #129: Spark DataFrame Operations
The code block shown below contains an error. The code block is intended to print the schema of DataFrame storesDF. Identify the error. Code block: storesDF.printSchema.getAs[Strin...
Spark DataFrames, Schema Inspection, DataFrame API, Debugging
- Question #130: Implementing User-Defined Functions (UDFs) in Spark SQL
Which of the following code blocks creates and registers a SQL UDF named "ASSESS_PERFORMANCE" using the Scala function assessPerformance() and applies it to column customerSatisfac...
UDFs, Spark SQL, Scala UDFs, Function Registration
- Question #131: Working with Spark SQL and DataFrames
The code block shown below should use SQL to return a new DataFrame containing column storeId and column managerName from a table created from DataFrame storesDF. Choose the respon...
Spark SQL, DataFrames, Temporary Views, SQL Integration
- Question #132: Creating and Manipulating DataFrames/Datasets
The code block shown below contains an error. The code block is intended to create a single-column DataFrame from Scala List years, which is made up of integers. Identify the error. C...
Spark Dataset, Spark DataFrame, Schema definition, Type conversion
- Question #133: Optimize Spark DataFrame Operations
Which of the following code blocks will always return a new 4-partition DataFrame from the 8-partition DataFrame storesDF without inducing a shuffle?
Spark DataFrames, Partitions, Coalesce, Shuffle
- Question #134: Working with Spark DataFrames
The code block shown below should return a new 12-partition DataFrame from DataFrame storesDF. Choose the response that correctly fills in the numbered blanks within the code block...
Spark DataFrame API, Data Partitioning, Spark Transformations
- Question #135: Spark Application Configuration
The code block shown below contains an error. The code block is intended to adjust the number of partitions used in wide transformations like join() to 32. Identify the error. Code...
Spark Configuration, SparkSession API, Data Partitioning, Error Identification
- Question #136: Data Transformation
The code block shown below contains an error. The code block is intended to return a DataFrame containing a column dayOfYear, an integer representation of the day of the year from col...
Spark SQL Functions, DataFrame Transformations, Type Casting, Date/Time Operations
- Question #137: Spark DataFrame Programming
Which of the following DataFrame operations is classified as an action?
Spark Actions, Spark Transformations, DataFrame Operations, Spark Core Concepts
- Question #138: Understand Spark Core Concepts and Transformations
Which of the following DataFrame operations is classified as a wide transformation?
Spark Transformations, Wide Transformations, DataFrame Operations, Data Shuffling
- Question #139: Spark Application Execution
Which of the following describes the difference between cluster and client execution modes?
Spark Architecture, Execution Modes, Driver Program, Spark Application Lifecycle
- Question #140: Spark Fault Tolerance
Which of the following statements about Spark's stability is incorrect?
Spark Architecture, Fault Tolerance, Driver Node, Worker Node
- Question #141: Optimizing and Troubleshooting Spark Applications
Which of the following cluster configurations is most likely to experience an out-of-memory error in response to data skew in a single partition? Note: each configuration has rough...
Spark Performance Tuning, Data Skew, Executor Memory, Out-of-Memory Errors
- Question #142: Spark Data Management and Optimization
Of the following situations, in which will it be most advantageous to store DataFrame df at the MEMORY_AND_DISK storage level rather than the MEMORY_ONLY storage level?
Spark Caching, Storage Levels, DataFrame Persistence, Performance Optimization
- Question #143: Optimizing Spark Applications
A Spark application has a 128 GB DataFrame A and a 1 GB DataFrame B. If a broadcast join were to be performed on these two DataFrames, which of the following describes which DataFr...
Spark Joins, Broadcast Join, Spark Performance, DataFrame Operations
- Question #144: Spark DataFrame Operations
Which of the following operations can be used to create a new DataFrame that has 12 partitions from an original DataFrame df that has 8 partitions?
Spark DataFrames, Data Partitioning, repartition operation, DataFrame API
- Question #145: Understand Spark DataFrame structure and capabilities
Which of the following object types cannot be contained within a column of a Spark DataFrame?
Spark DataFrame, Data Types, Column Types
- Question #146: Performing Basic DataFrame Transformations
Which of the following operations can be used to create a DataFrame with a subset of columns from DataFrame storesDF that are specified by name?
Spark DataFrame, Column Selection, DataFrame Transformations, PySpark
- Question #147: Performing DataFrame Transformations
The code block shown below contains an error. The code block is intended to return a DataFrame containing all columns from DataFrame storesDF except for column sqft and column cust...
Spark DataFrame API, Column Operations, PySpark Syntax, DataFrame Transformations
- Question #148: Spark Architecture and Execution
Which of the following describes slots?
Spark Architecture, Execution Model, Resource Allocation, Parallelism
- Question #149: Spark Architecture and Execution
Which of the following lists the units of work performed by Spark from largest to smallest?
Spark Execution Model, Spark Architecture, Jobs Stages Tasks, Spark Fundamentals
- Question #150: Understand Spark's execution model, including transformations, actions, and the implications of data shuffling for performance.
Which of the following operations is least likely to result in a shuffle?
Spark DataFrame Operations, Data Shuffling, Wide Transformations, Performance Considerations