DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK Practice Questions
181 real DATABRICKS-CERTIFIED-ASSOCIATE-DEVELOPER-FOR-APACHE-SPARK exam questions with expert-verified answers and explanations. Page 2 of 4.
- Question #51: DataFrame Transformations
The code block shown below contains an error. The code block is intended to return a new DataFrame where column managerName from DataFrame storesDF is split at the space character...
PySpark DataFrame, String Functions, Column Expressions, DataFrame Transformations
- Question #52: DataFrame Transformations
The code block shown below should return a new DataFrame where single quotes in column storeSlogan have been replaced with double quotes. Choose the response that correctly fills i...
PySpark DataFrame API, Column Transformations, String Functions, Data Manipulation
- Question #53: Manipulate DataFrames using the Spark API
Which of the following code blocks returns a new DataFrame where column division from DataFrame storesDF has been replaced and renamed to column state and column managerName from D...
DataFrame Operations, Column Renaming, Spark API, Transformations
- Question #54: Handling Missing Data in Spark DataFrames
Which of the following code blocks returns a new DataFrame where column sqft from DataFrame storesDF has had its missing values replaced with the value 30,000? A sample of DataFram...
Spark DataFrame API, Missing Values, Data Cleaning, na.fill
- Question #55: Perform DataFrame transformations
Which of the following operations can be used to return a DataFrame with no duplicate rows? Please select the most complete answer.
DataFrame API, Duplicate Data, PySpark, Data Cleaning
- Question #56: Spark DataFrame Operations
Which of the following code blocks returns a DataFrame where column divisionDistinct is the approximate number of distinct values in column division from DataFrame storesDF?
Spark DataFrame API, Aggregation, approx_count_distinct, PySpark Syntax
- Question #57: Perform data aggregation operations on Spark DataFrames
The code block shown below should return a new DataFrame with the mean of column sqft from DataFrame storesDF in column sqftMean. Choose the response that correctly fills in the nu...
Spark DataFrame API, Aggregation, PySpark, Data Transformation
- Question #58: Performing Aggregations on DataFrames
Which of the following code blocks returns the number of rows in DataFrame storesDF for each unique value in column division?
Spark DataFrames, Aggregation, groupBy, count
- Question #59: Transforming Data with Spark DataFrames
Which of the following code blocks returns a DataFrame sorted alphabetically based on column division?
DataFrame Operations, Sorting DataFrames, orderBy Function, Spark SQL API
- Question #60: Manipulating DataFrames in Apache Spark
Which of the following code blocks returns a 10 percent sample of rows from DataFrame storesDF with replacement?
Spark DataFrame API, Data Sampling, DataFrame Transformations
- Question #61: Performing Basic DataFrame Actions
Which of the following code blocks returns the first 3 rows of DataFrame storesDF?
PySpark, DataFrame API, Data Retrieval, Action Operations
- Question #62: Spark DataFrame Operations
The code block shown below should return a new DataFrame that is the result of an inner join between DataFrame storeDF and DataFrame employeesDF on column storeId. Choose the respo...
Spark DataFrames, Joins, Data Transformation, PySpark
- Question #63: Performing Data Transformations with Spark DataFrames
The code block shown below should return a new DataFrame that is the result of an outer join between DataFrame storesDF and DataFrame employeesDF on column storeId. Choose the resp...
Spark DataFrames, DataFrame Joins, Data Transformation, Apache Spark
- Question #64: Manipulating DataFrames (Joins, Filters, Aggregations)
Which of the following code blocks fails to return a new DataFrame that is the result of an inner join between DataFrame storesDF and DataFrame employeesDF on column storeId and co...
Spark DataFrames, DataFrame Join, Scala API, Join Syntax
- Question #65: Spark DataFrame Operations and Optimizations
The code block shown below should efficiently perform a broadcast join of DataFrame storesDF and the much larger DataFrame employeesDF using key column storeId. Choose the response...
Spark DataFrame Joins, Broadcast Join, Spark SQL API, Performance Tuning
- Question #66: Performing Joins on DataFrames
Which of the following operations performs a cross join on two DataFrames?
Spark DataFrames, Cross Join, DataFrame API, Joins
- Question #67: Working with Spark DataFrames and I/O Operations
Which of the following code blocks writes DataFrame storesDF to file path filePath as CSV?
DataFrame API, Writing Data, CSV Format, Data Persistence
- Question #68: Working with Spark DataFrames
Which of the following code blocks writes DataFrame storesDF to file path filePath as parquet and partitions by values in column division?
Spark DataFrame API, Data Persistence, Parquet, Partitioning
- Question #69: Spark Execution Model
Of the following, which is the coarsest level in the Spark execution hierarchy?
Spark Architecture, Spark Execution Model, Job Stage Task
- Question #70: Spark Architecture and Core Concepts
Which of the following statements about slots is incorrect?
Spark Architecture, Execution Model, Resource Management, Tasks and Slots
- Question #71: Spark DataFrame API Operations
Which of the following code blocks returns a new DataFrame from DataFrame storesDF with no duplicate rows?
DataFrame Operations, Duplicate Rows, Spark API
- Question #72: Performing Aggregations and Transformations with Spark DataFrames
The code block shown below contains an error. The code block is intended to return the exact number of distinct values in column division in DataFrame storesDF. Identify the error....
Spark Functions, Aggregation, approx_count_distinct, Function Parameters
- Question #73: Performing Data Aggregations with Spark DataFrames
Which of the following code blocks returns the number of rows in DataFrame storesDF for each distinct combination of values in column division and column storeCategory?
Spark DataFrame, Grouping, Aggregation
- Question #74: Performing Data Analysis with Spark DataFrames
The code block shown below contains an error. The code block is intended to return a collection of summary statistics for column sqft in DataFrame storesDF. Identify the error. Co...
Spark DataFrames, DataFrame API, Descriptive Statistics, API Usage
- Question #75: Work with Spark DataFrames
The code block shown below should extract the integer value for column sqft from the first row of DataFrame storesDF. Choose the response that correctly fills in the numbered blank...
Spark DataFrame API, Scala, Row Object, Data Extraction
- Question #76: Working with DataFrames
The code block shown below should print the schema of DataFrame storesDF. Choose the response that correctly fills in the numbered blanks within the code block to complete this tas...
Spark DataFrame API, Schema Operations, Data Inspection, PySpark
- Question #77: Working with Spark SQL and DataFrames
The code block shown below contains an error. The code block is intended to create and register a SQL UDF named "ASSESS_PERFORMANCE" using the Scala function assessPerformance() an...
Spark UDFs, Spark SQL, DataFrame API, Error Identification
- Question #78: Spark SQL and DataFrames
The code block shown below contains an error. The code block is intended to create the Scala UDF assessPerformanceUDF() and apply it to the integer column customerSatisfaction in D...
Spark UDFs, Scala Programming, Spark DataFrame API
- Question #79: Working with DataFrames
The code block shown below should create a single-column DataFrame from Scala list years which is made up of integers. Choose the response that correctly fills in the numbered blan...
DataFrame Creation, SparkSession, Scala Collections, Data Transformation
- Question #80: Performance Tuning and Optimization
The code block shown below should cache DataFrame storesDF only in Spark's memory. Choose the response that correctly fills in the numbered blanks within the code block to complet...
Spark Caching, DataFrame Persistence, Storage Levels, Performance Tuning
- Question #81: Performing Transformations on DataFrames
Which of the following code blocks returns a DataFrame containing a column month, an integer representation of the day of the year from column openDate from DataFrame storesDF. Not...
Spark DataFrames, Date/Time Functions, Data Type Casting, Column Operations
- Question #82: Performing Data Manipulations using DataFrame API
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of an inner join between DataFrame storesDF and DataFrame employe...
Spark DataFrames, Join Operations, DataFrame API, Join Types
- Question #83: Implementing Join Operations in Spark DataFrames
Which of the following pairs of arguments cannot be used in DataFrame.join() to perform an inner join on two DataFrames, named and aliased with "a" and "b" respectively, to specify...
Spark DataFrames, DataFrame Joins, Join Syntax, Column Expressions
- Question #84: Manipulating Spark DataFrames
The code block shown below contains an error. The code block is intended to return a new DataFrame that is the result of a position-wise union between DataFrame storesDF and DataFr...
Spark DataFrame, DataFrame Transformations, Union Operation
- Question #85: Data Persistence
Which of the following code blocks writes DataFrame storesDF to file path filePath as parquet overwriting any existing files in that location?
Spark DataFrame API, Data Persistence, Parquet Format, Save Modes
- Question #86: Data Ingestion and Loading
Which of the following code blocks reads a CSV at the file path filePath into a DataFrame with the specified schema schema?
Spark DataFrame API, Reading Data, CSV Files, Schema Definition
- Question #87: Perform Data Manipulation using Apache Spark DataFrames
Which of the following code blocks returns a DataFrame containing only the rows from DataFrame storesDF where the value in column sqft is less than or equal to 25,000 AND the value...
Spark DataFrames, PySpark Filtering, Column Expressions, Logical Operators
- Question #88: Performing DataFrame Transformations
Which of the following sets of DataFrame methods will both return a new DataFrame only containing rows that meet a specified logical condition?
Spark DataFrame, DataFrame API, Data Filtering, Transformations
- Question #89: Working with Spark DataFrames
The code block shown below should return a DataFrame containing all columns from DataFrame storesDF except for column sqft and column customerSatisfaction. Choose the response that...
Spark DataFrame API, Column Operations, PySpark, Data Transformation
- Question #90: Optimizing Spark Applications
Which of the following describes the difference between DataFrame.repartition(n) and DataFrame.coalesce(n)?
Spark DataFrames, Partitioning, Performance Optimization, Shuffle Operations
- Question #91: Spark Cluster Management and Performance
Which of the following cluster configurations is most likely to experience delays due to garbage collection of a large DataFrame? Note: each configuration has roughly the same comp...
Spark Performance Tuning, Garbage Collection, Spark Cluster Configuration, Memory Management
- Question #92: Working with Spark DataFrames
Which of the following DataFrame operations is classified as a transformation?
Spark DataFrames, Transformations, Actions, Spark API
- Question #93: Spark DataFrame Operations and Execution
Which of the following operations will fail to trigger evaluation?
Spark Lazy Evaluation, Spark Actions, Spark Transformations, DataFrame Operations
- Question #94: Spark Core Concepts and Architecture
Of the following, which is the coarsest level in the Spark execution hierarchy?
Spark architecture, Execution hierarchy, Job Stage Task
- Question #95: Manipulate DataFrame Columns and Apply Transformations
Which of the following code blocks returns a new DataFrame with a new column customerSatisfactionAbs that is the absolute value of column customerSatisfaction in DataFrame storesDF...
Spark DataFrames, Column Transformations, withColumn, Built-in Functions
- Question #96: Spark Architecture and Execution
Which of the following statements about the Spark driver is true?
Spark Driver, Spark Architecture, Execution Model, Scheduling
- Question #97: Spark I/O Operations
The code block shown below should write DataFrame storesDF to file path filePath as parquet and partition by values in column division. Choose the response that correctly fills in...
Spark DataFrame API, Data Persistence, Parquet Format, Data Partitioning
- Question #98: Spark Execution Model
Which of the following types of processes induces a stage boundary?
Spark Execution Model, Spark DAG, Stage Boundaries, Shuffles
- Question #99: Spark Performance Optimization
Which of the following cluster configurations will induce the least network traffic during a shuffle operation? Note: each configuration has roughly the same compute power using 10...
Spark Shuffle, Cluster Configuration, Network Optimization, Performance Tuning
- Question #100: Spark Architecture and Components
Which of the following describes a partition?
Spark Architecture, Data Partitioning, Distributed Processing, Spark Core Concepts