CERTIFIED-MACHINE-LEARNING-PROFESSIONAL · Question #42
The correct answer is D: spark.read.table(path).drop("star_rating").
Question
A data scientist wants to remove the star_rating column from the Delta table at the location path. To do this, they need to load in data and drop the star_rating column. Which of the following code blocks accomplishes this task?
Options
- A. spark.read.format("delta").load(path).drop("star_rating")
- B. spark.read.format("delta").table(path).drop("star_rating")
- C. Delta tables cannot be modified
- D. spark.read.table(path).drop("star_rating")
- E. spark.sql("SELECT * EXCEPT star_rating FROM path")
Explanation
Option D chains spark.read.table(path), the standard Spark method for reading a table registered in the metastore, with .drop("star_rating"), which returns a new DataFrame without that column. This is the idiomatic pattern for registered tables. Note that spark.read.table() expects a table identifier, so this answer assumes path holds a name the catalog can resolve rather than a bare file-system location.
Why each distractor fails:
- A uses spark.read.format("delta").load(path), which reads Delta files directly from a file-system path rather than resolving a table registered in the metastore, so it is the wrong tool here.
- B chains .table() after .format("delta"). DataFrameReader.table() resolves the table through the catalog, so the format setting adds nothing and this is not the idiomatic form.
- C is factually wrong: Delta tables support schema evolution and can be modified.
- E embeds path as a literal identifier in SQL (FROM path), so it looks for a table literally named "path" rather than using the Python variable; additionally, SELECT * EXCEPT column without parentheses is not valid Spark SQL.
Memory tip: Think of it as "table for tables, load for files" - use spark.read.table() for registered/metastore tables and spark.read.format("delta").load() only when pointing at raw file-system paths.