CERTIFIED-DATA-ANALYST-ASSOCIATE · Question #93
The correct answer is C: Delta Lake.
Question
Which open-source project helps to enable the data lakehouse by adding organization, reliability, performance, and data governance to data lake architectures?
Options
- A. MLflow
- B. Apache Spark
- C. Delta Lake
- D. Databricks SQL
Explanation
Delta Lake is the open-source project specifically designed to bring reliability and governance to data lakes, turning them into "lakehouses". It adds ACID transactions, schema enforcement, time travel, and an auditable transaction history directly on top of cloud object storage such as Amazon S3 or Azure Data Lake Storage (ADLS).
MLflow (A) is wrong because it's an experiment-tracking and ML lifecycle management tool, not a data lake organization layer. Apache Spark (B) is a distributed compute engine for processing data, but doesn't inherently add governance or reliability to a lake's storage layer. Databricks SQL (D) is a commercial query/analytics product built on Databricks' platform - it leverages Delta Lake but is not the open-source project enabling the lakehouse itself.
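To make the features named in the explanation more concrete, here is a minimal pure-Python sketch of the general idea behind them: an append-only commit log that gives all-or-nothing writes, schema enforcement, time travel, and a commit history. This is a toy model only and not the Delta Lake API. The class `ToyDeltaTable` and its methods `append`, `read`, and `history` are hypothetical names invented for illustration; real Delta Lake stores Parquet data files plus a transaction log on object storage.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Toy in-memory model of a versioned table (hypothetical, NOT the Delta Lake API)."""

    schema: dict                               # column name -> expected Python type
    _log: list = field(default_factory=list)   # append-only list of commits

    def append(self, rows):
        """Validate a batch of rows, then commit it as one new version."""
        # Schema enforcement: check every row before anything is committed.
        for row in rows:
            if set(row) != set(self.schema):
                raise ValueError(
                    f"columns {sorted(row)} do not match schema {sorted(self.schema)}"
                )
            for column, expected_type in self.schema.items():
                if not isinstance(row[column], expected_type):
                    raise ValueError(
                        f"column {column!r} expects {expected_type.__name__}"
                    )
        # Atomicity: the whole batch becomes visible as a single commit, or not at all.
        self._log.append(list(rows))
        return len(self._log) - 1              # version number of this commit

    def read(self, version=None):
        """Return all rows as of `version` (latest version when omitted)."""
        if version is None:
            version = len(self._log) - 1
        if not -1 <= version < len(self._log):
            raise ValueError(f"unknown version {version}")
        # Time travel: replay the log up to and including the requested version.
        return [row for commit in self._log[: version + 1] for row in commit]

    def history(self):
        """Return one audit entry per commit."""
        return [
            {"version": number, "rows_added": len(commit)}
            for number, commit in enumerate(self._log)
        ]


# Usage: two good commits, then one rejected batch that leaves the table unchanged.
table = ToyDeltaTable(schema={"id": int, "name": str})
table.append([{"id": 1, "name": "a"}])
table.append([{"id": 2, "name": "b"}])
try:
    table.append([{"id": 3, "name": "c"}, {"id": "oops", "name": "d"}])
except ValueError:
    pass  # the bad batch was rejected as a whole; no partial write happened
```

In this sketch, a rejected batch never reaches the log, so readers only ever see complete commits, and any earlier state can be reconstructed by replaying the log up to a chosen version.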
Memory tip: Think "Delta" like a river delta - it's where a lake gets organized and structured before reaching the sea. Delta Lake brings that structure to raw, chaotic data lake storage.