CERTIFIED-DATA-ANALYST-ASSOCIATE · Question #20
The correct answer is C: Combining two data sources into a single, comprehensive dataset.
Question
A data team has been given a series of projects by a consultant that need to be implemented in the Databricks Lakehouse Platform. Which of the following projects should be completed in Databricks SQL?
Options
- A. Testing the quality of data as it is imported from a source
- B. Tracking usage of feature variables for machine learning projects
- C. Combining two data sources into a single, comprehensive dataset
- D. Segmenting customers into like groups using a clustering algorithm
- E. Automating complex notebook-based workflows with multiple tasks
Explanation
Combining two data sources into a single, comprehensive dataset (C) is the correct answer because this is a classic SQL operation - joining or unioning tables - which is exactly what Databricks SQL is designed to handle: querying, transforming, and merging structured data using SQL syntax in a governed, interactive environment.
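To make option C concrete, here is a minimal sketch of combining two sources with a SQL join. It is not Databricks-specific: Python's built-in sqlite3 module stands in for a SQL warehouse so the example runs anywhere, and the table and column names (crm_customers, web_orders, customer_id, amount) are hypothetical. The SELECT statement itself uses only standard SQL, which is the kind of query an analyst would write in a SQL editor.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a SQL warehouse.
# Table and column names are hypothetical, chosen only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE crm_customers (customer_id INTEGER, name TEXT);
    CREATE TABLE web_orders (customer_id INTEGER, amount REAL);
    INSERT INTO crm_customers VALUES (1, 'Ada'), (2, 'Grace'), (3, 'Linus');
    INSERT INTO web_orders VALUES (1, 40.0), (1, 10.0), (2, 25.0);
""")

# Combine the two sources into one dataset. A LEFT JOIN keeps customers
# that have no orders; COALESCE turns their missing total into 0.0.
combined = conn.execute("""
    SELECT c.customer_id,
           c.name,
           COALESCE(SUM(o.amount), 0.0) AS total_spent
    FROM crm_customers AS c
    LEFT JOIN web_orders AS o
           ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.name
    ORDER BY c.customer_id
""").fetchall()

for row in combined:
    print(row)

conn.close()
```

A UNION of two tables with matching columns would be the other common way to combine sources; which one applies depends on whether the sources share a key (join) or share a schema (union).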
Why the distractors are wrong:
- A (data quality testing on import) is better suited to Delta Live Tables or a data engineering pipeline with expectations/constraints, not the interactive query layer of Databricks SQL.
- B (tracking feature variable usage for ML) belongs in the Feature Store, a dedicated MLflow-integrated component for managing ML features.
- D (customer clustering with an algorithm) is a machine learning task requiring Python/Spark ML libraries - best done in a Databricks Notebook with MLlib or scikit-learn.
- E (automating multi-task notebook workflows) describes Databricks Workflows (Jobs), the orchestration tool for pipeline automation.
Memory tip: Think of Databricks SQL as the "analyst's workspace" - if a business analyst could reasonably do it in a SQL editor (querying, joining, aggregating), it belongs in Databricks SQL. Anything requiring general-purpose code such as Python or Scala, ML models, or workflow orchestration lives elsewhere in the platform.