DATABRICKS-CERTIFIED-DATA-ENGINEER-ASSOCIATE Exam Questions
83 real DATABRICKS-CERTIFIED-DATA-ENGINEER-ASSOCIATE exam questions with expert-verified answers and explanations. Page 1 of 2.
- Question #1: ELT with Spark SQL and Python
A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command: Which of the following lines of...
Spark SQL, JDBC Connector, External Data Sources, Data Ingestion
- Question #2: ELT with Spark SQL and Python
A data engineering team has two tables. The first table march_transactions is a collection of all retail transactions in the month of March. The second table april_transactions is...
SQL, Data Transformation, UNION Operator, CREATE TABLE AS SELECT (CTAS)
- Question #3: ELT with Spark SQL and Python
A data engineer only wants to execute the final block of a Python program if the Python variable day_of_week is equal to 1 and the Python variable review_period is True. Which of t...
Python syntax, Control flow, Conditional statements, Logical operators
- Question #4: Data Management
A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data. They run the following command: DROP TABLE IF EXIST...
Spark SQL, External Tables, Managed Tables, DROP TABLE
- Question #5: ELT with Spark SQL and Python
A data engineer wants to create a data entity from a couple of tables. The data entity must be used by other data engineers in other sessions. It also must be saved to a physical l...
Tables, Data Persistence, Data Entities, SQL Operations
- Question #6: Monitoring and Logging
A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer...
Data quality monitoring, Automated data pipelines, Delta Live Tables, Data ingestion
- Question #7: Deployment and Operations
A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE. The table is con...
Delta Live Tables (DLT), Continuous Pipeline Mode, Pipeline Execution, Streaming Data
- Question #8: Deployment and Operations
In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the f...
Structured Streaming, Fault Tolerance, Checkpointing, Write-ahead Logs
- Question #9: Databricks Lakehouse Platform
Which of the following describes the relationship between Gold tables and Silver tables?
Medallion Architecture, Gold tables, Silver tables, Data processing layers
- Question #10: Databricks Lakehouse Platform
Which of the following describes the relationship between Bronze tables and raw data?
Bronze layer, Medallion architecture, Raw data ingestion, Schema application
- Question #11: ELT with Spark SQL and Python
Which of the following tools is used by Auto Loader to process data incrementally?
Auto Loader, Spark Structured Streaming, Incremental data processing, Data ingestion
- Question #12: ELT with Spark SQL and Python
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table. The code block used by the...
Structured Streaming, Triggers, Micro-batching, Spark Streaming Configuration
- Question #13: ELT with Spark SQL and Python
A dataset has been defined using Delta Live Tables and includes an expectations clause: CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION DROP ROW What is t...
Delta Live Tables, Data Quality, Expectations, Constraint Violations
- Question #14: ELT with Spark SQL and Python
Which of the following describes when to use the CREATE STREAMING LIVE TABLE (formerly CREATE INCREMENTAL LIVE TABLE) syntax over the CREATE LIVE TABLE syntax when creating Delta L...
Delta Live Tables (DLT), Streaming data, Incremental processing, SQL syntax
- Question #15: ELT with Spark SQL and Python
A data engineer is designing a data pipeline. The source system generates files in a shared directory that is also used by other processes. As a result, the files should be kept as...
Auto Loader, Incremental data ingestion, Structured Streaming, File processing
- Question #17: Monitoring and Logging
A data engineer has three tables in a Delta Live Tables (DLT) pipeline. They have configured the pipeline to drop invalid records at each table. They notice that some data is being...
Delta Live Tables, Data Quality, Monitoring, Pipeline Observability
- Question #18: Deployment and Operations
A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new note...
Databricks Jobs, Task Dependencies, Job Orchestration
- Question #19: Deployment and Operations
A data engineer is running code in a Databricks Repo that is cloned from a central Git repository. A colleague of the data engineer informs them that changes have been made and syn...
Git Operations, Databricks Repos, Version Control, Remote Synchronization
- Question #20: Databricks Lakehouse Platform
Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?
Open Source, Vendor Lock-in, Lakehouse Platform, Databricks
- Question #21: Data Management
A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions. In which of the following locations can the da...
Permissions Management, Data Explorer, Table Access, Unity Catalog
- Question #22: Databricks Lakehouse Platform
Which of the following describes a scenario in which a data engineer will want to use a single-node cluster?
Databricks Clusters, Single-node Compute, Workload Management, Interactive Workloads
- Question #23: ELT with Spark SQL and Python
A data engineer has been given a new record of data: id STRING = 'a1', rank INTEGER = 6, rating FLOAT = 9.4. Which of the following SQL commands can be used to append the new record t...
SQL DML, INSERT statement, Delta Lake, Data Appending
- Question #24: Data Management
A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performanc...
Delta Lake, Performance Optimization, File Compaction, OPTIMIZE command
- Question #25: Databricks Lakehouse Platform
In which of the following file formats is data from Delta Lake tables primarily stored?
Delta Lake, File Formats, Parquet, Data Storage
- Question #26: Databricks Lakehouse Platform
Which of the following is stored in the Databricks customer's cloud account?
Databricks Architecture, Cloud Architecture, Data Plane, Cloud Storage
- Question #27: Databricks Lakehouse Platform
Which of the following can be used to simplify and unify siloed data architectures that are specialized for specific use cases?
data lakehouse, unified data architecture, data silos, data platform
- Question #29: ELT with Spark SQL and Python
A data engineer has a Python notebook in Databricks, but they need to use SQL to accomplish a specific task within a cell. They still want all of the other cells to use Python with...
Databricks Notebooks, Magic Commands, Mixed Language Notebooks, SQL in Python
- Question #30: ELT with Spark SQL and Python
Which of the following SQL keywords can be used to convert a table from a long format to a wide format?
SQL, PIVOT, Data Transformation, Long to Wide Format
- Question #31: Data Management
Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?
Parquet, CSV, External Tables, Schema Definition
- Question #32: ELT with Spark SQL and Python
A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In or...
Spark SQL, Temporary Views, Logical Data Objects, Session Scope
- Question #33: ELT with Spark SQL and Python
A data analyst has developed a query that runs against a Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the...
PySpark, Spark SQL, DataFrame API, Query execution
- Question #34: ELT with Spark SQL and Python
Which of the following commands will return the number of null values in the member_id column?
Spark SQL, NULL values, Aggregate Functions, Data Querying
- Question #35: ELT with Spark SQL and Python
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table. The code block used by the...
Structured Streaming, Streaming Triggers, availableNow Trigger, Batch Processing
- Question #36: ELT with Spark SQL and Python
A data engineer has developed a data pipeline to ingest data from a JSON source using Auto Loader, but the engineer has not provided any type inference or schema hints in their pip...
Auto Loader, Schema Inference, JSON Data, Data Ingestion
- Question #37: ELT with Spark SQL and Python
A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE. The table is con...
Delta Live Tables, Continuous Pipeline Mode, Streaming Data Processing, ELT Pipelines
- Question #38: Databricks Lakehouse Platform
Which of the following data workloads will utilize a Gold table as its source?
Medallion Architecture, Gold tables, Lakehouse architecture, Data pipeline stages
- Question #39: ELT with Spark SQL and Python
Which of the following must be specified when creating a new Delta Live Tables pipeline?
Delta Live Tables, Pipeline Creation, Cloud Storage, Data Residency
- Question #40: ELT with Spark SQL and Python
A data engineer has joined an existing project and they see the following query in the project repository: CREATE STREAMING LIVE TABLE loyal_customers AS SELECT customer_id FROM...
Delta Live Tables (DLT), Streaming ETL, SQL, Streaming Sources
- Question #41: ELT with Spark SQL and Python
Which of the following describes the type of workloads that are always compatible with Auto Loader?
Auto Loader, Streaming data ingestion, Structured Streaming
- Question #42: ELT with Spark SQL and Python
A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the d...
Delta Live Tables (DLT), Medallion Architecture, Streaming Data, Python and SQL
- Question #43: Databricks Lakehouse Platform
A data engineer is using the following code block as part of a batch ingestion pipeline to read from a composable table: Which of the following changes needs to be made so this cod...
Structured Streaming, Spark API, Batch to Stream Conversion, Data Ingestion
- Question #45: ELT with Spark SQL and Python
A dataset has been defined using Delta Live Tables and includes an expectations clause: CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01') ON VIOLATION FAIL UPDATE What i...
Delta Live Tables, Data Quality, Expectations, FAIL UPDATE
- Question #46: Databricks Lakehouse Platform
Which of the following statements regarding the relationship between Silver tables and Bronze tables is always true?
Lakehouse Architecture, Medallion Architecture, Bronze Table, Silver Table
- Question #47: Databricks Lakehouse Platform
A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants...
Databricks SQL Endpoint, Serverless Compute, Performance Optimization, Cold Start
- Question #48: Deployment and Operations
A data engineer has a Job that has a complex run schedule, and they want to transfer that schedule to other Jobs. Rather than manually selecting each value in the scheduling form i...
Databricks Jobs, Job Scheduling, Cron, Programmatic Scheduling
- Question #49: Monitoring and Logging
Which of the following approaches should be used to send the Databricks Job owner an email in the case that the Job fails?
Databricks Jobs, Job Monitoring, Email Notifications, Failure Alerts
- Question #50: Monitoring and Logging
An engineering manager uses a Databricks SQL query to monitor ingestion latency for each data source. The manager checks the results of the query every day, but they are manually r...
Databricks SQL, Query Scheduling, Data Monitoring, Automation
- Question #51: Deployment and Operations
In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?
Databricks Jobs, Task Dependencies, Job Orchestration
- Question #52: Monitoring and Logging
A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks...
Databricks SQL Alerts, Webhooks, Notifications, Data Quality Monitoring
- Question #53: Deployment and Operations
A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashb...
SQL Endpoint Management, Cost Optimization, Scheduled Workloads, Auto Stop Feature
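
Questions #13 and #45 above quote the same expectation, CONSTRAINT valid_timestamp EXPECT (timestamp > '2020-01-01'), with two different ON VIOLATION policies. The sketch below is a plain-Python simulation of how those two policies are generally understood to differ: DROP ROW leaves violating records out of the target dataset, while FAIL UPDATE stops the update as soon as a record violates the constraint. It is not the Delta Live Tables API. The names `apply_expectation`, `is_valid`, and `ExpectationFailed`, and the sample rows, are hypothetical and exist only for illustration.

```python
from datetime import datetime

# Cutoff taken from the quoted clause: timestamp > '2020-01-01'
CUTOFF = datetime(2020, 1, 1)


class ExpectationFailed(Exception):
    """Hypothetical error standing in for a failed pipeline update."""


def is_valid(row: dict) -> bool:
    # Mirrors the quoted condition for a single record.
    return row["timestamp"] > CUTOFF


def apply_expectation(rows: list, on_violation: str) -> list:
    """Simulate the two ON VIOLATION policies quoted in the questions."""
    if on_violation == "DROP ROW":
        # Violating records are left out of the result; the rest pass through.
        return [row for row in rows if is_valid(row)]
    if on_violation == "FAIL UPDATE":
        # Any violating record aborts the whole update; nothing is written.
        for row in rows:
            if not is_valid(row):
                raise ExpectationFailed(f"row {row['id']} violates valid_timestamp")
        return list(rows)
    raise ValueError(f"unsupported policy: {on_violation}")


# Hypothetical sample data: one record before the cutoff, one after.
rows = [
    {"id": 1, "timestamp": datetime(2019, 12, 31)},
    {"id": 2, "timestamp": datetime(2021, 6, 15)},
]

kept = apply_expectation(rows, "DROP ROW")
print([row["id"] for row in kept])  # prints [2]
```

Calling `apply_expectation(rows, "FAIL UPDATE")` on the same sample raises `ExpectationFailed`, because the first record falls before the cutoff.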