CERTIFIED-DATA-ENGINEER-PROFESSIONAL Exam Questions

122 real CERTIFIED-DATA-ENGINEER-PROFESSIONAL exam questions with expert-verified answers and explanations. Page 1 of 3.

Question #1: Databricks Job Orchestration and Parameterization
An upstream system has been configured to pass the date for a given batch of data to the Databricks Jobs API as a parameter. The notebook to be scheduled will use this parameter to...
Databricks Jobs, Notebook Parameterization, dbutils Widgets, Data Ingestion
Question #2: Administering Databricks Workspaces
The Databricks workspace administrator has configured interactive clusters for each of the data engineering groups. To control costs, clusters are set to terminate after 30 minutes...
Databricks Permissions, Cluster Access, Access Control, Workspace Security
Question #3: Deploying and Operating Data Pipelines
When scheduling Structured Streaming jobs for production, which configuration automatically recovers from query failures and keeps costs low?
Structured Streaming, Cluster Management, Cost Optimization, Fault Tolerance
Question #4: Data Monitoring and Alerting
The data engineering team has configured a Databricks SQL query and alert to monitor the values in a Delta Lake table. The recent_sensor_recordings table contains an identifying se...
Databricks SQL, Monitoring & Alerting, SQL Aggregation, Data Interpretation
Question #5: Databricks Repos and Version Control
A junior developer complains that the code in their notebook isn't producing the correct results in the development environment. A shared screenshot reveals that while they're usin...
Databricks Repos, Git Branching, Version Control, Remote Repositories
Question #6: Databricks Security and Access Control
The security team is exploring whether or not the Databricks secrets module can be leveraged for connecting to an external database. After testing the code with all Python variable...
Databricks Secrets, Security, dbutils, Credential Management
Question #7: Delta Lake Data Management
The data science team has created and logged a production model using MLflow. The following code correctly imports and applies the production model to output the predictions as a n...
Delta Lake, Spark DataFrame API, Data Ingestion, Data Persistence
Question #8: Data Ingestion and Processing
An upstream source writes Parquet data as hourly batches to directories named with the current date. A nightly batch job runs the following code to ingest all data from the previou...
Data Ingestion, Deduplication, Delta Lake, Data Quality
Question #9: Databricks Notebook Language Interoperability
A junior member of the data engineering team is exploring the language interoperability of Databricks notebooks. The intended outcome of the below code is to register a view of all...
Databricks Notebooks, Language Interoperability, SQL Views, Python Variables
Question #10: Delta Lake Performance Optimization
A Delta table of weather records is partitioned by date and has the below schema: date DATE, device_id INT, temp FLOAT, latitude FLOAT, longitude FLOAT To find all the records from...
Delta Lake, Data Skipping, Query Optimization, Metadata Management
Question #11: Delta Lake Data Retention and Governance
The data engineering team has configured a job to process customer requests to be forgotten (have their data deleted). All user data that needs to be deleted is stored in Delta Lak...
Delta Lake, Data Retention, VACUUM Command, Time Travel
Question #12: Orchestrating Production Workloads
A junior data engineer has configured a workload that posts the following JSON to the Databricks REST API endpoint 2.0/jobs/create. Assuming that all configurations and referenced...
Databricks Jobs, REST API, Job Creation, Workload Automation
Question #13: Data Ingestion
An upstream system is emitting change data capture (CDC) logs that are being written to a cloud object storage directory. Each record in the log indicates the change type (insert,...
Change Data Capture (CDC), Delta Lake, Medallion Architecture, Data Ingestion
Question #14: Data Transformation
An hourly batch job is configured to ingest data files from a cloud object storage container where each batch represents all records produced by the source system in a given hour. T...
Delta Lake, SCD Type 1, MERGE INTO, Data Transformation
Question #15: Building and Managing Production Data Pipelines
A table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number...
Delta Lake MERGE, Data Synchronization, Change Data Capture, Data Engineering Patterns
Question #16: Data Storage and Management
A table is registered with the following code: Both users and orders are Delta Lake tables. Which statement describes the results of querying recent_orders?
Materialized Views, Delta Lake, Data Persistence, Query Execution
Question #17: Workload Management and Optimization
A production workload incrementally applies updates from an external Change Data Capture feed to a Delta Lake table as an always-on Structured Stream job. When data was initially m...
Delta Lake, Auto Optimize, MERGE Operations, File Compaction
Question #18: Data Streaming
Which statement regarding stream-static joins and static Delta tables is correct?
Structured Streaming, Delta Lake, Stream-Static Join, Microbatch Processing
Question #19: Streaming Data Processing
A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and a...
Spark Structured Streaming, Window Functions, Data Aggregation, DataFrame API
Question #20: Streaming Data Processing
A data architect has designed a system in which two Structured Streaming jobs will concurrently write to a single bronze Delta table. Each job is subscribing to a different topic f...
Structured Streaming, Checkpointing, Delta Lake, Concurrency
Question #21: Streaming Data Processing
A Structured Streaming job deployed to production has been experiencing delays during peak hours of the day. At present, during normal execution, each microbatch of data is process...
Structured Streaming, Performance Optimization, Trigger Interval, Microbatch Processing
Question #22: Delta Lake Features and Performance Optimization
Which statement describes Delta Lake Auto Compaction?
Delta Lake, Auto Compaction, Data Optimization, File Management
Question #23: Real-time Data Processing with Spark Structured Streaming
Which statement characterizes the general programming model used by Spark Structured Streaming?
Spark Structured Streaming, Programming Model, Stream Processing Concepts, Unbounded Tables
Question #24: Optimizing Spark Workloads
Which configuration parameter directly affects the size of a spark-partition upon ingestion of data into Spark?
Spark Partitioning, Spark Configuration, Data Ingestion
Question #25: Optimizing Spark Applications
A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes that the Min, Median, and Max Durations for tasks in a particular stage show the minimum and m...
Spark Performance, Data Skew, Spark UI, Troubleshooting
Question #26: Spark Performance Optimization
Each configuration below is identical to the extent that each cluster has 400 GB total of RAM, 160 total cores and only one Executor per VM. Given a job with at least one wide tran...
Spark Performance, Cluster Configuration, Data Shuffling, Executor Sizing
Question #27: Data Manipulation with Delta Lake
A junior data engineer on your team has implemented the following code block. The view new_events contains a batch of records with the same schema as the events Delta table. The ev...
Delta Lake, MERGE Statement, Duplicate Handling, Data Ingestion
Question #28: Ingesting and Transforming Data
A junior data engineer seeks to leverage Delta Lake's Change Data Feed functionality to create a Type 1 table representing all of the values that have ever been valid for all rows...
Delta Lake, Change Data Feed (CDF), Data Versioning, Data Ingestion
Question #29: Designing and Implementing Data Pipelines
A new data engineer notices that a critical field was omitted from an application that writes its Kafka source to Delta Lake. This happened even though the critical field was in th...
Delta Lake, Data Reliability, Bronze Layer, Streaming Data
Question #31: Ingesting and Transforming Data
A junior data engineer is working to implement logic for a Lakehouse table named silver_device_recordings. The source data contains 100 unique fields in a highly nested JSON struct...
Schema Inference, Delta Lake, JSON Data, Data Ingestion
Question #32: Data Transformation and Loading
The data engineering team maintains the following code: Assuming that this code produces logically correct results and the data in the source tables has been de-duplicated and vali...
Batch Processing, Delta Lake, Table Overwrite, Data Write Modes
Question #33: Data Governance and Security
The data engineering team is migrating an enterprise system with thousands of tables and views into the Lakehouse. They plan to implement the target architecture using a series of...
Data Security, Lakehouse Architecture, Data Governance, Access Control
Question #34: Data Storage and Management
The data architect has mandated that all tables in the Lakehouse should be configured as external Delta Lake tables. Which approach will ensure that this requirement is met?
Delta Lake, External Tables, Table Management, SQL DDL
Question #35: Data Management and Schema Evolution
To reduce storage and compute costs, the data engineering team has been tasked with curating a series of aggregate tables leveraged by business intelligence dashboards, customer-fa...
Schema Evolution, Data Management, Data Lakehouse, Change Management
Question #36: Optimizing Data Lake Performance
A Delta Lake table representing metadata about content posts from users has the following schema: user_id LONG, post_text STRING, post_id STRING, longitude FLOAT, latitude FLOAT, p...
Delta Lake, Query Optimization, File Skipping, Column Statistics
Question #37: Databricks Infrastructure Planning
A small company based in the United States has recently contracted a consulting firm in India to implement several new data engineering pipelines to power artificial intelligence a...
Cloud Cost Management, Data Locality, Databricks Workspace Deployment, Performance Optimization
Question #38: Data Management and Quality
The downstream consumers of a Delta Lake table have been complaining about data quality issues impacting performance in their applications. Specifically, they have complained that...
Delta Lake, CHECK Constraints, Data Quality, Table Alteration
Question #39: Optimizing Data Lakehouse Performance
Which of the following is true of Delta Lake and the Lakehouse?
Delta Lake, Data Skipping, Performance Optimization, Table Statistics
Question #40: Data Modeling and Data Warehousing Concepts
The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table. The following logic is used to process these records....
Slowly Changing Dimensions, Data Warehousing, Data Modeling, ETL/ELT Concepts
Question #41: Manage Security and Access Control
The DevOps team has configured a production workload as a collection of notebooks scheduled to run daily using the Jobs UI. A new data engineering hire is onboarding to the team an...
Databricks Permissions, Notebook Access Control, Workspace Security, Production Best Practices
Question #42: Data Governance and Security
A table named user_ltv is being used to create a view that will be used by data analysts on various teams. Users in the workspace are configured into groups, which are used for set...
Dynamic Data Masking, Column-Level Security, Data Access Control, Views
Question #43: Data Governance and Metadata Management
The data governance team has instituted a requirement that all tables containing Personally Identifiable Information (PII) must be clearly annotated. This includes adding column comme...
SQL Commands, Metadata Management, Data Governance, Databricks SQL
Question #44: Delta Lake Data Governance and Management
The data governance team is reviewing code used for deleting records for compliance with GDPR. They note the following logic is used to delete records from the Delta Lake table nam...
Delta Lake, Data Deletion, Time Travel, Data Retention
Question #45: Data Storage and Management on Databricks
An external object storage container has been mounted to the location /mnt/finance_eda_bucket. The following logic was executed to create a database for the finance team: After the...
Managed Tables, Database Location, Databricks Storage, External Mounts
Question #46: Security and Governance
Although the Databricks Utilities Secrets module provides tools to store sensitive credentials and avoid accidentally displaying them in plain text, users should still be careful wi...
Databricks Secrets, Security, API Access, Credential Management
Question #47: Managing Databricks Workflows and Jobs
What statement is true regarding the retention of job run history?
Databricks Jobs, Job Management, Data Retention, Platform Configuration
Question #48: Security and Governance
A data engineer, User A, has promoted a new pipeline to production by using the REST API to programmatically create several jobs. A DevOps engineer, User B, has configured an exter...
Audit Logs, Databricks REST API, Personal Access Tokens, Job Management
Question #49: Optimizing Spark Applications
A user new to Databricks is trying to troubleshoot long execution times for some pipeline logic they are working on. Presently, the user is executing code cell-by-cell, using displ...
Spark Lazy Evaluation, Databricks Notebooks, Performance Tuning, Spark Actions
Question #50: Performance Optimization and Monitoring
A production cluster has 3 executor nodes and uses the same virtual machine type for the driver and executor. When evaluating the Ganglia Metrics for this cluster, which indicator...
Spark Performance Tuning, Cluster Monitoring, Driver Bottleneck, Databricks Ganglia Metrics
Question #51: Monitoring and Optimization
Where in the Spark UI can one diagnose a performance problem induced by not leveraging predicate push-down?
Spark UI, Query Optimization, Predicate Pushdown, Performance Tuning


© 2026 NerdExam. All rights reserved. Built for IT professionals.

NerdExam is a trading name of WADL Solutions Limited, a company incorporated in Hong Kong (CR# 80143234). Registered office: Unit 2904-05, 29/F, Universal Trade Centre, 3 Arbuthnot Road, Central, Hong Kong.

CompTIA, AWS, Cisco, Microsoft, Google Cloud, Oracle, VMware, and other certification names referenced on this site are trademarks of their respective owners. NerdExam is not affiliated with, endorsed by, or sponsored by any certification vendor.