The following code has been migrated to a Databricks notebook from a legacy workload: The code executes successfully and provides the logically correct results, however, it takes over 20 minutes to extract and load around 1 GB of data. Which statement is a possible explanation for this behavior?

Question

Accepted Answer

E. %sh executes shell code on the driver node. The code does not take advantage of the worker

Answer

A. %sh triggers a cluster restart to collect and install Git. Most of the latency is related to cluster

Answer

B. Instead of cloning, the code should use %sh pip install so that the Python code can get executed

Answer

C. %sh does not distribute file moving operations; the final line of code should be updated to

Answer

D. Python will always execute slower than Scala on Databricks. The run.py script should be

CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #70: Real Exam Question with Answer & Explanation

Question

Options

Explanation

Topics

Community Discussion