A table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from upstream data sources. The churn prediction model used by the ML team is fairly stable in production. The team is only interested in making predictions on records that have changed in the past 24 hours. Which approach would simplify the identification of these changed records?

Question

Accepted Answer

E. Replace the current overwrite logic with a merge statement to modify only those records that have

Answer

A. Apply the churn model to all rows in the customer_churn_params table, but implement logic to

Answer

B. Convert the batch job to a Structured Streaming job using the complete output mode; configure a

Answer

C. Calculate the difference between the previous model predictions and the current

Answer

D. Modify the overwrite logic to include a field populated by calling

CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #15: Real Exam Question with Answer & Explanation

Question

Options

Explanation

Topics

Community Discussion