nerdexam
Databricks

CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #76: Real Exam Question with Answer & Explanation

Query Performance Optimization

Question

The data science team has requested assistance in accelerating queries on free-form text from user reviews. The data is currently stored in Parquet with the schema below:

item_id INT, user_id INT, review_id INT, rating FLOAT, review STRING

The review column contains the full text of the review left by the user. Specifically, the data science team wants to identify whether any of 30 keywords exist in this field. A junior data engineer suggests that converting this data to Delta Lake will improve query performance.

Which response to the junior data engineer's suggestion is correct?

Options

  • A. Delta Lake statistics are not optimized for free text fields with high cardinality.
  • B. Text data cannot be stored with Delta Lake.
  • C. ZORDER ON review will need to be run to see performance gains.
  • D. The Delta log creates a term matrix for free text fields to support selective filtering.
  • E. Delta Lake statistics are only collected on the first 4 columns in a table.
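
When weighing these options, it may help to recall that Delta Lake's data skipping relies on per-file statistics such as minimum and maximum column values. The following is a minimal sketch in plain Python, not Delta Lake code, of how min/max bounds can prune files for an equality filter on a numeric column but say nothing about whether a keyword occurs somewhere inside a long text value. The file names, rows, and helper functions are all hypothetical and exist only for illustration.

```python
# Hypothetical files, each holding (review_id, review) rows. All data is made up.
files = {
    "part-0": [(1, "arrived broken, asked for a refund"), (2, "works great")],
    "part-1": [(3, "battery died after a week"), (4, "zero complaints so far")],
}


def files_to_scan_for_id(review_id):
    """Equality filter on a numeric column.

    A file can be skipped when the value falls outside its [min, max] range.
    """
    keep = []
    for name, rows in files.items():
        ids = [row[0] for row in rows]
        if min(ids) <= review_id <= max(ids):
            keep.append(name)
    return keep


def files_to_scan_for_keyword(keyword):
    """Substring filter on a free-text column.

    Min/max bounds on strings are lexicographic, so they only constrain how
    each review begins. A keyword may occur anywhere inside a review, so the
    bounds cannot rule out any file and every file must be read.
    """
    return list(files)


print(files_to_scan_for_id(3))               # only part-1 can contain id 3
print(files_to_scan_for_keyword("refund"))   # every file must be scanned
```

In this toy model the numeric lookup reads one file while the keyword search reads all of them, which is the behavior to keep in mind when judging whether a format change alone would speed up substring matching.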

Topics

#Delta Lake #Query Optimization #Data Skipping #Text Data