A data ingestion task requires a 1 TB JSON dataset to be written out to Parquet with a target part-file size of 512 MB. Because Parquet is being used instead of Delta Lake, built-in file-sizing features such as Auto-Optimize and Auto-Compaction cannot be used. Which strategy will yield the best performance without shuffling data?

Question

Accepted Answer

B. Set spark.sql.shuffle.partitions to 2,048 partitions (1TB*1024*1024/512), ingest the data, execute

Answer

A. Set spark.sql.files.maxPartitionBytes to 512 MB, ingest the data, execute the narrow

Answer

C. Set spark.sql.adaptive.advisoryPartitionSizeInBytes to 512 MB, ingest the data, execute the

Answer

D. Ingest the data, execute the narrow transformations, repartition to 2,048 partitions (1TB*

Answer

E. Set spark.sql.shuffle.partitions to 512, ingest the data, execute the narrow transformations, and
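The partition arithmetic quoted in the options (1TB*1024*1024/512) can be checked with a minimal sketch. It assumes binary units (1 TB = 1024 * 1024 MB); the variable names are illustrative and do not come from the question. It also prints the byte count that corresponds to 512 MB, since size-based Spark settings such as spark.sql.files.maxPartitionBytes are expressed as a byte size.

```python
# Binary unit constants, matching the 1024-based arithmetic in the options.
MB = 1024 * 1024
TB = 1024 * 1024 * MB

dataset_bytes = 1 * TB          # the 1 TB JSON dataset
target_file_bytes = 512 * MB    # the 512 MB target part-file size

# Number of 512 MB parts needed to cover 1 TB.
num_parts = dataset_bytes // target_file_bytes

print(num_parts)           # 2048
print(target_file_bytes)   # 536870912
```

This only confirms the numbers. Whether a given option triggers a shuffle depends on the operations it performs: a repartition or a sort moves data between partitions, whereas a read-time partition size does not.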

CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #81: Real Exam Question with Answer & Explanation
