SOL-C01 · Question #172
SOL-C01 Question #172: Real Exam Question with Answer & Explanation
The correct answer is B: Ensure that the CSV files are split into smaller files (e.g., 10-100MB each).
Question
A Snowflake data engineer is tasked with optimizing the performance of daily data loads into a fact table named `SALES_FACT`. The data is loaded from multiple CSV files stored in an AWS S3 bucket. The data engineer observes that the data loading process is slow, even though the virtual warehouse is adequately sized. Analyzing the COPY INTO command, they notice that data loading runs in a single thread even though multiple files are staged. Which of the following techniques, or combination of techniques, would be MOST effective in parallelizing the data load and improving the overall loading performance?
Options
- A. Increase the size of the virtual warehouse to the largest available size (e.g., X-Large or higher).
- B. Ensure that the CSV files are split into smaller files (e.g., 10-100MB each). Snowflake
- C. Use Snowpipe with auto-ingest enabled to continuously load data as new files arrive in the S3
- D. Partition the `SALES_FACT` table by a relevant date column and then use the `ON_ERROR =`
- E. B and C are both correct
Explanation
Snowflake automatically parallelizes data loading across multiple files. Smaller files allow Snowflake to distribute the workload across the nodes of the virtual warehouse, which improves performance. Increasing the warehouse size (Option A) may help to some extent, but it is not the primary driver of parallelization. Snowpipe (Option C) is a good option for continuous data loading, but it is not directly related to parallelizing a COPY INTO command. Partitioning the table (Option D) is good practice for query performance, but it does not directly improve the loading performance of a COPY INTO command. Splitting files and auto-scaling are independent activities; of the two, splitting the files into smaller chunks is what matters for this load.
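The file-splitting step from Option B can be sketched in Python. This is a minimal illustration, not a Snowflake-provided tool: the function name `split_csv`, the helper `_row_bytes`, the output naming scheme, and the assumption that the source file has a header row are all hypothetical choices made for the example. The size limit is left to the caller; the option text cites 10-100MB, while Snowflake's own guidance, as far as I recall, suggests roughly 100-250 MB compressed.

```python
import csv
import io
from pathlib import Path


def _row_bytes(row: list[str]) -> bytes:
    """Serialize one CSV row to UTF-8 bytes so its size can be counted."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue().encode("utf-8")


def split_csv(src: Path, out_dir: Path, max_bytes: int) -> list[Path]:
    """Split src into chunk files of at most roughly max_bytes each.

    Assumes the first row of src is a header; it is repeated in every chunk.
    Rows are never split, so a chunk can exceed max_bytes if one row is huge.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    chunks: list[Path] = []
    out = None
    written = 0
    with src.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if header is None:
            return chunks  # empty input: nothing to write
        header_bytes = _row_bytes(header)
        try:
            for row in reader:
                data = _row_bytes(row)
                # Start a new chunk when this row would push past the limit.
                if out is None or written + len(data) > max_bytes:
                    if out is not None:
                        out.close()
                    path = out_dir / f"{src.stem}_{len(chunks):04d}.csv"
                    out = path.open("wb")
                    out.write(header_bytes)
                    written = len(header_bytes)
                    chunks.append(path)
                out.write(data)
                written += len(data)
        finally:
            if out is not None:
                out.close()
    return chunks
```

A call such as `split_csv(Path("sales.csv"), Path("chunks"), max_bytes=100 * 1024 * 1024)` would produce files named `sales_0000.csv`, `sales_0001.csv`, and so on. Once uploaded to the stage, a single COPY INTO over those files gives Snowflake several units of work to load in parallel. Because every chunk repeats the header, the file format used for the load would need to skip one header line per file.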