

DAS-C01 Question #26: Real Exam Question with Answer & Explanation

The correct answers are A (Use the COPY command with the manifest file to load data into Amazon Redshift) and C (Use temporary staging tables during the loading process).


Question

A large financial company is running its ETL process. Part of this process is to move data from Amazon S3 into an Amazon Redshift cluster. The company wants to use the most cost-efficient method to load the dataset into Amazon Redshift. Which combination of steps would meet these requirements? (Choose two.)

Options

  • A. Use the COPY command with the manifest file to load data into Amazon Redshift.
  • B. Use S3DistCp to load files into Amazon Redshift.
  • C. Use temporary staging tables during the loading process.
  • D. Use the UNLOAD command to upload data into Amazon Redshift.
  • E. Use Amazon Redshift Spectrum to query files from Amazon S3.

Explanation

The most cost-efficient way to load data from Amazon S3 into an Amazon Redshift cluster is the COPY command, which reads multiple files in parallel across the cluster. A manifest file is optional, but it lists exactly which S3 objects to load, so no unintended files are picked up. Using temporary staging tables during the load is also a best practice: it makes the ETL process more robust and efficient because data can be validated and transformed before the final insertion into the target table.
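The two correct steps can be sketched together as follows. This is a minimal illustration, not a definitive implementation: the helper names `build_manifest` and `build_load_statements`, and the bucket, table, key column, and IAM role values, are all hypothetical, and the input files are assumed to be CSV. The code only builds the manifest document and the SQL text; running the statements against a cluster would need a database connection, which is omitted here.

```python
import json


def build_manifest(s3_urls):
    """Return a COPY manifest (as a dict) that lists each S3 object explicitly.

    "mandatory": True asks COPY to fail if a listed file is missing.
    """
    return {"entries": [{"url": url, "mandatory": True} for url in s3_urls]}


def build_load_statements(target, staging, key, manifest_url, iam_role):
    """Return SQL statements that load through a temporary staging table.

    Identifiers are interpolated directly, so this sketch assumes trusted,
    hard-coded names (never user input).
    """
    return [
        # Temporary table with the same columns as the target table.
        f"CREATE TEMP TABLE {staging} (LIKE {target});",
        # Parallel load of exactly the files named in the manifest (CSV assumed).
        f"COPY {staging} FROM '{manifest_url}' IAM_ROLE '{iam_role}' MANIFEST FORMAT AS CSV;",
        # Merge: replace rows that already exist, then insert everything staged.
        "BEGIN;",
        f"DELETE FROM {target} USING {staging} WHERE {target}.{key} = {staging}.{key};",
        f"INSERT INTO {target} SELECT * FROM {staging};",
        "END;",
        f"DROP TABLE {staging};",
    ]


# Hypothetical example values, for illustration only.
manifest = build_manifest([
    "s3://example-bucket/sales/part-0000.csv",
    "s3://example-bucket/sales/part-0001.csv",
])
print(json.dumps(manifest, indent=2))

for statement in build_load_statements(
    target="sales",
    staging="sales_stage",
    key="sale_id",
    manifest_url="s3://example-bucket/sales/load.manifest",
    iam_role="arn:aws:iam::123456789012:role/ExampleRedshiftRole",
):
    print(statement)
```

The manifest JSON would be uploaded to S3 at the location that the COPY statement references. The delete-then-insert step inside a single transaction is one common way to merge staged rows into the target table; other merge strategies are possible.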

Common mistakes.

  • B. S3DistCp is a utility that runs on Amazon EMR and copies large amounts of data between Amazon S3 and HDFS, or between S3 locations. It cannot load data into Amazon Redshift directly; at most it is an intermediate step, not the final loading mechanism.
  • D. The Amazon Redshift UNLOAD command is used to export data from a Redshift cluster to Amazon S3, which is the opposite of the requirement to load data into Redshift from S3.
  • E. Amazon Redshift Spectrum queries data directly in Amazon S3 without loading it into the Redshift cluster. While it can be cost-effective for querying S3 data, it does not fulfill the explicit requirement to "move data from Amazon S3 into an Amazon Redshift cluster."

Concept tested. Amazon Redshift data loading best practices

Reference. https://docs.aws.amazon.com/redshift/latest/dg/r_COPY.html

Topics

#Redshift Data Loading #ETL Best Practices #Cost Optimization #Amazon S3
