MLS-C01 Question #33

Data Engineering

Question

A Data Scientist needs to migrate an existing on-premises ETL process to the cloud. The current process runs at regular time intervals and uses PySpark to combine and format multiple large data sources into a single consolidated output for downstream processing. The Data Scientist has been given the following requirements for the cloud solution:

Combine multiple data sources.
Reuse existing PySpark logic.
Run the solution on the existing schedule.
Minimize the number of servers that will need to be managed.

Which architecture should the Data Scientist use to build this solution?

Options

A. Write the raw data to Amazon S3. Schedule an AWS Lambda function to submit a Spark step to a
B. Write the raw data to Amazon S3. Create an AWS Glue ETL job to perform the ETL processing
C. Write the raw data to Amazon S3. Schedule an AWS Lambda function to run on the existing
D. Use Amazon Kinesis Data Analytics to stream the input data and perform real-time SQL queries
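
For context, the "existing PySpark logic" in the question stem is not shown. A minimal sketch of what such combine-and-consolidate logic might look like follows. The function name, the Parquet format, and the idea that all sources share column names are assumptions made for illustration only; they are not part of the question.

```python
from functools import reduce


def combine_sources(spark, input_paths, output_path):
    """Read each source, union them by column name, and write one output.

    `spark` is expected to be a SparkSession; the paths are hypothetical.
    """
    # Read every source into its own DataFrame (Parquet is an assumption).
    frames = [spark.read.parquet(path) for path in input_paths]

    # Union by column name so that differing column order across sources
    # does not silently misalign the data.
    combined = reduce(lambda left, right: left.unionByName(right), frames)

    # Write the single consolidated result for downstream processing.
    combined.write.mode("overwrite").parquet(output_path)
    return combined
```

Because the function only depends on a SparkSession being passed in, the same logic could in principle be carried over to any managed service that provides a PySpark runtime, which is what the "reuse existing PySpark logic" requirement is about.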

Topics

AWS Glue, ETL, PySpark, Serverless Data Processing
