An ML engineer needs to merge and transform data from two sources to retrain an existing ML model. One data source consists of .csv files that are stored in an Amazon S3 bucket. Each .csv file consists of millions of records. The other data source is an Amazon Aurora DB cluster. The result of the merge process must be written to a second S3 bucket. The ML engineer needs to perform this merge-and-transform task every week. Which solution will meet these requirements with the LEAST operational overhead?

Question

Accepted Answer

B. Create a weekly AWS Glue job that uses the Apache Spark engine. Use DynamicFrame native

Answer

A. Create a transient Amazon EMR cluster every week. Use the cluster to run an Apache Spark job

Answer

C. Create an AWS Lambda function that runs Apache Spark code every week to merge and

Answer

D. Create an AWS Batch job that runs Apache Spark code on Amazon EC2 instances every week.
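
For illustration only, the approach in the accepted answer (a scheduled AWS Glue Spark job that uses DynamicFrame operations) might look roughly like the sketch below. Nothing here comes from the question itself: the function names, the join key, the Parquet output format, the Monday 03:00 UTC schedule, and the assumption that the Aurora table is already registered in the Glue Data Catalog are all hypothetical. The job body can only run inside a Glue Spark environment, so its imports are deferred.

```python
"""Sketch of a weekly AWS Glue (Spark) job that merges S3 CSV data with an
Aurora table. All names, paths, keys, and the schedule are hypothetical."""


def weekly_trigger_args(job_name: str) -> dict:
    """Build keyword arguments for boto3's Glue create_trigger call.

    The cron expression fires every Monday at 03:00 UTC (an arbitrary choice).
    """
    return {
        "Name": f"{job_name}-weekly",
        "Type": "SCHEDULED",
        "Schedule": "cron(0 3 ? * MON *)",
        "Actions": [{"JobName": job_name}],
        "StartOnCreation": True,
    }


def run_merge_job(csv_path: str, catalog_db: str, aurora_table: str,
                  join_key: str, output_path: str) -> None:
    """Body of the Glue job script; runs only inside a Glue Spark environment."""
    # Imported lazily because awsglue and pyspark ship with the Glue runtime.
    from awsglue.context import GlueContext
    from awsglue.transforms import Join
    from pyspark.context import SparkContext

    glue_context = GlueContext(SparkContext.getOrCreate())

    # Source 1: the .csv files in the first S3 bucket.
    csv_dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        connection_options={"paths": [csv_path]},
        format="csv",
        format_options={"withHeader": True},
    )

    # Source 2: the Aurora table, assumed to be crawled into the Data Catalog.
    aurora_dyf = glue_context.create_dynamic_frame.from_catalog(
        database=catalog_db,
        table_name=aurora_table,
    )

    # Merge the two sources with a built-in DynamicFrame transform.
    merged = Join.apply(csv_dyf, aurora_dyf, [join_key], [join_key])

    # Write the merged result to the second S3 bucket (Parquet is an assumption).
    glue_context.write_dynamic_frame.from_options(
        frame=merged,
        connection_type="s3",
        connection_options={"path": output_path},
        format="parquet",
    )
```

The dictionary returned by `weekly_trigger_args` could be passed to `boto3.client("glue").create_trigger(**args)` to schedule the job. Because Glue is serverless, there is no cluster or compute environment to create and tear down each week, which is the operational-overhead contrast with the EMR and AWS Batch options.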

