Processing

Question

An online retail company wants to perform analytics on data in large Amazon S3 objects using Amazon EMR. An Apache Spark job repeatedly queries the same data to populate an analytics dashboard. The Analytics team wants to minimize the time to load the data and create the dashboard. Which approaches could improve the performance? (Select TWO.)

Options

A. Copy the source data into Amazon Redshift and rewrite the Apache Spark code to create
B. Copy the source data from Amazon S3 into Hadoop Distributed File System (HDFS) using
C. Load the data into Spark DataFrames.
D. Stream the data into Amazon Kinesis and use the Kinesis Connector Library (KCL) in multiple
E. Use Amazon S3 Select to retrieve the data necessary for the dashboards from the S3 objects.

Topics

#Spark Performance #S3 Select #EMR Optimization #Data Caching
