nerdexam

DAS-C01 · Question #94

DAS-C01 Question #94: Real Exam Question with Answer & Explanation

The correct answer is A: Use an AWS Glue job to transform the data from JSON to Apache Parquet.

Storage and Data Management

Question

A telecommunications company is looking for an anomaly-detection solution to identify fraudulent calls. The company currently uses Amazon Kinesis to stream voice call records in JSON format from its on-premises database to Amazon S3. The existing dataset contains voice call records with 200 columns. To detect fraudulent calls, the solution needs to look at only 5 of these columns. The company wants a cost-effective solution on AWS that requires minimal effort and little experience with anomaly-detection algorithms. Which solution meets these requirements?
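To make the "only 5 of 200 columns" point concrete, here is a minimal standard-library sketch that builds synthetic 200-column call records and projects each one down to 5 fields. The question does not name the 5 columns, so `RELEVANT_COLUMNS` and the helpers `make_record` and `project` are hypothetical. The sketch writes JSON lines rather than Parquet; a real AWS Glue job would write Parquet, whose columnar layout also lets readers skip unused columns. The sketch shows only the size effect of dropping columns.

```python
import json

# Hypothetical column names: the question does not say which 5 of the
# 200 columns matter, so these are placeholders for illustration only.
RELEVANT_COLUMNS = ["caller_id", "callee_id", "duration_sec", "call_cost", "country_code"]


def make_record(i: int) -> dict:
    """Build a synthetic 200-column call record (195 filler + 5 relevant)."""
    record = {f"extra_col_{n:03d}": f"value_{n}" for n in range(195)}
    record.update({
        "caller_id": f"C{i:06d}",
        "callee_id": f"R{i:06d}",
        "duration_sec": 60 + i % 300,
        "call_cost": round(0.05 * (1 + i % 7), 2),
        "country_code": "US",
    })
    return record


def project(record: dict, columns: list) -> dict:
    """Keep only the requested columns, in the given order."""
    return {col: record[col] for col in columns}


def main() -> None:
    records = [make_record(i) for i in range(1000)]
    full = "\n".join(json.dumps(r) for r in records)
    projected = "\n".join(json.dumps(project(r, RELEVANT_COLUMNS)) for r in records)
    print(f"columns per record: {len(records[0])} -> {len(RELEVANT_COLUMNS)}")
    print(f"bytes: {len(full)} -> {len(projected)} ({len(projected) / len(full):.1%})")


if __name__ == "__main__":
    main()
```

The exact byte ratio depends entirely on the synthetic filler values. The point is that anything downstream (storage, scanning, training) only has to handle the projected fields.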

Options

  • A. Use an AWS Glue job to transform the data from JSON to Apache Parquet.
  • B. Use Kinesis Data Firehose to detect anomalies on a data stream from Kinesis by running SQL
  • C. Use an AWS Glue job to transform the data from JSON to Apache Parquet.
  • D. Use Kinesis Data Analytics to detect anomalies on a data stream from Kinesis by running SQL

Explanation

The optimal approach is to use an AWS Glue job to convert the JSON data to Apache Parquet while projecting only the 5 relevant columns, which dramatically reduces data volume and cost, and then feed that columnar data into the built-in Random Cut Forest (RCF) algorithm in Amazon SageMaker. The built-in RCF algorithm requires no custom algorithm development or deep ML expertise, which satisfies the "minimal effort" requirement. Kinesis Data Analytics RCF (option D) operates on live streams, but the data is already at rest in S3, making a batch SageMaker approach more appropriate and cost-effective. Converting to Parquet and selecting only 5 columns also significantly reduces SageMaker training and inference costs compared with processing the full 200-column JSON.
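The managed RCF options mentioned above compute an anomaly score per record and flag the high scorers. As a rough illustration of that "score, then threshold" idea on the projected columns, here is a small sketch. It deliberately swaps in a much simpler technique, a per-feature z-score baseline. It is not Random Cut Forest and not any AWS API. The feature names, the synthetic rows, `build_sample_rows`, and `THRESHOLD` are all hypothetical.

```python
import statistics

# Hypothetical numeric features taken from the 5 projected columns.
FEATURES = ["duration_sec", "call_cost"]
THRESHOLD = 3.0  # hypothetical cut-off on the anomaly score


def anomaly_scores(rows: list, features: list) -> list:
    """Score each row by its largest absolute z-score across the features.

    This is a plain z-score baseline, NOT Random Cut Forest; it only
    illustrates "compute a score per record, flag the high ones".
    """
    stats = {}
    for f in features:
        values = [row[f] for row in rows]
        mean = statistics.fmean(values)
        stdev = statistics.pstdev(values)
        stats[f] = (mean, stdev if stdev > 0 else 1.0)  # avoid divide-by-zero
    scores = []
    for row in rows:
        z = [abs(row[f] - mean) / stdev for f, (mean, stdev) in stats.items()]
        scores.append(max(z))
    return scores


def build_sample_rows() -> list:
    """49 ordinary synthetic calls plus one injected outlier at index 49."""
    rows = [
        {"duration_sec": 120 + i % 10, "call_cost": 0.10 + 0.01 * (i % 5)}
        for i in range(49)
    ]
    rows.append({"duration_sec": 5000, "call_cost": 25.0})  # injected anomaly
    return rows


if __name__ == "__main__":
    rows = build_sample_rows()
    scores = anomaly_scores(rows, FEATURES)
    flagged = [i for i, s in enumerate(scores) if s > THRESHOLD]
    print("flagged rows:", flagged)  # prints: flagged rows: [49]
```

A z-score assumes roughly bell-shaped features and scores each feature independently. RCF makes neither assumption, which is one reason a managed, built-in RCF is attractive when the team lacks anomaly-detection experience.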

Topics

AWS Glue, Data Transformation, Apache Parquet, Cost Optimization

