Cloudera
DS-200 · Question #28
DS-200 Question #28: Real Exam Question with Answer & Explanation
Question
You need to analyze 60,000,000 images stored in JPEG format, each approximately 25 KB in size. Because your Hadoop cluster isn't optimized for storing and processing many small files, you decide to take the following actions:
1. Group the individual images into a set of larger files.
2. Use the set of larger files as input for a MapReduce job that processes them directly with Python using Hadoop Streaming.
Which data serialization system gives you the flexibility to do this?
Options
- A. CSV
- B. XML
- C. HTML
- D. Avro
- E. Sequence Files
- F. JSON
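
For a sense of scale, the scenario in the question can be sized with a quick calculation. This is only a rough sketch: the image count and image size come from the question, but the 128 MB container size is an assumption (a common HDFS block size that the question does not state), decimal units are used throughout, and any per-record overhead of the container format is ignored.

```python
# Back-of-the-envelope sizing for the small-files scenario.
NUM_IMAGES = 60_000_000   # from the question
IMAGE_SIZE_KB = 25        # approximate size per image, from the question
CONTAINER_MB = 128        # ASSUMPTION: target size of each larger file


def total_size_tb(num_files: int, size_kb: int) -> float:
    """Total payload in TB, using decimal units (1 TB = 10**9 KB)."""
    return num_files * size_kb / 10**9


def images_per_container(container_mb: int, size_kb: int) -> int:
    """Images that fit in one container file (1 MB = 1000 KB, no overhead)."""
    return (container_mb * 1000) // size_kb


def containers_needed(num_files: int, per_container: int) -> int:
    """Number of container files required (ceiling division)."""
    return -(-num_files // per_container)


if __name__ == "__main__":
    per_container = images_per_container(CONTAINER_MB, IMAGE_SIZE_KB)
    print("total data (TB):", total_size_tb(NUM_IMAGES, IMAGE_SIZE_KB))
    print("images per container:", per_container)
    print("containers needed:", containers_needed(NUM_IMAGES, per_container))
```

Under these assumptions, tens of millions of individual files collapse into a number of container files in the low tens of thousands, which is the motivation for the grouping step in the question.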