Cloudera


CCD-410 Question #28: Real Exam Question with Answer & Explanation

The correct answer is B. Avro.

Question

You want to perform analysis on a large collection of images. You want to store this data in HDFS and process it with MapReduce but you also want to give your data analysts and data scientists the ability to process the data directly from HDFS with an interpreted high-level programming language like Python. Which format should you use to store this data in HDFS?

Options

  • A. SequenceFiles
  • B. Avro
  • C. JSON
  • D. HTML
  • E. XML
  • F. CSV

Explanation

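Using Hadoop Sequence Files

What should we do to deal with a huge number of images? One approach is to pack them into Hadoop SequenceFiles. A SequenceFile is a flat file of binary key/value pairs that MapReduce applications can read natively: Hadoop ships an input format specifically for SequenceFiles, and the files are splittable, so a single huge file can be the input of many map tasks. Storing the images this way lets Hadoop play to its strengths: it splits the work into chunks that are processed in parallel, yet each chunk is large enough for processing to stay efficient. Because a SequenceFile stores key/value pairs, a natural layout is a Text key holding the image's HDFS filename and a BytesWritable value holding the image's raw bytes.

SequenceFiles, however, are a Java-centric Hadoop format and are awkward to read from Python. Avro data files offer the same benefits (compact binary records that can hold raw bytes, and splittability through sync markers written between blocks) and are also language-neutral: each file carries its own schema, and Apache Avro provides a Python implementation, so analysts and data scientists can read the same files from Python. That is why B (Avro) is the correct answer. JSON, XML, HTML, and CSV are text formats and are poorly suited to storing binary image data.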
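The splitting argument above can be illustrated with a toy container file. The Python sketch below assumes a made-up layout: a hypothetical `IMG1` magic number, one random 16-byte sync marker written before every record, and 4-byte length prefixes. It is NOT the SequenceFile or Avro on-disk format, only the same idea: many (filename, image bytes) pairs in one file, plus sync markers that let a reader dropped at an arbitrary byte offset find the next record boundary. The names `write_container` and `read_split`, and the HDFS paths in the demo, are hypothetical.

```python
import io
import os
import struct

# Hypothetical file layout for illustration only (NOT SequenceFile or Avro):
#   header : MAGIC (4 bytes) + sync marker (16 random bytes)
#   record : sync marker + key length (4-byte big-endian) + UTF-8 key
#            + value length (4-byte big-endian) + raw value bytes
MAGIC = b"IMG1"
SYNC_SIZE = 16
HEADER_SIZE = len(MAGIC) + SYNC_SIZE


def write_container(records, out):
    """Write (hdfs_filename, image_bytes) pairs into one binary stream."""
    sync = os.urandom(SYNC_SIZE)  # random per file, so unlikely to occur inside image data
    out.write(MAGIC + sync)
    for name, data in records:
        key = name.encode("utf-8")
        out.write(sync)
        out.write(struct.pack(">I", len(key)) + key)
        out.write(struct.pack(">I", len(data)) + data)


def read_split(buf, start, end):
    """Yield the records whose sync marker begins in the byte range [start, end).

    A reader handed an arbitrary split scans forward to the next sync marker,
    so a split boundary may fall in the middle of a record and every record
    is still read by exactly one split.
    """
    if buf[:len(MAGIC)] != MAGIC:
        raise ValueError("not an image container")
    sync = buf[len(MAGIC):HEADER_SIZE]
    pos = buf.find(sync, max(start, HEADER_SIZE))
    while pos != -1 and pos < end:
        p = pos + SYNC_SIZE
        (key_len,) = struct.unpack_from(">I", buf, p)
        p += 4
        key = buf[p:p + key_len].decode("utf-8")
        p += key_len
        (value_len,) = struct.unpack_from(">I", buf, p)
        p += 4
        value = buf[p:p + value_len]
        p += value_len
        yield key, value
        pos = buf.find(sync, p)  # next record starts right here, or -1 at end of file


if __name__ == "__main__":
    # Fake "images": deterministic byte strings standing in for JPEG data.
    images = [(f"/user/analyst/images/img{i:03d}.jpg", bytes([i]) * (100 + 7 * i))
              for i in range(10)]
    stream = io.BytesIO()
    write_container(images, stream)
    data = stream.getvalue()

    # Cut the file into 3 byte ranges, as if it spanned 3 input splits.
    n_splits = 3
    bounds = [len(data) * k // n_splits for k in range(n_splits + 1)]
    recovered = []
    for start, end in zip(bounds, bounds[1:]):
        recovered.extend(read_split(data, start, end))

    assert recovered == images
    print(f"recovered {len(recovered)} images from {n_splits} splits")
```

Each split reads only the records whose sync marker starts inside its byte range, so every image is processed exactly once even when a boundary falls mid-record. Writing a marker before every record keeps the sketch short; real SequenceFiles write sync markers only periodically, and Avro writes one after each block of records, to keep the overhead small.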

