Dell-EMC
E20-007 Question #29: Real Exam Question with Answer & Explanation
The correct answer is B: Run MapReduce to transform the data, and find relevant key-value pairs. See the full explanation below for the reasoning.
Question
You are given 10,000,000 user profile pages of an online dating site in XML files, and they are stored in HDFS. You are assigned to divide the users into groups based on the content of their profiles. You have been instructed to try K-means clustering on this data. How should you proceed?
Options
- A. Divide the data into sets of 1,000 user profiles, and run K-means clustering in RHadoop iteratively.
- B. Run MapReduce to transform the data, and find relevant key-value pairs.
- C. Run a Naive Bayes classification as a pre-processing step in HDFS.
- D. Partition the data by XML file size, and run K-means clustering in each partition.
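K-means operates on numeric feature vectors, so raw XML profiles have to be turned into structured features before any clustering can run. The sketch below illustrates the kind of transformation option B describes, simulated locally in plain Python rather than on a real Hadoop cluster. The XML layout (a `profile` element with an `id` attribute) and the function names `map_profile` and `reduce_counts` are assumptions made for illustration; they do not come from the question or from any Hadoop API.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def map_profile(xml_text):
    """Map step: emit ((user_id, term), 1) for every word in one profile.

    Assumes a hypothetical schema: <profile id="..."> ... </profile>.
    """
    root = ET.fromstring(xml_text)
    user_id = root.get("id")
    # Gather all text nodes in the profile and normalise to lowercase.
    text = " ".join(root.itertext()).lower()
    for term in re.findall(r"[a-z]+", text):
        yield (user_id, term), 1


def reduce_counts(pairs):
    """Reduce step: sum the values emitted for each (user_id, term) key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


sample = '<profile id="u1"><interests>hiking hiking chess</interests></profile>'
term_counts = reduce_counts(map_profile(sample))
```

The resulting per-user term counts are one possible set of "relevant key-value pairs"; they could then be assembled into feature vectors and passed to a K-means implementation in a later job.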