nerdexam
Amazon

DAS-C01 · Question #150

DAS-C01 Question #150: Real Exam Question with Answer & Explanation

The correct answer is B: Monitor the HDFSUtilization metric. If the value crosses a user-defined threshold, add core nodes. The HDFS replication error 'File could only be replicated to 0 nodes instead of 1' on an Amazon EMR cluster typically indicates insufficient HDFS disk space, which can be resolved by scaling out core nodes.

Processing

Question

A company uses an Amazon EMR cluster with 50 nodes to process operational data and make the data available to data analysts. These jobs run nightly, use Apache Hive with the Apache Tez framework as the processing model, and write results to Hadoop Distributed File System (HDFS). In the last few weeks, the jobs have been failing with the following error message: "File could only be replicated to 0 nodes instead of 1". A data analytics specialist checks the DataNode logs, the NameNode logs, and network connectivity for potential issues that could have prevented HDFS from replicating data. The data analytics specialist rules out these factors as causes for the issue. Which solution will prevent the jobs from failing?

Options

  • A. Monitor the HDFSUtilization metric. If the value crosses a user-defined threshold, add task nodes.
  • B. Monitor the HDFSUtilization metric. If the value crosses a user-defined threshold, add core nodes.
  • C. Monitor the MemoryAllocatedMB metric. If the value crosses a user-defined threshold, add task nodes.
  • D. Monitor the MemoryAllocatedMB metric. If the value crosses a user-defined threshold, add core nodes.

Explanation

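The HDFS replication error 'File could only be replicated to 0 nodes instead of 1' on an Amazon EMR cluster typically indicates that the DataNodes have run out of free HDFS disk space. Because the data analytics specialist has already ruled out DataNode, NameNode, and network problems, insufficient HDFS capacity is the remaining cause. On Amazon EMR, HDFS runs on core nodes only, so the fix is to monitor the HDFSUtilization metric and scale out core nodes when it crosses a user-defined threshold.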
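As a minimal sketch of the monitoring half of this answer, the Python function below assembles the keyword arguments for a CloudWatch alarm on HDFSUtilization for one cluster. It assumes the metric is published in the AWS/ElasticMapReduce namespace with a JobFlowId dimension. The function name, the 80 percent default threshold, the 5-minute period, and the two evaluation periods are illustrative choices, not values from the question. The function only builds a dict and makes no AWS call.

```python
from typing import Any, Dict, List, Optional


def build_hdfs_utilization_alarm(
    cluster_id: str,
    threshold_pct: float = 80.0,
    alarm_action_arns: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: put_metric_alarm-style kwargs for HDFSUtilization."""
    # HDFSUtilization is a percentage, so the threshold must lie in (0, 100).
    if not 0 < threshold_pct < 100:
        raise ValueError("threshold_pct must be between 0 and 100")
    return {
        "AlarmName": f"{cluster_id}-hdfs-utilization-high",
        "Namespace": "AWS/ElasticMapReduce",
        "MetricName": "HDFSUtilization",
        # Assumed dimension: one alarm per EMR cluster (JobFlowId = cluster ID).
        "Dimensions": [{"Name": "JobFlowId", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,               # illustrative: 5-minute datapoints
        "EvaluationPeriods": 2,      # illustrative: two consecutive breaches
        "Threshold": threshold_pct,  # the user-defined threshold from option B
        "ComparisonOperator": "GreaterThanThreshold",
        # Actions to run when the alarm fires (e.g. notify whoever adds core nodes).
        "AlarmActions": list(alarm_action_arns or []),
    }
```

With boto3 installed and credentials configured, the dict could be passed as `boto3.client("cloudwatch").put_metric_alarm(**build_hdfs_utilization_alarm("j-EXAMPLE12345"))`, where the cluster ID is a placeholder. What the alarm action does (for example, triggering a process that adds core nodes) is left to the caller.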

Common mistakes.

  • A. Task nodes in an Amazon EMR cluster are primarily for compute and do not store HDFS data. Adding task nodes would increase processing power but would not address a storage capacity issue that prevents HDFS replication.
  • C. The MemoryAllocatedMB metric tracks memory usage, not disk space. While memory issues can cause job failures, they would manifest as out-of-memory errors rather than HDFS replication failures due to lack of nodes to store data. Task nodes do not store HDFS data.
  • D. The MemoryAllocatedMB metric tracks memory usage. Although core nodes store HDFS data, memory issues are distinct from the HDFS replication failure, which points to a lack of available storage nodes for data blocks, not a memory-related problem.
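The difference between adding task nodes and adding core nodes can be checked with simple arithmetic. The sketch below uses a simplified, hypothetical model: used_gb is the raw HDFS space consumed (replicas included), per_core_gb is the HDFS storage each core node contributes, and reserved space and uneven block placement are ignored. All function names and numbers are illustrative.

```python
import math


def hdfs_utilization_pct(used_gb: float, core_nodes: int, per_core_gb: float,
                         task_nodes: int = 0) -> float:
    """HDFS utilization in percent under the simplified model."""
    if core_nodes <= 0 or per_core_gb <= 0:
        raise ValueError("need at least one core node with positive storage")
    # task_nodes is accepted only to make the point explicit: task nodes run
    # no DataNode, so they never enter the capacity term.
    capacity_gb = core_nodes * per_core_gb
    return 100.0 * used_gb / capacity_gb


def core_nodes_needed(used_gb: float, per_core_gb: float, target_pct: float) -> int:
    """Smallest core-node count that keeps utilization at or below target_pct."""
    return math.ceil(used_gb * 100.0 / (target_pct * per_core_gb))


before = hdfs_utilization_pct(900, core_nodes=10, per_core_gb=100)
with_tasks = hdfs_utilization_pct(900, core_nodes=10, per_core_gb=100, task_nodes=20)
needed = core_nodes_needed(900, per_core_gb=100, target_pct=75)
after = hdfs_utilization_pct(900, core_nodes=needed, per_core_gb=100)
print(before, with_tasks, needed, after)
```

In this model, adding 20 task nodes to a 10-core-node cluster at 90 percent utilization leaves utilization at 90 percent, while growing to 12 core nodes brings it down to 75 percent.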

Concept tested. EMR HDFS storage and scaling core nodes

Reference. https://docs.aws.amazon.com/emr/latest/ManagementGuide/UsingEMR_HDFS_metrics.html

Topics

#EMR Troubleshooting #HDFS Replication #YARN Resource Management #Apache Tez
