nerdexam
Databricks

CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #118: Real Exam Question with Answer & Explanation

The correct answer is A: Size on Disk is > 0. With Spark's MEMORY_ONLY storage level, cached data should sit entirely in memory, so the Storage tab should report a Size on Disk of 0. A value above 0 means some of the data is on disk.

Spark Application Optimization and Monitoring

Question

The data engineer is using Spark's MEMORY_ONLY storage level. Which indicators should the data engineer look for in the Spark UI's Storage tab to signal that a cached table is not performing optimally?

Options

  • A. Size on Disk is > 0
  • B. The number of Cached Partitions > the number of Spark Partitions
  • C. The RDD Block Name included the '' annotation signaling failure to cache
  • D. On Heap Memory Usage is within 75% of Off Heap Memory Usage
  • E. Size on Disk is < Size in Memory

Explanation

With Spark's MEMORY_ONLY storage level, the cached table is meant to live entirely in memory, so the Storage tab should show a Size on Disk of 0. A Size on Disk greater than 0 means some of the cached data is being held on disk, and performance degrades because reading from disk is slower than reading from memory. The Spark documentation also notes that, under MEMORY_ONLY, partitions that do not fit in memory are not cached and are recomputed when needed, so a Fraction Cached below 100% in the same tab is worth checking as well.
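
The check described above can be sketched in a few lines of Python. This is a minimal illustration, not an official tool. It assumes the Storage tab's numbers have already been collected into dictionaries shaped like the entries returned by Spark's monitoring REST endpoint `/api/v1/applications/[app-id]/storage/rdd` (fields `name`, `numPartitions`, `numCachedPartitions`, `memoryUsed`, `diskUsed`). The function name `find_cache_problems` and the sample rows are hypothetical and made up for the example.

```python
def find_cache_problems(rdds):
    """Return (name, reason) pairs for cached entries that look unhealthy.

    Each item in `rdds` is assumed to be a dict with the keys
    name, numPartitions, numCachedPartitions and diskUsed.
    """
    problems = []
    for rdd in rdds:
        name = rdd["name"]
        # The indicator from the question: any bytes on disk under MEMORY_ONLY.
        if rdd["diskUsed"] > 0:
            problems.append((name, "Size on Disk > 0"))
        # Extra check (not part of the exam answer): some partitions not cached.
        if rdd["numCachedPartitions"] < rdd["numPartitions"]:
            problems.append((name, "Fraction Cached < 100%"))
    return problems


# Made-up sample rows, purely illustrative.
sample = [
    {"name": "healthy_table", "numPartitions": 8, "numCachedPartitions": 8,
     "memoryUsed": 512000, "diskUsed": 0},
    {"name": "on_disk_table", "numPartitions": 8, "numCachedPartitions": 8,
     "memoryUsed": 256000, "diskUsed": 128000},
    {"name": "partial_table", "numPartitions": 8, "numCachedPartitions": 5,
     "memoryUsed": 320000, "diskUsed": 0},
]

for name, reason in find_cache_problems(sample):
    print(f"{name}: {reason}")
```

Only the `diskUsed` check corresponds to answer A. The partition-count check is an additional assumption about what else might be worth flagging.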

Topics

#Spark Caching  #Spark UI  #Spark Storage Levels  #Performance Monitoring
