CERTIFIED-DATA-ENGINEER-PROFESSIONAL Question #118: Real Exam Question with Answer & Explanation
The correct answer is A: Size on Disk is > 0.
Question
A data engineer is using Spark's MEMORY_ONLY storage level. Which indicator should the data engineer look for in the Spark UI's Storage tab as a sign that a cached table is not performing optimally?
Options
- A. Size on Disk is > 0
- B. The number of Cached Partitions > the number of Spark Partitions
- C. The RDD Block Name included the '' annotation signaling failure to cache
- D. On Heap Memory Usage is within 75% of Off Heap Memory Usage
- E. Size on Disk is < Size in Memory
Explanation
When using Spark's MEMORY_ONLY storage level, the ideal scenario is that the data is fully cached in memory, so Size on Disk should be 0. A Size on Disk greater than 0 suggests that some data has been spilled to disk, which can degrade performance because reading from disk is slower than reading from memory.
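The check described above can be sketched in a few lines of Python. This is a minimal illustration, not a Spark API: the rows are hypothetical and only borrow the field names (`numPartitions`, `numCachedPartitions`, `memoryUsed`, `diskUsed`) that Spark's monitoring REST API reports for cached RDDs, and `cache_warnings` is a helper name invented here. The second check (fewer cached partitions than total partitions) is an addition that is not part of the explanation above; it rests on the Spark documentation's note that under MEMORY_ONLY, partitions that do not fit in memory are left uncached and recomputed when needed.

```python
from typing import Dict, List

# Hypothetical rows mirroring the Storage tab columns. Field names follow the
# cached-RDD entries of Spark's monitoring REST API; all values are made up.
SAMPLE_ROWS = [
    {"name": "sales_cached", "numPartitions": 8, "numCachedPartitions": 8,
     "memoryUsed": 512_000_000, "diskUsed": 0},
    {"name": "events_cached", "numPartitions": 8, "numCachedPartitions": 8,
     "memoryUsed": 300_000_000, "diskUsed": 120_000_000},
    {"name": "clicks_cached", "numPartitions": 10, "numCachedPartitions": 6,
     "memoryUsed": 200_000_000, "diskUsed": 0},
]


def cache_warnings(rows: List[dict]) -> Dict[str, List[str]]:
    """Return, per cached table, the reasons it looks suboptimal."""
    warnings: Dict[str, List[str]] = {}
    for row in rows:
        problems = []
        # Indicator from the explanation: any bytes reported on disk.
        if row["diskUsed"] > 0:
            problems.append("Size on Disk > 0")
        # Assumed extra indicator: not every partition made it into the cache.
        if row["numCachedPartitions"] < row["numPartitions"]:
            problems.append("fraction cached < 100%")
        if problems:
            warnings[row["name"]] = problems
    return warnings


if __name__ == "__main__":
    for name, problems in cache_warnings(SAMPLE_ROWS).items():
        print(f"{name}: {', '.join(problems)}")
```

In practice the rows would come from the Storage tab or the REST endpoint of a running application rather than from a hard-coded list.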