A large company seeks to implement a near real-time solution involving hundreds of pipelines with parallel updates of many tables with extremely high volume and high velocity data. Which of the following solutions would you implement to achieve this requirement?

Question

Accepted Answer

A. Use Databricks High Concurrency clusters, which leverage optimized cloud storage connections

Answer

B. Partition ingestion tables by a small time duration to allow for many data files to be written in

Answer

C. Configure Databricks to save all data to attached SSD volumes instead of object storage,

Answer

D. Isolate Delta Lake tables in their own storage containers to avoid API limits imposed by cloud

Answer

E. Store all tables in a single database to ensure that the Databricks Catalyst Metastore can load

A large company seeks to implement a near real-time solution involving hundreds of pipelines with parallel updates of many tables with extremely high volume and high velocity data. Which of the follow

Question

Options

How the community answered

Explanation

Topics

Community Discussion