
Data Preparation for RAG and Vector Search

Question

A Generative AI Engineer has written scalable PySpark code to ingest unstructured PDF documents and chunk them in preparation for storing in a Databricks Vector Search index. Currently, the two columns of their dataframe include the original filename as a string and an array of text chunks from that document. What set of steps should the Generative AI Engineer perform to store the chunks in a ready-to-ingest manner for Databricks Vector Search?

Options

A. Use PySpark's autoloader to apply a UDF across all chunks, formatting them in a JSON structure
B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and enable
C. Utilize the original filename as the unique identifier and save the dataframe as is.
D. Create a unique identifier for each document, flatten the dataframe to one chunk per row and
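
Several of the options refer to flattening the dataframe to one chunk per row and assigning identifiers. As a rough illustration of what that reshaping means, the sketch below models the two-column data in plain Python rather than Spark. The function name `flatten_chunks`, the field names, and the `filename#position` identifier scheme are all hypothetical choices made for this example; in PySpark, `explode` or `posexplode` from `pyspark.sql.functions` would typically play the role of the inner loop.

```python
from typing import Dict, List


def flatten_chunks(rows: List[Dict]) -> List[Dict]:
    """Turn one row per document into one row per chunk.

    Each input row is assumed to look like
    {"filename": str, "chunks": [str, ...]}.
    """
    flat = []
    for row in rows:
        for position, chunk in enumerate(row["chunks"]):
            flat.append(
                {
                    # Hypothetical id scheme: filename plus chunk position,
                    # unique as long as filenames are unique.
                    "chunk_id": f'{row["filename"]}#{position}',
                    "filename": row["filename"],
                    "chunk_text": chunk,
                }
            )
    return flat


# Toy data standing in for the two-column dataframe in the question.
docs = [
    {"filename": "a.pdf", "chunks": ["intro", "body"]},
    {"filename": "b.pdf", "chunks": ["summary"]},
]
flat_rows = flatten_chunks(docs)
```

Note that an identifier per row differs from an identifier per document: in this sketch every chunk gets its own `chunk_id`, while `filename` repeats across all chunks of the same document.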

Topics

PySpark Data Transformation, Vector Search Indexing, RAG Data Preparation, Data Modeling
