Data Ingestion and Preparation for Vector Search

Question

A Generative AI Engineer has successfully ingested unstructured documents and chunked them by document section. They would like to store the chunks in a Vector Search index. The dataframe currently has two columns: (i) the original document file name and (ii) an array of text chunks for each document. What is the most performant way to store this dataframe?

Options

A. Split the data into train and test set, create a unique identifier for each document, then save to a
B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a
C. First create a unique identifier for each document, then save to a Delta table
D. Store each chunk as an independent JSON file in Unity Catalog Volume. For each JSON file, the
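
To make the wording of option B concrete, the following is a minimal sketch in plain Python of what "flatten to one chunk per row" with "a unique identifier for each row" could look like. It is only an illustration of that reshaping step, not a statement of the answer. The sample data, the column names (`file_name`, `chunks`, `chunk_id`, `chunk_text`), the helper `flatten_chunks`, and the `file_name#position` identifier scheme are all assumptions made for this example.

```python
# Hypothetical sample data mirroring the two-column layout in the question:
# a file name plus an array of text chunks for each document.
documents = [
    {"file_name": "guide.pdf", "chunks": ["Intro text", "Setup text"]},
    {"file_name": "faq.pdf", "chunks": ["Only section"]},
]


def flatten_chunks(docs):
    """Return one row per chunk, each carrying its own identifier."""
    rows = []
    for doc in docs:
        for position, text in enumerate(doc["chunks"]):
            rows.append(
                {
                    # Assumed ID scheme: file name plus chunk position.
                    # It is unique only if file names are unique.
                    "chunk_id": f"{doc['file_name']}#{position}",
                    "file_name": doc["file_name"],
                    "chunk_text": text,
                }
            )
    return rows


flat_rows = flatten_chunks(documents)
for row in flat_rows:
    print(row)
```

On a Spark dataframe the same reshaping is commonly expressed with `pyspark.sql.functions.explode` on the array column; the plain-Python version is used here only so the sketch runs without a cluster.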

Topics

Data Preprocessing, Vector Search Indexing, RAG Architecture, Databricks Data Engineering
