Databricks

GENERATIVE-AI-ENGINEER-ASSOCIATE Question #91: Real Exam Question with Answer & Explanation

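The correct answer is B: "Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a …". A Vector Search index is keyed by a primary key column and indexes one text value per row, so each chunk needs its own row and its own identifier; an array of chunks per document cannot be indexed chunk by chunk.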

Data Ingestion and Preparation for Vector Search

Question

A Generative AI Engineer has successfully ingested unstructured documents and chunked them by document sections. They would like to store the chunks in a Vector Search index. The dataframe currently has two columns: (i) the original document file name and (ii) an array of text chunks for each document. What is the most performant way to store this dataframe?

Options

  • A. Split the data into train and test set, create a unique identifier for each document, then save to a …
  • B. Flatten the dataframe to one chunk per row, create a unique identifier for each row, and save to a …
  • C. First create a unique identifier for each document, then save to a Delta table
  • D. Store each chunk as an independent JSON file in Unity Catalog Volume. For each JSON file, the …
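
The flattening step described in option B can be sketched in plain Python. This is a minimal illustration, not Databricks code: the sample documents, the column names (`chunk_id`, `chunk_index`, `chunk_text`), and the hashing scheme for the identifier are all assumptions made for the example. On Databricks the same reshaping would typically be done on a Spark dataframe (for instance with `pyspark.sql.functions.posexplode`) before writing the result out.

```python
import hashlib

# Hypothetical input: one record per document, holding an array of text chunks.
documents = [
    {"file_name": "guide.pdf", "chunks": ["Intro text", "Setup steps"]},
    {"file_name": "faq.pdf", "chunks": ["What is RAG?"]},
]


def flatten_chunks(docs):
    """Return one row per chunk, each with a deterministic unique identifier."""
    rows = []
    for doc in docs:
        for position, text in enumerate(doc["chunks"]):
            # Assumption: file name plus chunk position uniquely identifies a chunk.
            key = f'{doc["file_name"]}:{position}'
            chunk_id = hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]
            rows.append(
                {
                    "chunk_id": chunk_id,  # candidate primary key for the index
                    "file_name": doc["file_name"],
                    "chunk_index": position,
                    "chunk_text": text,  # the single text value to embed
                }
            )
    return rows


rows = flatten_chunks(documents)
for row in rows:
    print(row["chunk_id"], row["file_name"], row["chunk_index"], row["chunk_text"])
```

Deriving the identifier from the file name and chunk position keeps it stable across re-runs, which is one reasonable choice; a random UUID per row would also satisfy the "unique identifier for each row" requirement.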

Topics

#Data Preprocessing · #Vector Search Indexing · #RAG Architecture · #Databricks Data Engineering
