Databricks

GENERATIVE-AI-ENGINEER-ASSOCIATE Question #11: Real Exam Question with Answer & Explanation

The correct answer is D: beautifulsoup.

Data Ingestion and Preprocessing for Retrieval Augmented Generation (RAG)

Question

A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that are currently in HTML format. They want to develop a solution using the fewest lines of code. Which Python package should be used to extract the text from the source documents?

Options

  • A. pytesseract
  • B. numpy
  • C. pypdf2
  • D. beautifulsoup

Explanation

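BeautifulSoup is a Python package designed for parsing HTML and XML documents. It can pull the text out of an HTML file in a couple of calls, which makes it the most suitable choice for extracting text from HTML source documents with minimal lines of code. The other options target different inputs: pytesseract is an OCR wrapper for reading text from images, numpy is a numerical array library, and pypdf2 reads PDF files.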
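
As a rough illustration of how little code this takes, here is a minimal sketch. It assumes the beautifulsoup4 package is installed (imported as bs4) and uses Python's built-in "html.parser" backend. The helper name html_to_text and the sample HTML are hypothetical, made up for this example and not part of the exam question.

```python
from bs4 import BeautifulSoup


def html_to_text(html: str) -> str:
    """Return the readable text of an HTML document as one string.

    Hypothetical helper for illustration only.
    """
    # "html.parser" ships with Python, so no extra parser dependency is needed.
    soup = BeautifulSoup(html, "html.parser")
    # Remove elements whose contents are code or styling rather than prose.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join the remaining text nodes with spaces and strip extra whitespace.
    return soup.get_text(separator=" ", strip=True)


# Made-up sample document standing in for one HTML source file.
sample_html = """
<html>
  <head><title>Doc</title><style>p { color: red; }</style></head>
  <body>
    <h1>Returns policy</h1>
    <p>Items can be returned within <b>30 days</b>.</p>
    <script>console.log("ignored");</script>
  </body>
</html>
"""

text = html_to_text(sample_html)
print(text)
```

The core of the extraction is two calls: constructing the BeautifulSoup object and calling get_text(). Dropping script and style tags is an optional cleanup step that is often useful before chunking text for a RAG index.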

Topics

#HTML Parsing #Text Extraction #Python Libraries #RAG Data Preparation
