Databricks
GENERATIVE-AI-ENGINEER-ASSOCIATE · Question #82
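The correct answer is C: pytesseract. It is a Python wrapper around the Tesseract OCR engine, so it can read text directly out of scanned image files in a few lines of code. The other three options (beautifulsoup, scrapy, pyquery) parse or scrape HTML/XML and do not perform OCR on images.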
Data Ingestion and Preprocessing for Generative AI
Question
A Generative AI Engineer is building a RAG application that will rely on context retrieved from source documents that have been scanned and saved as image files in formats like .jpeg or .png. They want to develop a solution using the fewest lines of code. Which Python package should be used to extract the text from the source documents?
Options
- A. beautifulsoup
- B. scrapy
- C. pytesseract
- D. pyquery
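To make the pytesseract option concrete, here is a minimal sketch of how scanned images might be turned into text for a RAG pipeline. It assumes the `pytesseract` and `Pillow` packages are installed and that the Tesseract binary is available on the system path. The helper names (`extract_corpus`, `_tesseract_ocr`) and the folder layout are hypothetical and are not part of the original question.

```python
from pathlib import Path

# File extensions treated as scanned documents (assumption for this sketch).
IMAGE_SUFFIXES = {".jpeg", ".jpg", ".png"}


def _tesseract_ocr(path):
    """Default OCR backend: run Tesseract on one image via pytesseract."""
    # Imported lazily so the module loads even where OCR is not installed.
    import pytesseract
    from PIL import Image

    with Image.open(path) as img:
        return pytesseract.image_to_string(img)


def extract_corpus(folder, ocr=_tesseract_ocr):
    """Return {file name: extracted text} for every image file in `folder`.

    `ocr` is any callable that maps an image path to a string; it is
    injectable so the function can be exercised without Tesseract.
    """
    texts = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in IMAGE_SUFFIXES:
            texts[path.name] = ocr(path).strip()
    return texts
```

Calling `extract_corpus("scans/")` on a folder of scanned pages (the folder name is illustrative) would yield one text string per image, ready to be chunked and embedded. The core OCR step is the single `pytesseract.image_to_string(img)` call, which is why this package fits the "fewest lines of code" requirement.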
Topics
#OCR, #Python Libraries, #RAG Data Sources, #Image Text Extraction