MLS-C01 Question #159: Real Exam Question with Answer & Explanation
The correct answer is C: Extract text from receipt images by using Amazon Textract.
Question
A company is converting a large number of unstructured paper receipts into images. The company wants to create a model based on natural language processing (NLP) to find relevant entities such as date, location, and notes, as well as some custom entities such as receipt numbers. The company is using optical character recognition (OCR) to extract text for data labeling. However, documents are in different structures and formats, and the company is facing challenges with setting up the manual workflows for each document type. Additionally, the company trained a named entity recognition (NER) model for custom entity detection using a small sample size. This model has a very low confidence score and will require retraining with a large dataset. Which solution for text extraction and entity detection will require the LEAST amount of effort?
Options
- A. Extract text from receipt images by using Amazon Textract.
- B. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace.
- C. Extract text from receipt images by using Amazon Textract.
- D. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace.
Explanation
The scenario has two challenges: (1) the receipts come in varied structures and formats, which makes per-document-type OCR workflows hard to set up, and (2) the custom NER model has a low confidence score and needs retraining on a larger dataset. Amazon Textract, rather than a generic deep learning OCR model from the AWS Marketplace, is the right choice for extraction because it is purpose-built to handle diverse document layouts, including tables, forms, and unstructured text, without per-document-type configuration. For entity detection, Amazon Comprehend provides built-in NER for standard entities such as dates and locations, and Amazon Comprehend custom entity recognition can be trained on a sufficiently large labeled dataset to detect custom entities such as receipt numbers.

Option C pairs Textract for OCR with Amazon Comprehend for both standard and custom entities. It is the most managed, integrated, and scalable AWS-native solution, and therefore requires the least effort. Option A also uses Textract but relies on a less integrated entity detection path, while Options B and D depend on a third-party deep learning OCR model from the AWS Marketplace, which is less suited to varied layouts and adds unnecessary complexity.
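To make option C's pipeline concrete, here is a minimal Python sketch using boto3. It assumes that boto3 is installed, that AWS credentials are configured, and that a Comprehend custom entity recognizer has already been trained on the larger labeled dataset and deployed to an endpoint. The helper names `extract_lines` and `detect_receipt_entities` and the `recognizer_endpoint_arn` parameter are hypothetical. The underlying calls are the Textract `DetectDocumentText` and Comprehend `DetectEntities` operations. Treat it as an illustration of the design, not an official AWS sample.

```python
"""Sketch: Amazon Textract OCR followed by Amazon Comprehend entity detection.

Assumptions (not from the source): boto3 is installed, AWS credentials are
configured, and a Comprehend custom entity recognizer is already deployed to
an endpoint. Function and parameter names here are hypothetical.
"""
from typing import Any, Dict, List, Optional


def extract_lines(textract_response: Dict[str, Any]) -> List[str]:
    """Collect the text of LINE blocks from a Textract DetectDocumentText response."""
    return [
        block["Text"]
        for block in textract_response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def detect_receipt_entities(
    image_bytes: bytes,
    textract: Optional[Any] = None,
    comprehend: Optional[Any] = None,
    recognizer_endpoint_arn: Optional[str] = None,
) -> Dict[str, List[Dict[str, Any]]]:
    """OCR one receipt image, then run built-in and (optionally) custom NER on its text."""
    if textract is None or comprehend is None:
        import boto3  # lazy import: only needed when real AWS clients are created

        textract = textract or boto3.client("textract")
        comprehend = comprehend or boto3.client("comprehend")

    # Synchronous OCR on a single-page image; no per-layout template is configured.
    ocr_response = textract.detect_document_text(Document={"Bytes": image_bytes})
    text = "\n".join(extract_lines(ocr_response))

    # Built-in entity types include DATE and LOCATION.
    entities = {
        "built_in": comprehend.detect_entities(Text=text, LanguageCode="en")["Entities"]
    }

    # Custom entities (e.g. a hypothetical RECEIPT_NUMBER type) come from the
    # custom recognizer endpoint, addressed by EndpointArn instead of a language code.
    if recognizer_endpoint_arn:
        entities["custom"] = comprehend.detect_entities(
            Text=text, EndpointArn=recognizer_endpoint_arn
        )["Entities"]
    return entities
```

Passing the clients in as parameters is a design choice. It lets the flow be exercised with stub objects and no AWS access. A real call might look like `detect_receipt_entities(open("receipt.png", "rb").read(), recognizer_endpoint_arn="<your-endpoint-arn>")`, where the file name is hypothetical. For multipage documents stored in Amazon S3, Textract's asynchronous `StartDocumentTextDetection` operation would replace the synchronous call.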