MLS-C01 practice question #159

Data Engineering

Question

A company is converting a large number of unstructured paper receipts into images. The company wants to create a model based on natural language processing (NLP) to find relevant entities such as date, location, and notes, as well as some custom entities such as receipt numbers. The company is using optical character recognition (OCR) to extract text for data labeling. However, documents are in different structures and formats, and the company is facing challenges with setting up the manual workflows for each document type. Additionally, the company trained a named entity recognition (NER) model for custom entity detection using a small sample size. This model has a very low confidence score and will require retraining with a large dataset. Which solution for text extraction and entity detection will require the LEAST amount of effort?

Options

A. Extract text from receipt images by using Amazon Textract.
B. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace.
C. Extract text from receipt images by using Amazon Textract.
D. Extract text from receipt images by using a deep learning OCR model from the AWS Marketplace.
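
For context on the text-extraction step named in options A and C, the following is a minimal sketch of calling Amazon Textract from Python with boto3. It assumes boto3 is installed and that AWS credentials and a region are configured. The helper names `lines_from_textract` and `extract_receipt_text` and the `sample_response` data are hypothetical and written only for illustration. The sketch covers text extraction only and does not address entity detection.

```python
from typing import Any


def lines_from_textract(response: dict[str, Any]) -> list[str]:
    """Collect the text of every LINE block in a DetectDocumentText-style response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE" and "Text" in block
    ]


def extract_receipt_text(image_bytes: bytes) -> list[str]:
    """Send one receipt image to Amazon Textract and return its lines of text.

    Assumption: boto3 is installed and AWS credentials/region are configured.
    """
    import boto3  # imported lazily so the parsing helper works without AWS access

    client = boto3.client("textract")
    response = client.detect_document_text(Document={"Bytes": image_bytes})
    return lines_from_textract(response)


# Hypothetical, hand-written response fragment (not real Textract output).
sample_response = {
    "Blocks": [
        {"BlockType": "PAGE"},
        {"BlockType": "LINE", "Text": "Receipt No. 0042"},
        {"BlockType": "WORD", "Text": "Receipt"},
        {"BlockType": "LINE", "Text": "2021-03-14 Seattle"},
    ]
}
print(lines_from_textract(sample_response))
```

Because the same call handles any receipt layout, a sketch like this needs no per-document-type workflow, which is the pain point the question describes.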

Topics

#Document Processing #OCR #Entity Extraction #Amazon Textract
