SOL-C01 Question #194: Real Exam Question with Answer & Explanation
The correct answer is C: Create a custom UDF (User-Defined Function) that calls `PARSE_DOCUMENT` and then uses …
Question
You are tasked with using the `PARSE_DOCUMENT` function in Snowflake to extract key information (name, address, phone number) from a large collection of scanned invoices stored as PDF files in an AWS S3 bucket. The invoices vary in format and quality. Which of the following approaches would be MOST effective for structuring the extracted data for analysis?
Options
- A. Use `PARSE_DOCUMENT` with default settings and load the raw JSON output into a VARIANT …
- B. Use `PARSE_DOCUMENT` with a pre-defined JSON schema to enforce a rigid structure on the …
- C. Create a custom UDF (User-Defined Function) that calls `PARSE_DOCUMENT` and then uses …
- D. Employ a combination of `PARSE_DOCUMENT` and Snowflake's external functions to integrate …
- E. Directly load PDF files into a relational table's TEXT column and write SQL queries utilizing LIKE …
Explanation
Option C provides the most robust and flexible approach. Because the invoices vary in format and quality, a pre-defined JSON schema (option B) is unlikely to work effectively. Loading the raw JSON into a VARIANT column (option A) leaves extensive post-processing still to be done. Option D, while potentially effective, adds the complexity and cost of a third-party OCR service. Option E is neither scalable nor efficient.
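As a rough illustration of the kind of post-processing a custom UDF could wrap, here is a minimal Python sketch. The text of option C is cut off in the source, so the technique shown here (regular expressions over the extracted text) is an assumption, not the question's stated method. The sketch also assumes the `PARSE_DOCUMENT` output is a JSON object with a `content` text field, which should be checked against the Snowflake documentation. The function name `extract_invoice_fields` and the label patterns are hypothetical.

```python
import json
import re

# Hypothetical patterns: real invoices would need tuning per layout.
PHONE_RE = re.compile(
    r"(?:\+?\d{1,2}[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}"
)
LABEL_RE = {
    "name": re.compile(
        r"^\s*(?:name|bill to|customer)\s*[:\-]\s*(.+)$", re.I | re.M
    ),
    "address": re.compile(r"^\s*address\s*[:\-]\s*(.+)$", re.I | re.M),
}


def extract_invoice_fields(parsed_json: str) -> dict:
    """Pull name, address, and phone out of parsed-document JSON.

    Assumes the JSON has a top-level "content" string (an assumption
    about the PARSE_DOCUMENT output shape). Missing fields stay None,
    so low-quality scans degrade gracefully instead of failing.
    """
    doc = json.loads(parsed_json)
    text = doc.get("content") or ""
    fields = {"name": None, "address": None, "phone": None}

    # Labeled fields: take the rest of the line after "Label:".
    for key, pattern in LABEL_RE.items():
        match = pattern.search(text)
        if match:
            fields[key] = match.group(1).strip()

    # Phone numbers are matched by shape rather than by label.
    match = PHONE_RE.search(text)
    if match:
        fields["phone"] = match.group(0).strip()
    return fields
```

Returning `None` for fields that cannot be found, rather than raising, is a deliberate choice here: with invoices of varying quality, partial rows can be loaded and reviewed later instead of blocking the whole batch.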