SOL-C01 Question #3: Real Exam Question with Answer & Explanation
The correct answer is C: Create a Snowflake Python User-Defined Function (UDF) that encapsulates the calculation logic.
Question
You are using a Snowflake Notebook to perform data analysis on a large dataset. As part of your analysis, you need to create a custom Python function that calculates a complex metric based on multiple columns in a Snowflake table. You want to apply this function to each row of the table and store the results in a new column. Which of the following approaches is the MOST efficient and scalable way to achieve this using Snowflake and Python?
Options
- A. Load the entire Snowflake table into a Pandas DataFrame, apply the Python function to each row
- B. Use the '%%sql' magic command to execute a series of SQL 'UPDATE' statements that call the
- C. Create a Snowflake Python User-Defined Function (UDF) that encapsulates the calculation logic
- D. Iterate over the rows of the Snowflake table using the Snowflake Connector for Python, call the
- E. Create a stored procedure in Snowflake that runs the logic in a separate environment.
Explanation
Option C is the most efficient and scalable approach: create a Snowflake Python UDF and call it in the SELECT of a CREATE TABLE AS SELECT statement. Snowflake UDFs execute Python code directly within the Snowflake engine, so the calculation uses Snowflake's distributed processing and avoids transferring large amounts of data between Snowflake and the Python environment in the Notebook. Loading the entire table into a Pandas DataFrame (A) does not scale to large datasets and can exhaust memory. Using '%%sql' with UPDATE statements (B) would be very slow because of the row-by-row updates. Iterating over rows with the Snowflake Connector (D) is likewise inefficient and not scalable. Option E is incorrect because it does not directly use the Python code from the Notebook.
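As a rough illustration of option C, the sketch below registers a Python function as a UDF through Snowpark and materializes the result with CREATE TABLE AS SELECT. The metric itself, the table names SALES and SALES_WITH_METRIC, the columns REVENUE, COST and UNITS, and the UDF name COMPLEX_METRIC are all hypothetical and do not come from the question; the example also assumes an existing Snowpark session and FLOAT input columns.

```python
import math


def complex_metric(revenue, cost, units):
    """Hypothetical per-row metric: signed log-scaled margin per unit."""
    # SQL NULLs arrive in a Python UDF as None; treat them as "no metric".
    if revenue is None or cost is None or units is None or units <= 0:
        return 0.0
    margin = revenue - cost
    return math.copysign(math.log1p(abs(margin)), margin) / units


def ctas_statement(source, target, udf_name):
    """Build a CTAS statement that adds the metric as a new column.

    Column names REVENUE, COST and UNITS are assumptions for illustration.
    """
    return (
        f"CREATE OR REPLACE TABLE {target} AS "
        f"SELECT *, {udf_name}(REVENUE, COST, UNITS) AS METRIC FROM {source}"
    )


def register_and_materialize(session, source="SALES", target="SALES_WITH_METRIC"):
    """Register the UDF in the given Snowpark session and run the CTAS."""
    # Imported lazily so the pure-Python parts above run without Snowpark.
    from snowflake.snowpark.types import FloatType

    session.udf.register(
        complex_metric,
        name="COMPLEX_METRIC",  # hypothetical UDF name
        return_type=FloatType(),
        input_types=[FloatType(), FloatType(), FloatType()],
        replace=True,
    )
    # The function now runs inside Snowflake, next to the data.
    session.sql(ctas_statement(source, target, "COMPLEX_METRIC")).collect()
```

In a Snowflake Notebook, the session would typically come from `get_active_session()` in `snowflake.snowpark.context`, followed by `register_and_materialize(session)`. Because only the function definition is shipped to Snowflake, no table rows pass through the Notebook's Python process.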