nerdexam
SnowflakeSnowflake

DEA-C02 · Question #143

DEA-C02 Question #143: Real Exam Question with Answer & Explanation

The correct answer is A: df_summary = df_orders.group_by('username').agg(count('id').alias('orders_count'), sum('price').alias('revenue'), sum('tax').alias('total_tax')).select('username', 'orders_count', 'revenue', 'total_tax').filter(col('orders_count')>1).df_summary.show(). Option A correctly follows Snowpark's DataFrame API method-chaining pattern: group_by() → agg() → select() → filter(). The agg() method is the proper way to apply multiple aggregation functions simultaneously, and .alias() is the correct Snowpark Python method to rename output co

Data Transformation

Question

A Data Engineer needs to rewrite SQL code into Snowpark code using the Snowpark Python API. Data is stored in a DataFrame, df_orders. How can this SQL query be rewritten so that it will run in Snowpark? A. B. C. D.

Options

  • Adf_summary = df_orders.group_by('username').agg(count('id').alias('orders_count'), sum('price').alias('revenue'), sum('tax').alias('total_tax')).select('username', 'orders_count', 'revenue', 'total_tax').filter(col('orders_count')>1).df_summary.show()
  • Bdf_summary = df_orders.select('username', 'orders_count', 'revenue', 'total_tax', count('id').as('orders_count'), sum('price').as('revenue'), sum('tax').as('total_tax')) .group_by('username').sort('orders_count').desc().filter(having('orders_count')>1).df_summary.show()
  • Cdf_summary = select(D.count('id').AS('orders_count'), D.sum('price').AS('revenue'), D.sum('tax').AS('total_tax')) .from_('orders') .group_by('username').having('orders_count'>1) .sort('orders_count').desc() df_summary.show()
  • Ddf_summary = df_orders.group_by('username').count('id').alias('orders_count'), sum('price').alias('revenue'), sum('tax').alias('total_tax') .select('username', 'orders_count', 'revenue', 'total_tax') .sort('orders_count').desc().having(having('orders_count')>1).df_summary.show()

Explanation

Option A correctly follows Snowpark's DataFrame API method-chaining pattern: group_by()agg()select()filter(). The agg() method is the proper way to apply multiple aggregation functions simultaneously, and .alias() is the correct Snowpark Python method to rename output columns.

Option B fails because it tries to select() columns like orders_count and revenue before they've been created by aggregation - the order of operations is reversed. It also incorrectly uses .as() (Java/Scala syntax) instead of .alias(), and misuses having() as a standalone filter method.

Option C uses a pseudo-SQL syntax (D.count(), .AS(), .from_('orders')) that doesn't correspond to any valid Snowpark Python API - it resembles SQLAlchemy or raw SQL, not the DataFrame API.

Option D breaks the method chain immediately after group_by() by calling .count() directly on it, which isn't valid Snowpark syntax; agg() is required to bundle multiple aggregations. It also uses the non-existent .having() method on a Snowpark DataFrame.

Memory tip: In Snowpark, remember GASF - Group_by, Agg, Select, Filter - and that column renaming always uses .alias(), never .as().

Topics

#Snowpark Python API#DataFrame Transformations#Aggregation#Filtering

Community Discussion

No community discussion yet for this question.

Full DEA-C02 PracticeBrowse All DEA-C02 Questions