A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature f

Sign in or unlock CERTIFIED-DATA-ENGINEER-PROFESSIONAL to reveal the answer and full explanation for question #80. The question stem and answer options stay visible for context.

Streaming Data Processing

Question

A junior data engineer has been asked to develop a streaming data pipeline with a grouped aggregation using DataFrame df. The pipeline needs to calculate the average humidity and average temperature for each non-overlapping five-minute interval. Incremental state information should be maintained for 10 minutes for late-arriving data. Streaming DataFrame df has the following schema:

"device_id INT, event_time TIMESTAMP, temp FLOAT, humidity FLOAT" Code block:

Choose the response that correctly fills in the blank within the code block to complete this task.

Exhibit

Options

AwithWatermark("event_time", "10 minutes")
BawaitArrival("event_time", "10 minutes")
Cawait("event_time + `10 minutes'")
DslidingWindow("event_time", "10 minutes")
EdelayWrite("event_time", "10 minutes")

Unlock CERTIFIED-DATA-ENGINEER-PROFESSIONAL to see the answer

You've previewed enough free CERTIFIED-DATA-ENGINEER-PROFESSIONAL questions. Unlock CERTIFIED-DATA-ENGINEER-PROFESSIONAL for full answers, explanations, the timed quiz mode, progress tracking, and the master PDF. Question stem and options stay visible so you can still see what's on the exam.

Unlock CERTIFIED-DATA-ENGINEER-PROFESSIONAL - $49.99 / 30 days Sign in

Topics

#Spark Structured Streaming#Watermarking#Late-arriving data#Event time processing

Full CERTIFIED-DATA-ENGINEER-PROFESSIONAL Practice