PROFESSIONAL-DATA-ENGINEER Exam Questions
357 real PROFESSIONAL-DATA-ENGINEER exam questions with expert-verified answers and explanations. Page 4 of 8.
- Question #167
After migrating ETL jobs to run on BigQuery, you need to verify that the output of the migrated jobs is the same as the output of the original. You've loaded a table containing the...
- Question #168 (Designing data processing systems)
You are the head of BI at a large enterprise company with multiple business units that each have different priorities and budgets. You use on-demand pricing for BigQuery with a quota...
BigQuery Slot Management, BigQuery Flat-rate Pricing, Resource Allocation, Enterprise BI
- Question #170
You've migrated a Hadoop job from an on-prem cluster to Dataproc and GCS. Your Spark job is a complicated analytical workload that consists of many shuffling operations and initial...
- Question #171
Your team is responsible for developing and maintaining ETLs in your company. One of your Dataflow jobs is failing because of some errors in the input data, and you need to improve...
- Question #172
You're training a model to predict housing prices based on an available dataset with real estate properties. Your plan is to train a fully connected neural net, and you've discover...
- Question #173
You are deploying MariaDB SQL databases on GCE VM Instances and need to configure monitoring and alerting. You want to collect metrics including network connections, disk IO and re...
- Question #174 (Operationalizing machine learning models)
You work for a bank. You have a labelled dataset that contains information on already granted loan applications and whether these applications have defaulted. You have been ask...
Predictive Modeling, Credit Risk, Supervised Learning, Machine Learning Models
- Question #175 (Designing data processing systems)
You need to migrate a 2TB relational database to Google Cloud Platform. You do not have the resources to significantly refactor the application that uses this database and cost to...
Cloud SQL, Database Migration, Relational Database, Cost Optimization
- Question #176
You're using Bigtable for a real-time application, and you have a heavy load that is a mix of read and writes. You've recently identified an additional use case and need to perform...
- Question #177 (Designing data processing systems)
You are designing an Apache Beam pipeline to enrich data from Cloud Pub/Sub with static reference data from BigQuery. The reference data is small enough to fit in memory on a singl...
Apache Beam, Streaming pipelines, Side inputs, Data enrichment
- Question #178
You have a data pipeline that writes data to Cloud Bigtable using well-designed row keys. You want to monitor your pipeline to determine when to increase the size of your Cloud Bigt...
- Question #179
You want to analyze hundreds of thousands of social media posts daily at the lowest cost and with the fewest steps. You have the following requirements: You will batch-load the pos...
- Question #180 (Designing data processing systems)
You store historic data in Cloud Storage. You need to perform analytics on the historic data. You want to use a solution to detect invalid data entries and perform data transformat...
Data Cleaning, Data Transformation, Dataprep, Visual ETL
- Question #181 (Building and operationalizing data processing systems)
Your company needs to upload their historic data to Cloud Storage. The security rules don't allow access from external IPs to their on-premises resources. After an initial upload,...
Cloud Storage, Data Transfer, gsutil, Hybrid Connectivity
- Question #182
You have a query that filters a BigQuery table using a WHERE clause on timestamp and ID columns. By using bq query --dry_run you learn that the query triggers a full scan of the tab...
- Question #183
You have a requirement to insert minute-resolution data from 50,000 sensors into a BigQuery table. You expect significant growth in data volume and need the data to be available wi...
- Question #185 (Designing data processing systems)
You need to create a near real-time inventory dashboard that reads the main inventory tables in your BigQuery data warehouse. Historical inventory data is stored as inventory balan...
BigQuery streaming, Real-time analytics, Data warehousing patterns, Near real-time inventory
- Question #186
You have data stored in BigQuery. The data in the BigQuery dataset must be highly available. You need to define a storage, backup, and recovery strategy of this data that minimiz...
- Question #187 (Building and operationalizing data processing systems)
You used Cloud Dataprep to create a recipe on a sample of data in a BigQuery table. You want to reuse this recipe on a daily upload of data with the same schema, after the load job...
Cloud Dataprep, Cloud Dataflow, Cloud Composer, Data Pipeline Orchestration
- Question #188
You want to automate execution of a multi-step data pipeline running on Google Cloud. The pipeline includes Cloud Dataproc and Cloud Dataflow jobs that have multiple dependencies o...
- Question #189
You are managing a Cloud Dataproc cluster. You need to make a job run faster while minimizing costs, without losing work in progress on your clusters. What should you do?
- Question #190 (Designing data processing systems)
You work for a shipping company that uses handheld scanners to read shipping labels. Your company has strict data privacy standards that require scanners to only transmit recipient...
Cloud Data Loss Prevention (DLP), Serverless Functions, Data Privacy, Scalable Data Processing
- Question #191
You have developed three data processing jobs. One executes a Cloud Dataflow pipeline that transforms data uploaded to Cloud Storage and writes results to BigQuery. The second inge...
- Question #192
You have Cloud Functions written in Node.js that pull messages from Cloud Pub/Sub and send the data to BigQuery. You observe that the message processing rate on the Pub/Sub topic i...
- Question #193
You are creating a new pipeline in Google Cloud to stream IoT data from Cloud Pub/Sub through Cloud Dataflow to BigQuery. While previewing the data, you notice that roughly 2% of t...
- Question #194 (Designing data processing systems)
You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the Data Science team ru...
BigQuery Partitioning, BigQuery Cost Optimization, Data Warehouse Design, Performance Tuning
- Question #195
You operate a logistics company, and you want to improve event delivery reliability for vehicle-based sensors. You operate small data centers around the world to capture these even...
- Question #196
You are a retailer that wants to integrate your online sales capabilities with different in-home assistants, such as Google Home. You need to interpret customer voice commands and...
- Question #197
Your company has a hybrid cloud initiative. You have a complex data pipeline that moves data between cloud provider services and leverages services from each of the cloud providers...
- Question #198
You use a dataset in BigQuery for analysis. You want to provide third-party companies with access to the same dataset. You need to keep the costs of data sharing low and ensure tha...
- Question #200
You are designing a data processing pipeline. The pipeline must be able to scale automatically as load increases. Messages must be processed at least once and must be ordered withi...
- Question #201
You need to set access to BigQuery for different departments within your company. Your solution should comply with the following requirements: Each department should have access on...
- Question #202
You operate a database that stores stock trades and an application that retrieves average stock price for a given company over an adjustable window of time. The data is stored in C...
- Question #203 (Building and operationalizing data processing systems)
You are operating a Cloud Dataflow streaming pipeline. The pipeline aggregates events from a Cloud Pub/Sub subscription source, within a window, and sinks the resulting aggregatio...
Cloud Dataflow, Cloud Pub/Sub, Cloud Monitoring, Streaming Data Pipelines
- Question #204 (Designing data processing systems)
You currently have a single on-premises Kafka cluster in a data center in the us-east region that is responsible for ingesting messages from IoT devices globally. Because large par...
IoT, Messaging, Stream Processing, Scalability, Cloud Native Architecture
- Question #205
You decided to use Cloud Datastore to ingest vehicle telemetry data in real time. You want to build a storage system that will account for the long-term data growth, while keeping...
- Question #206
You need to create a data pipeline that copies time-series transaction data so that it can be queried from within BigQuery by your data science team for analysis. Every hour, thous...
- Question #207
You are designing a cloud-native historical data processing system to meet the following conditions: The data being analyzed is in CSV, Avro, and PDF formats and will be accessed b...
- Question #209
You work for a manufacturing company that sources up to 750 different components, each from a different supplier. You've collected a labeled dataset that has on average 1000 exampl...
- Question #210
You are working on a niche product in the image recognition domain. Your team has developed a model that is dominated by custom C++ TensorFlow ops your team has implemented. These...
- Question #211
You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your da...
- Question #212 (Designing data processing systems)
You use BigQuery as your centralized analytics platform. New data is loaded every day, and an ETL pipeline modifies the original data and prepares it for the final users. This ETL...
BigQuery data organization, Data backup strategy, Cost optimization, Disaster recovery
- Question #213
The marketing team at your organization provides regular updates of a segment of your customer dataset. The marketing team has given you a CSV with 1 million
- Question #214
As your organization expands its usage of GCP, many teams have started to create their own projects. Projects are further multiplied to accommodate different stages of deployments...
- Question #215 (Designing data processing systems)
Your United States-based company has created an application for assessing and responding to user actions. The primary table's data volume grows by 250,000 records per second. Many...
Cloud Spanner, Global Consistency, High Transactional Throughput, Database Selection
- Question #216 (Operationalizing machine learning models)
A data scientist has created a BigQuery ML model and asks you to create an ML pipeline to serve predictions. You have a REST API application with the requirement to serve predictio...
BigQuery ML, ML Prediction Serving, Cloud Dataflow, Low Latency Serving
- Question #218 (Designing data processing systems)
You are building a new application that you need to collect data from in a scalable way. Data arrives continuously from the application throughout the day, and you expect to genera...
Stream Ingestion, Data Processing Pipelines, Data Warehousing, Data Lake
- Question #219 (Building and operationalizing data processing systems)
You are running a pipeline in Cloud Dataflow that receives messages from a Cloud Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is l...
Dataflow optimization, Pipeline scaling, Performance tuning, Resource management
- Question #220
You have a data pipeline with a Cloud Dataflow job that aggregates and writes time series metrics to Cloud Bigtable. This data feeds a dashboard used by thousands of users across t...
- Question #221
You have several Spark jobs that run on a Cloud Dataproc cluster on a schedule. Some of the jobs run in sequence, and some of the jobs run concurrently. You need to automate this p...