PROFESSIONAL-DATA-ENGINEER Exam Questions
357 real PROFESSIONAL-DATA-ENGINEER exam questions with expert-verified answers and explanations. Page 5 of 8.
- Question #223
You need to create a new transaction table in Cloud Spanner that stores product sales data. You are deciding what to use as a primary key. From a performance perspective, which str...
- Question #224
Data Analysts in your company have the Cloud IAM Owner role assigned to them in their projects to allow them to work with multiple GCP products in their projects. Your organization...
- Question #225
Each analytics team in your organization is running BigQuery jobs in their own projects. You want to enable each team to monitor slot usage within their projects. What should you d...
- Question #226
You are operating a streaming Cloud Dataflow pipeline. Your engineers have a new version of the pipeline with a different windowing algorithm and triggering strategy. You want to u...
- Question #228 (Building and operationalizing data processing systems)
You receive data files in CSV format monthly from a third party. You need to cleanse this data, but every third month the schema of the files changes. Your requirements for impleme...
Data cleansing, Data transformation, Cloud Dataprep, Scheduled jobs
- Question #229
You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC). All ORC files have been su...
- Question #230 (Building and operationalizing data processing systems)
You are implementing several batch jobs that must be executed on a schedule. These jobs have many interdependent steps that must be executed in a specific order. Portions of the jo...
Workflow orchestration, Cloud Composer, Apache Airflow, Batch processing
- Question #231
You work for a shipping company that has distribution centers where packages move on delivery lines to route them properly. The company wants to add cameras to the delivery lines t...
- Question #232
You are migrating your data warehouse to BigQuery. You have migrated all of your data into tables in a dataset. Multiple users from your organization will be using the data. They s...
- Question #233
You want to build a managed Hadoop system as your data lake. The data transformation process is composed of a series of Hadoop jobs executed in sequence. To accomplish the design o...
- Question #234 (Operationalizing machine learning models)
You work for an advertising company, and you've developed a Spark ML model to predict click-through rates at advertisement blocks. You've been developing everything at your on-prem...
Spark ML, Data Migration, Cloud Dataproc, BigQuery Integration
- Question #235 (Designing data processing systems)
You work for a global shipping company. You want to train a model on 40 TB of data to predict which ships in each geographic region are likely to cause delivery delays on any given...
BigQuery, Geospatial Processing, Machine Learning, Data Warehousing
- Question #236
You operate an IoT pipeline built around Apache Kafka that normally receives around 5000 messages per second. You want to use Google Cloud Platform to create an alert as soon as th...
- Question #237
You plan to deploy Cloud SQL using MySQL. You need to ensure high availability in the event of a zone failure. What should you do?
- Question #238
Your company is selecting a system to centralize data ingestion and delivery. You are considering messaging and data integration systems to address the requirements. The key requir...
- Question #239
You are planning to migrate your current on-premises Apache Hadoop deployment to the cloud. You need to ensure that the deployment is as fault-tolerant and cost-effective as possib...
- Question #240
Your team is working on a binary classification problem. You have trained a support vector machine (SVM) classifier with default parameters, and received an area under the curve (A...
- Question #242
You need to choose a database for a new project that has the following requirements: fully managed; able to automatically scale up; transactionally consistent; able to scale up to 6 T...
- Question #243 (Building and operationalizing data processing systems)
You are working on a linear regression model on BigQuery ML to predict a customer's likelihood of purchasing your company's products. Your model uses a city name variable as a key...
Cloud Data Fusion, Feature Engineering, Categorical Data, BigQuery ML
- Question #244
You work for a large bank that operates in locations throughout North America. You are setting up a data storage system that will handle bank account transactions. You require ACID...
- Question #246
Your company currently runs a large on-premises cluster using Spark, Hive, and HDFS in a colocation facility. The cluster is designed to accommodate peak usage on the system; howev...
- Question #247 (Designing data processing systems)
You work for a financial institution that lets customers register online. As new customers register, their user data is sent to Pub/Sub before being ingested into BigQuery. For sec...
Data Loss Prevention (DLP), Data Security, PII Redaction, Tokenization
- Question #248
You are migrating a table to BigQuery and are deciding on the data model. Your table stores information related to purchases made across several store locations and includes inform...
- Question #249 (Building and operationalizing data processing systems)
You are updating the code for a subscriber to a Pub/Sub feed. You are concerned that upon deployment the subscriber may erroneously acknowledge messages, leading to message loss. Y...
Pub/Sub, Snapshots, Error Recovery, Message Loss
- Question #250
You work for a large real estate firm and are preparing 6 TB of home sales data to be used for machine learning. You will use SQL to transform the data and use BigQuery ML to creat...
- Question #251 (Designing data processing systems)
You are analyzing the price of a company's stock. Every 5 seconds, you need to compute a moving average of the past 30 seconds' worth of data. You are reading data from Pub/Sub and...
Dataflow, Apache Beam, Windowing, Streaming Analytics
- Question #252
You are designing a pipeline that publishes application events to a Pub/Sub topic. Although message ordering is not important, you need to be able to aggregate events across disjoi...
- Question #253 (Operationalizing machine learning models)
You work for a large financial institution that is planning to use Dialogflow to create a chatbot for the company's mobile app. You have reviewed old chat logs and tagged each conv...
Dialogflow, Chatbot automation, Intent prioritization, Customer service optimization
- Question #254
You want to rebuild your batch pipeline for structured data on Google Cloud. You are using PySpark to conduct data transformations at scale, but your pipelines are taking over twel...
- Question #255 (Designing data processing systems)
You are building a real-time prediction engine that streams files, which may contain PII (personally identifiable information) data, into Cloud Storage and eventually into BigQuery....
Cloud DLP, PII Masking, Data De-identification, Referential Integrity
- Question #256
Your company is implementing a data warehouse using BigQuery, and you have been tasked with designing the data model. You move your on-premises sales data warehouse with a star dat...
- Question #257
You are using Bigtable to persist and serve stock market data for each of the major indices. To serve the trading application, you need to access only the most recent stock prices...
- Question #258
You are testing a Dataflow pipeline to ingest and transform text files. The files are gzip-compressed, errors are written to a dead-letter queue, and you are using SideInputs to jo...
- Question #260
You are migrating your data warehouse to Google Cloud and decommissioning your on-premises data center. Because this is a priority for your company, you know that bandwidth will be...
- Question #261
You need to give new website users a globally unique identifier (GUID) using a service that takes in data points and returns a GUID. This data is sourced from both internal and ext...
- Question #262 (Designing data processing systems)
You are migrating an application that tracks library books and information about each book, such as author or year published, from an on-premises data warehouse to BigQuery. In you...
BigQuery Schema Design, Denormalization, Nested Data Types, Query Optimization
- Question #263 (Building and operationalizing data processing systems)
You have uploaded 5 years of log data to Cloud Storage. A user reported that some data points in the log data are outside of their expected ranges, which indicates errors. You need...
Cloud Dataflow, Data Quality, ETL, Batch Processing
- Question #264 (Designing data processing systems)
An aerospace company uses a proprietary data format to store its flight data. You need to connect this new data source to BigQuery and stream the data into BigQuery. You want to eff...
Data streaming, Dataflow, Custom connectors, BigQuery ingestion, Proprietary data
- Question #265
You are using BigQuery and Data Studio to design a customer-facing dashboard that displays large quantities of aggregated data. You expect a high volume of concurrent users. You ne...
- Question #266
You need ads data to serve AI models and historical data for analytics. Long-tail and outlier data points need to be identified. You want to cleanse the data in near-real time before...
- Question #267
The Development and External teams have the project viewer Identity and Access Management (IAM) role in a folder named Visualization. You want the Development Team to be able to re...
- Question #268
A TensorFlow machine learning model on Compute Engine virtual machines (n2-standard-32) takes two days to complete training. The model has custom TensorFlow operations that must ru...
- Question #270 (Designing data processing systems)
You are designing a pipeline that publishes application events to a Pub/Sub topic. You need to aggregate events across hourly intervals before loading the results to BigQuery for a...
Streaming data processing, Dataflow, Pub/Sub, Windowing
- Question #271
Your company is in the process of migrating its on-premises data warehousing solutions to BigQuery. The existing data warehouse uses trigger-based change data capture (CDC) to appl...
- Question #272
You are implementing workflow pipeline scheduling using open source-based tools and Google Kubernetes Engine (GKE). You want to use a Google managed service to simplify and automat...
- Question #273
You are designing a system that requires an ACID-compliant database. You must ensure that the system requires minimal human intervention in case of a failure. What should you do?
- Question #275
You have 15 TB of data in your on-premises data center that you want to transfer to Google Cloud. Your data changes weekly and is stored in a POSIX-compliant source. The network op...
- Question #276 (Designing data processing systems)
A live TV show asks viewers to cast votes using their mobile phones. The event generates a large volume of data during a 3-minute period. You are in charge of the Voting infrastructur...
Stream Processing, Real-time Data, Scalable Data Ingestion, Data Warehousing
- Question #277 (Designing data processing systems)
Your company wants to be able to retrieve large result sets of medical information from your current system, which has over 10 TB in the database, and store the data in new tables...
BigQuery, Data Warehousing, Scalable Analytics, Cost Optimization
- Question #278 (Designing data processing systems)
Government regulations in the banking industry mandate the protection of clients' personally identifiable information (PII). Your company requires PII to be access controlled, encr...
PII Protection, Cloud Storage, IAM, Data Compliance