PROFESSIONAL-DATA-ENGINEER Exam Questions
357 real PROFESSIONAL-DATA-ENGINEER exam questions with expert-verified answers and explanations. Page 7 of 8.
- Question #333: Ensuring solution quality
You have important legal hold documents in a Cloud Storage bucket. You need to ensure that these documents are not deleted or modified. What should you do?
Cloud Storage, Data Retention, Immutability, Legal Hold
- Question #334: Designing data processing systems
You are designing a data warehouse in BigQuery to analyze sales data for a telecommunication service provider. You need to create a data model for customers, products, and subscrip...
BigQuery, Data Modeling, Historical Data, Cost Optimization
- Question #335: Designing data processing systems
You are deploying a batch pipeline in Dataflow. This pipeline reads data from Cloud Storage, transforms the data, and then writes the data into BigQuery. The security team has enab...
Dataflow Networking, Private Google Access, Internal IP Addresses, Organizational Policy
- Question #336: Designing data processing systems
You are running a Dataflow streaming pipeline, with Streaming Engine and Horizontal Autoscaling enabled. You have set the maximum number of workers to 1000. The input of your pipel...
Dataflow performance, Pipeline optimization, Autoscaling, Dataflow fusion
- Question #337: Designing data processing systems
You have an Oracle database deployed in a VM as part of a Virtual Private Cloud (VPC) network. You want to replicate and continuously synchronize 50 tables to BigQuery. You want to...
Data Replication, Change Data Capture (CDC), Google Cloud Datastream, BigQuery
- Question #338: Designing data processing systems
You are deploying an Apache Airflow directed acyclic graph (DAG) in a Cloud Composer 2 instance. You have incoming files in a Cloud Storage bucket that the DAG processes, one file...
Cloud Composer, Event-driven Architecture, Cloud Functions, Airflow REST API
- Question #339: Designing data processing systems
You are planning to use Cloud Storage as part of your data lake solution. The Cloud Storage bucket will contain objects ingested from external systems. Each object will be ingested...
Cloud Storage, Autoclass, Cost Optimization, Data Lake Architecture
- Question #341: Designing data processing systems
Your business users need a way to clean and prepare data before using the data for analysis. Your business users are less technically savvy and prefer to work with graphical user i...
Dataprep, Connected Sheets, Data preparation, Self-service analytics
- Question #342: Designing data processing systems
You have two projects where you run BigQuery jobs: - One project runs production jobs that have strict completion time SLAs. These are high priority jobs that must have the require...
BigQuery Reservations, BigQuery Pricing, Autoscaling, Resource Allocation
- Question #343: Designing data processing systems
You want to migrate your existing Teradata data warehouse to BigQuery. You want to move the historical data to BigQuery by using the most efficient method that requires the least a...
Data Migration, BigQuery Data Transfer Service, Teradata, Data Warehouse
- Question #344: Ensuring solution quality
You are on the data governance team and are implementing security requirements. You need to encrypt all your data in BigQuery by using an encryption key managed by your team. You m...
Data Encryption, Key Management, BigQuery Security, Cloud EKM
- Question #346: Building and operationalizing data processing systems
You are running your BigQuery project in the on-demand billing model and are executing a change data capture (CDC) process that ingests data. The CDC process loads 1 GB of data eve...
BigQuery Reservations, Cost Management, Change Data Capture, Data Ingestion
- Question #347: Ensuring solution quality
You are designing a fault-tolerant architecture to store data in a regional BigQuery dataset. You need to ensure that your application is able to recover from a corruption event in...
BigQuery, Data Recovery, Fault Tolerance, Time Travel
- Question #348: Designing data processing systems
You are building a streaming Dataflow pipeline that ingests noise level data from hundreds of sensors placed near construction sites across a city. The sensors measure noise level...
Dataflow Windowing, Streaming Data Processing, Tumbling Windows, Late Data Handling
- Question #349: Designing data processing systems
You are creating a data model in BigQuery that will hold retail transaction data. Your two largest tables, sales_transaction_header and sales_transaction_line, have a tightly coupl...
BigQuery, Data Modeling, Denormalization, Performance Optimization
- Question #350
You created a new version of a Dataflow streaming data ingestion pipeline that reads from Pub/Sub and writes to BigQuery. The previous version of the pipeline that runs in producti...
- Question #351
Your organization's data assets are stored in BigQuery, Pub/Sub, and a PostgreSQL instance running on Compute Engine. Because there are multiple domains and diverse teams using the...
- Question #352: Building and operationalizing data processing systems
You need to create a SQL pipeline. The pipeline runs an aggregate SQL transformation on a BigQuery table every two hours and appends the result to another existing BigQuery table....
BigQuery, Cloud Composer, Data Pipelines, Error Handling
- Question #353: Building and operationalizing data processing systems
You are monitoring your organization's data lake hosted on BigQuery. The ingestion pipelines read data from Pub/Sub and write the data into tables on BigQuery. After a new version...
Data ingestion, BigQuery, Troubleshooting, Duplicate data
- Question #354: Building and operationalizing data processing systems
You have a BigQuery dataset named "customers". All tables will be tagged by using a Data Catalog tag template named "gdpr". The template contains one mandatory field, "has_sensitiv...
BigQuery, Data Catalog, IAM, Data Security
- Question #356: Building and operationalizing data processing systems
You have a BigQuery table that ingests data directly from a Pub/Sub subscription. The ingested data is encrypted with a Google-managed encryption key. You need to meet a new organi...
BigQuery, Customer-Managed Encryption Keys (CMEK), Data Encryption, Data Migration
- Question #357: Designing data processing systems
You created an analytics environment on Google Cloud so that your data scientist team can explore data without impacting the on-premises Apache Hadoop solution. The data in the on-...
BigQuery External Tables, Cloud Storage, Hadoop Migration, Cost Optimization
- Question #358: Designing data processing systems
You are designing a Dataflow pipeline for a batch processing job. You want to mitigate multiple zonal failures at job submission time. What should you do?
Dataflow, Fault Tolerance, Regional Deployment, Job Configuration
- Question #359: Designing data processing systems
You are designing a real-time system for a ride hailing app that identifies areas with high demand for rides to effectively reroute available drivers to meet the demand. The system...
Dataflow, Hopping Windows, Real-time Data Processing, Memorystore
- Question #361
You want to store your team's shared tables in a single dataset to make data easily accessible to various analysts. You want to make this data readable but unmodifiable by analysts...
- Question #362
You are running a streaming pipeline with Dataflow and are using hopping windows to group the data as the data arrives. You noticed that some data is arriving late but is not being...
- Question #363: Designing data processing systems
You work for a large ecommerce company. You store your customer's order data in Bigtable. You have a garbage collection policy set to delete the data after 30 days and the number o...
Bigtable, Garbage Collection, Query Filtering, Data Retention
- Question #364: Designing data processing systems
You are using a Dataflow streaming job to read messages from a message bus that does not support exactly-once delivery. Your job then applies some transformations, and loads the re...
BigQuery Storage Write API, Exactly-once semantics, Dataflow streaming, BigQuery table location
- Question #365
You have created an external table for Apache Hive partitioned data that resides in a Cloud Storage bucket, which contains a large number of files. You notice that queries against...
- Question #366: Designing data processing systems
You have a network of 1000 sensors. The sensors generate time series data: one metric per sensor per second, along with a timestamp. You already have 1 TB of data, and expect the d...
Bigtable, BigQuery, Time Series Data, Data Storage Design
- Question #367: Designing data processing systems
You have 100 GB of data stored in a BigQuery table. This data is outdated and will only be accessed one or two times a year for analytics with SQL. For backup purposes, you want to...
BigQuery export, Cloud Storage Archive, Data lifecycle management, Cost optimization
- Question #368: Designing data processing systems
You have thousands of Apache Spark jobs running in your on-premises Apache Hadoop cluster. You want to migrate the jobs to Google Cloud. You want to use managed services to run you...
Spark migration, Dataproc, Managed services, Hadoop migration
- Question #369: Designing data processing systems
You are administering shared BigQuery datasets that contain views used by multiple teams in your organization. The marketing team is concerned about the variability of their monthl...
BigQuery Billing, BigQuery Reservations, Cost Management, Resource Allocation
- Question #370: Designing data processing systems
You are part of a healthcare organization where data is organized and managed by respective data owners in various storage services. As a result of this decentralized ecosystem, di...
Google Dataplex, Data governance, Data discovery, Data lineage
- Question #371: Building and operationalizing data processing systems
You have data located in BigQuery that is used to generate reports for your company. You have noticed some weekly executive report fields do not correspond to format according to c...
Data Normalization, Cloud Data Fusion, No-code Data Transformation, Recurring Data Jobs
- Question #372: Designing data processing systems
You are designing a messaging system by using Pub/Sub to process clickstream data with an event-driven consumer app that relies on a push subscription. You need to configure the me...
Pub/Sub, Subscription Retry Policy, Dead Letter Queues, Messaging System Design
- Question #373: Designing data processing systems
You designed a data warehouse in BigQuery to analyze sales data. You want a self-serving, low-maintenance, and cost-effective solution to share the sales dataset to other business...
BigQuery, Data Sharing, Analytics Hub, Cost Optimization
- Question #374: Designing data processing systems
You have terabytes of customer behavioral data streaming from Google Analytics into BigQuery daily. Your customers' information, such as their preferences, is hosted on a Cloud SQL...
Data Integration, Change Data Capture, BigQuery, Cloud SQL
- Question #375
Your organization is modernizing their IT services and migrating to Google Cloud. You need to organize the data that will be stored in Cloud Storage and BigQuery. You need to enabl...
- Question #376: Designing data processing systems
You work for a large ecommerce company. You are using Pub/Sub to ingest the clickstream data to Google Cloud for analytics. You observe that when a new subscriber connects to an ex...
Pub/Sub, Message Retention Policy, Data Ingestion
- Question #377: Building and operationalizing data processing systems
You are designing the architecture to process your data from Cloud Storage to BigQuery by using Dataflow. The network team provided you with the Shared VPC network and subnetwork t...
Dataflow, Shared VPC, IAM Roles, Networking
- Question #378: Designing data processing systems
Your infrastructure team has set up an interconnect link between Google Cloud and the on-premises network. You are designing a high-throughput streaming pipeline to ingest data in...
Streaming ingestion, Dataflow, Kafka, BigQuery
- Question #379
You migrated your on-premises Apache Hadoop Distributed File System (HDFS) data lake to Cloud Storage. The data scientist team needs to process the data by using Apache Spark and S...
- Question #381: Designing data processing systems
You have an upstream process that writes data to Cloud Storage. This data is then read by an Apache Spark job that runs on Dataproc. These jobs are run in the us-central1 region, b...
Cloud Storage, Disaster Recovery, RPO, Data Latency
- Question #382: Designing data processing systems
You currently have transactional data stored on-premises in a PostgreSQL database. To modernize your data environment, you want to run transactional workloads and support analytics...
Database Migration, AlloyDB for PostgreSQL, HTAP, Managed PostgreSQL
- Question #383: Designing data processing systems
You are architecting a data transformation solution for BigQuery. Your developers are proficient with SQL and want to use the ELT development technique. In addition, your developer...
Dataform, ELT, BigQuery, Data Transformation
- Question #385: Building and operationalizing data processing systems
You are managing a Dataplex environment with raw and curated zones. A data engineering team is uploading JSON and CSV files to a bucket asset in the curated zone but the files are...
Dataplex, Data Discovery, Data Zones, Data Cataloging
- Question #386: Designing data processing systems
You have a table that contains millions of rows of sales data, partitioned by date. Various applications and users query this data many times a minute. The query requires aggregati...
Materialized Views, Query Optimization, Data Aggregation, Data Freshness
- Question #387: Designing data processing systems
Your organization uses a multi-cloud data storage strategy, storing data in Cloud Storage, and data in Amazon Web Services' (AWS) S3 storage buckets. All data resides in US regions...
BigQuery Omni, BigLake Tables, Multi-cloud, Data Governance
- Question #388: Designing data processing systems
You are preparing an organization-wide dataset. You need to preprocess customer data stored in a restricted bucket in Cloud Storage. The data will be used to create consumer analys...
Data Privacy, Data Masking, Cloud DLP, Dataflow