PROFESSIONAL-DATA-ENGINEER Exam Questions
357 real PROFESSIONAL-DATA-ENGINEER exam questions with expert-verified answers and explanations. Page 3 of 8.
- Question #112
The YARN ResourceManager and the HDFS NameNode interfaces are available on a Cloud Dataproc cluster ____.
- Question #113
Which of these is NOT a way to customize the software on Dataproc cluster instances?
- Question #114
In order to securely transfer web traffic data from your computer's web browser to the Cloud Dataproc cluster you should use a(n) _____.
- Question #115
All Google Cloud Bigtable client requests go through a front-end server ______ they are sent to a Cloud Bigtable node.
- Question #116
What is the general recommendation when designing your row keys for a Cloud Bigtable schema?
- Question #117
Which of the following statements is NOT true regarding Bigtable access roles?
- Question #118
For the best possible performance, what is the recommended zone for your Compute Engine instance and Cloud Bigtable instance?
- Question #119
Which row keys are likely to cause a disproportionate number of reads and/or writes on a particular node in a Bigtable cluster (select 2 answers)?
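Cloud Bigtable keeps rows sorted by row key and assigns contiguous key ranges to nodes. Keys that grow monotonically, such as timestamps or sequential IDs at the start of the key, therefore send every new write to the same range. The toy model below illustrates this with fixed split points. The boundaries, device IDs, and function names are hypothetical, and real Bigtable rebalances tablets dynamically.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical tablet boundaries: node 0 owns keys < "d", node 1 owns ["d", "m"),
# node 2 owns ["m", "t"), node 3 owns keys >= "t". Real Bigtable splits dynamically.
SPLIT_POINTS = ["d", "m", "t"]
DEVICES = ["alpha", "kilo", "papa", "whiskey"]  # hypothetical device IDs
TIMESTAMPS = range(1700000000, 1700000008)       # eight consecutive epoch seconds

def node_for_key(row_key: str) -> int:
    """Index of the node whose contiguous key range contains row_key."""
    return bisect_right(SPLIT_POINTS, row_key)

def write_distribution(row_keys) -> Counter:
    """Count how many writes land on each node."""
    return Counter(node_for_key(key) for key in row_keys)

# Timestamp first: every key starts with a digit, so all writes hit one range.
timestamp_first = [f"{ts}#{dev}" for ts in TIMESTAMPS for dev in DEVICES]
# Device ID first: writes made at the same instant fan out across ranges.
device_first = [f"{dev}#{ts}" for ts in TIMESTAMPS for dev in DEVICES]

print(write_distribution(timestamp_first))  # Counter({0: 32})
print(write_distribution(device_first))     # Counter({0: 8, 1: 8, 2: 8, 3: 8})
```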
- Question #121
Which is not a valid reason for poor Cloud Bigtable performance?
- Question #122
Which is the preferred method to use to avoid hotspotting in time series data in Bigtable?
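Google's Bigtable schema-design guidance for time series describes *field promotion*: fields that would otherwise live in column data, such as a device or metric ID, are moved to the front of the row key so that writes are spread across the key space instead of piling up at the latest timestamp. The sketch below builds such keys. The `device#metric#timestamp` layout, the `MAX_EPOCH_SECONDS` bound, and the optional reversed timestamp (which makes a device's newest reading sort first) are illustrative assumptions, not a required format.

```python
MAX_EPOCH_SECONDS = 10**10  # hypothetical bound; reversed values stay 10 digits wide

def field_promoted_key(device_id: str, metric: str, epoch_seconds: int,
                       newest_first: bool = False) -> str:
    """Row key device#metric#timestamp: the device ID leads, so writes spread out."""
    ts = MAX_EPOCH_SECONDS - epoch_seconds if newest_first else epoch_seconds
    # Zero-pad so the lexicographic order of keys matches the numeric order of timestamps.
    return f"{device_id}#{metric}#{ts:010d}"

keys = sorted(field_promoted_key("pump-7", "pressure", t, newest_first=True)
              for t in (1700000000, 1700000060, 1700000120))
print(keys[0])  # pump-7#pressure#8299999880  (the newest reading sorts first)
```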
- Question #123
When you design a Google Cloud Bigtable schema it is recommended that you _________.
- Question #124
Which of the following is NOT a valid use case to select HDD (hard disk drives) as the storage for Google Cloud Bigtable?
- Question #126
When you store data in Cloud Bigtable, what is the recommended minimum amount of stored data?
- Question #127
If you're running a performance test that depends on Cloud Bigtable, all but one of the steps below are recommended. Which is NOT a recommended step to follow?
- Question #128
Cloud Bigtable is a recommended option for storing very large amounts of ____________________________?
- Question #129
Google Cloud Bigtable indexes a single value in each row. This value is called the _______.
- Question #131
What is the recommended action to do in order to switch between SSD and HDD storage for your Google Cloud Bigtable instance?
- Question #132
Your platform on your on-premises environment generates 100 GB of data daily, composed of millions of structured JSON text files. Your on-premises environment cannot be accessed fr...
- Question #133
You need to migrate a Redis database from an on-premises data center to a Memorystore for Redis instance. You want to follow Google recommended practices and perform the migration...
- Question #134
You are training a spam classifier. You notice that you are overfitting the training data. Which three actions can you take to resolve this problem? (Choose three.)
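The standard remedies for overfitting include collecting more training examples, using fewer features, and strengthening regularization. Regularization is the easiest of these to show in a few lines. The sketch below uses the closed-form L2 (ridge) slope for a single feature with no intercept, minimizing Σ(y − w·x)² + λ·w². The data points are hypothetical. It shows that a larger penalty λ shrinks the learned weight toward zero.

```python
def ridge_weight(xs, ys, l2_penalty: float) -> float:
    """Closed-form L2-regularized least-squares slope for y ~ w * x (no intercept).

    Setting the derivative of sum((y - w*x)^2) + l2_penalty * w^2 to zero
    gives w = sum(x*y) / (sum(x*x) + l2_penalty).
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2_penalty)

xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]  # hypothetical training data
for penalty in (0.0, 14.0):
    print(penalty, ridge_weight(xs, ys, penalty))  # 0.0 2.0, then 14.0 1.0
```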
- Question #135 (Building and operationalizing data processing systems)
You are implementing security best practices on your data pipeline. Currently, you are manually executing jobs as the Project Owner. You want to automate these jobs by taking night...
Topics: Service Accounts, IAM, Least Privilege, Cloud Security
- Question #136 (Building and operationalizing data processing systems)
You are using Google BigQuery as your data warehouse. Your users report that the following simple query is running very slowly, no matter when they run the query: SELECT country, s...
Topics: BigQuery performance, Data skew, Query optimization, Distributed computing
- Question #137
Your globally distributed auction application allows users to bid on items. Occasionally, users place identical bids at nearly identical times, and different application servers pr...
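Deciding which of two identical, near-simultaneous bids came first requires ordering by the time each bid was *placed*, not by the order in which servers happened to process it. The sketch below picks a winner per item by highest amount, then earliest event time, then user ID as a deterministic tie-breaker. The `Bid` fields are hypothetical. In practice the comparison has to run wherever all bids for an item are seen together, for example in a single consumer reading from a shared stream.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bid:
    item_id: str
    user_id: str
    amount_cents: int
    event_time_ms: int  # when the user placed the bid, stamped at the source
    server: str         # which application server received it

def rank(bid: Bid) -> tuple:
    """Smaller tuple = better bid: higher amount, then earlier event time, then user_id."""
    return (-bid.amount_cents, bid.event_time_ms, bid.user_id)

def winning_bids(bids) -> dict:
    """Return the winning Bid per item, independent of the order in which bids arrive."""
    winners = {}
    for bid in bids:
        current = winners.get(bid.item_id)
        if current is None or rank(bid) < rank(current):
            winners[bid.item_id] = bid
    return winners

# Two identical bids: server-b's copy is processed first but was placed later.
arrivals = [
    Bid("lot-42", "bob", 10_000, 1_700_000_000_250, "server-b"),
    Bid("lot-42", "alice", 10_000, 1_700_000_000_120, "server-a"),
]
print(winning_bids(arrivals)["lot-42"].user_id)  # alice
```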
- Question #138 (Building and operationalizing data processing systems)
Your organization has been collecting and analyzing data in Google BigQuery for 6 months. The majority of the data analyzed is placed in a time-partitioned table named events_parti...
Topics: BigQuery Views, SQL Dialects, ODBC Connectivity, Service Accounts
- Question #139
You have enabled the free integration between Firebase Analytics and Google BigQuery. Firebase now automatically creates a new table daily in BigQuery in the format app_events_YYYY...
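Daily date-sharded tables like these can be queried together in BigQuery standard SQL with a wildcard table and the `_TABLE_SUFFIX` pseudo-column. The sketch below builds such a query for the last N days. It assumes the shard suffix is an 8-digit `YYYYMMDD` date, and the project and dataset names are hypothetical placeholders.

```python
from datetime import date, timedelta

def last_n_days_query(project: str, dataset: str, end: date, days: int) -> str:
    """Standard SQL over daily app_events_* shards for the `days` days ending on `end`."""
    start = end - timedelta(days=days - 1)
    return (
        "SELECT COUNT(*) AS events\n"
        f"FROM `{project}.{dataset}.app_events_*`\n"
        f"WHERE _TABLE_SUFFIX BETWEEN '{start:%Y%m%d}' AND '{end:%Y%m%d}'"
    )

print(last_n_days_query("my-project", "analytics", date(2024, 3, 14), 14))
```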
- Question #140 (Building and operationalizing data processing systems)
Your company is currently setting up data pipelines for their campaign. For all the Google Cloud Pub/Sub streaming data, one of the important business requirements is to be able to...
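The topic tags for this question include windowing. Dataflow (Apache Beam) divides an unbounded Pub/Sub stream into windows by event timestamp so that results can be emitted per time period. In the Beam Python SDK this is done with transforms such as `beam.WindowInto(window.FixedWindows(60))`. The pure-Python sketch below only imitates fixed (tumbling) window assignment so the arithmetic is visible. It is not Beam code, and the event times are hypothetical.

```python
from collections import defaultdict

def fixed_window_start(event_time_s: int, window_size_s: int) -> int:
    """Start of the fixed (tumbling) window that contains event_time_s."""
    return event_time_s - (event_time_s % window_size_s)

def count_per_window(event_times, window_size_s: int) -> dict:
    """Count events per window, keyed by window start, in ascending order."""
    counts = defaultdict(int)
    for t in event_times:
        counts[fixed_window_start(t, window_size_s)] += 1
    return dict(sorted(counts.items()))

print(count_per_window([0, 59, 60, 61, 185], 60))  # {0: 2, 60: 2, 180: 1}
```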
Topics: Dataflow, Streaming Data, Windowing, Pipeline Failure
- Question #141
You architect a system to analyze seismic data. Your extract, transform, and load (ETL) process runs as a series of MapReduce jobs on an Apache Hadoop cluster. The ETL process take...
- Question #142 (Designing data processing systems)
An online retailer has built their current application on Google App Engine. A new initiative at the company mandates that they extend their application to allow their customers to...
Topics: Database selection, Cloud SQL, Transactional databases, Business intelligence
- Question #143
You launched a new gaming app almost three years ago. You have been uploading log files from the previous day to a separate Google BigQuery table with the table name format LOGS_yy...
- Question #144
Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They w...
- Question #145
Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period. However, you realize that in...
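Processing stream data over a predictable time period in Dataflow relies on event-time windows plus a *watermark*, the system's estimate of how far event time has progressed. The toy sketch below shows how a watermark separates out-of-order events that can still be assigned to an open window from events that arrive after their window has closed. Real Beam runners compute the watermark themselves. The "max event time seen minus a fixed lag" heuristic used here is a hypothetical simplification.

```python
def classify_arrivals(arrivals, window_size_s: int, watermark_lag_s: int) -> list:
    """Label each event time, in arrival order, as 'on_time' or 'late'.

    Toy watermark: max event time seen so far minus watermark_lag_s.
    An event is late if its window's end is at or before the watermark on arrival.
    """
    watermark = float("-inf")
    labels = []
    for event_time in arrivals:
        window_end = event_time - event_time % window_size_s + window_size_s
        labels.append("late" if window_end <= watermark else "on_time")
        # Advance the watermark only after classifying the current event.
        watermark = max(watermark, event_time - watermark_lag_s)
    return labels

print(classify_arrivals([10, 70, 55], 60, 30))      # ['on_time', 'on_time', 'on_time']
print(classify_arrivals([10, 70, 130, 50], 60, 5))  # ['on_time', 'on_time', 'on_time', 'late']
```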
- Question #146 (Operationalizing machine learning models)
You have some data, which is shown in the graphic below. The two dimensions are X and Y, and the shade of each dot represents what class it is. You want to classify this data accur...
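The graphic is not reproduced here. Assume, for illustration only, that one class clusters near the origin and the other forms a ring around it. No straight line in the (X, Y) plane separates a ring from its center, but a linear threshold on the synthetic feature X² + Y² (squared distance from the origin) does. The points and the threshold below are hypothetical.

```python
def radial_feature(x: float, y: float) -> float:
    """Synthetic feature: squared distance from the origin."""
    return x * x + y * y

def linear_classifier(feature: float, threshold: float) -> int:
    """One-feature linear rule: class 1 if feature > threshold, else class 0."""
    return 1 if feature > threshold else 0

inner = [(0.1, 0.2), (-0.3, 0.1), (0.2, -0.2), (-0.1, -0.3)]  # class 0, near the origin
outer = [(1.0, 0.1), (-0.9, 0.4), (0.2, -1.1), (-0.6, -0.8)]  # class 1, on a ring

predictions = [linear_classifier(radial_feature(x, y), threshold=0.5)
               for x, y in inner + outer]
print(predictions)  # [0, 0, 0, 0, 1, 1, 1, 1]
```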
Topics: Feature Engineering, Machine Learning, Classification, Linear Models
- Question #147
You are integrating one of your internal IT applications and Google BigQuery, so users can query BigQuery from the application's interface. You do not want individual users to auth...
- Question #148
You are building a data pipeline on Google Cloud. You need to prepare data using a casual method for a machine-learning process. You want to support a logistic regression model. Yo...
- Question #149
You set up a streaming data insert into a Redis cluster via a Kafka cluster. Both clusters are running on Compute Engine instances. You need to encrypt data at rest with encryption...
- Question #150
You are developing an application that uses a recommendation engine on Google Cloud. Your solution should display new videos to customers based on past views. Your solution needs t...
- Question #151
You are selecting services to write and transform JSON messages from Cloud Pub/Sub to BigQuery for a data pipeline on Google Cloud. You want to minimize service costs. You also wan...
- Question #152
Your infrastructure includes a set of YouTube channels. You have been tasked with creating a process for sending the YouTube channel data to Google Cloud for analysis. You want to...
- Question #153
You are designing storage for very large text files for a data pipeline on Google Cloud. You want to support ANSI SQL queries. You also want to support compression and parallel loa...
- Question #154
You are developing an application on Google Cloud that will automatically generate subject labels for users' blog posts. You are under competitive pressure to add this feature quic...
- Question #155 (Designing data processing systems)
You are designing storage for 20 TB of text files as part of deploying a data pipeline on Google Cloud. Your input data is in CSV format. You want to minimize the cost of querying...
Topics: Cloud Storage, BigQuery, Data Warehousing, Cost Optimization
- Question #156
You are designing storage for two relational tables that are part of a 10-TB database on Google Cloud. You want to support transactions that scale horizontally. You also want to op...
- Question #157
Your financial services company is moving to cloud technology and wants to store 50 TB of financial time-series data in the cloud. This data is updated frequently and new data wil...
- Question #158
An organization maintains a Google BigQuery dataset that contains tables with user-level data. They want to expose aggregates of this data to other Google Cloud projects, while sti...
- Question #160
Your neural network model is taking days to train. You want to increase the training speed. What can you do?
- Question #161
You are responsible for writing your company's ETL pipelines to run on an Apache Hadoop cluster. The pipeline will require some checkpointing and splitting pipelines. Which method...
- Question #162 (Building and operationalizing data processing systems)
Your company maintains a hybrid deployment with GCP, where analytics are performed on your anonymized customer data. The data are imported to Cloud Storage from your data center th...
Topics: Hybrid Cloud, Data Transfer, Network Performance, Cloud Storage
- Question #163 (Designing data processing systems)
You work for a mid-sized enterprise that needs to move its operational system transaction data from an on-premises database to GCP. The database is about 20 TB in size. Which datab...
Topics: Database selection, Cloud SQL, Relational databases, Data migration
- Question #165
You want to archive data in Cloud Storage. Because some data is very sensitive, you want to use the "Trust No One" (TNO) approach to encrypt your data to prevent the cloud provider...
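One mechanism Cloud Storage offers for encrypting objects with a key that you hold is a customer-supplied encryption key (CSEK). Each request that reads or writes the object sends the base64-encoded AES-256 key together with the base64-encoded SHA-256 hash of the key in `x-goog-encryption-*` headers. The sketch below only builds those headers with the standard library. It does not send a request. Generating, storing, and rotating the key safely outside Google Cloud is left out, and whether CSEK alone meets a strict "Trust No One" requirement is a separate design question.

```python
import base64
import hashlib
import secrets

def generate_csek() -> bytes:
    """Generate a random 256-bit key; you must store it yourself, outside the bucket."""
    return secrets.token_bytes(32)

def csek_headers(key: bytes) -> dict:
    """Cloud Storage request headers for an object encrypted with a customer-supplied key."""
    if len(key) != 32:
        raise ValueError("a CSEK must be exactly 32 bytes (AES-256)")
    return {
        "x-goog-encryption-algorithm": "AES256",
        "x-goog-encryption-key": base64.b64encode(key).decode("ascii"),
        "x-goog-encryption-key-sha256": base64.b64encode(hashlib.sha256(key).digest()).decode("ascii"),
    }

headers = csek_headers(generate_csek())
print(sorted(headers))
```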
- Question #166
You have data pipelines running on BigQuery, Cloud Dataflow, and Cloud Dataproc. You need to perform health checks and monitor their behavior, and then notify the team managing the...