DEA-C01 Exam Questions
308 real DEA-C01 exam questions with expert-verified answers and explanations. Page 6 of 7.
- Question #251: Data Operations and Support
A company uses AWS Glue Apache Spark jobs to handle extract, transform, and load (ETL) workloads. The company has enabled logging and monitoring for all AWS Glue jobs. One of the A...
AWS Glue, Apache Spark, Debugging, Spark UI
- Question #252: Data Store Management
A data engineer notices slow query performance on a highly partitioned table that is in Amazon Athena. The table contains daily data for the previous 5 years, partitioned by date....
Athena, Partition Projection, Query Performance Optimization, Partition Management
- Question #253: Data Store Management
A data engineer is using an Apache Iceberg framework to build a data lake that contains 100 TB of data. The data engineer wants to run AWS Glue Apache Spark jobs that use the Icebe...
AWS Glue, Apache Iceberg, Data Lake, Spark
- Question #254: Data Ingestion and Transformation
A data engineer is configuring an AWS Glue Apache Spark extract, transform, and load (ETL) job. The job contains a sort-merge join of two large and equally sized DataFrames. The jo...
AWS Glue, Spark Shuffle, ETL Optimization, Sort-Merge Join
- Question #255: Data Ingestion and Transformation
A company needs to implement a workflow to process transactions. Each transaction goes through multiple levels of validation. Each validation level depends on the preceding validat...
AWS Step Functions, Workflow Orchestration, Serverless, Cost Optimization
- Question #256: Data Operations and Support
A data engineer configures a large number of AWS Glue jobs that all start up around the same time. All the jobs run for less than 1 hour in the same subnet of the same VPC. All the...
AWS Glue networking, VPC subnet IP addresses, Glue job failures, Capacity planning
- Question #257: Data Operations and Support
A company runs an Apache Spark application every night in an Amazon EMR cluster. The company uses Amazon EC2 instances to supply compute capacity for the EMR cluster. The company d...
EMR logging, Spark troubleshooting, Application logs, EMR steps
- Question #258: Data Ingestion and Transformation
A company processes 500 GB of audience and advertising data daily, storing CSV files in Amazon S3 with schemas registered in AWS Glue Data Catalog. They need to convert these files...
AWS Glue Workflows, ETL, Data Orchestration, Serverless Data Processing
- Question #259: Data Security and Governance
A company uses an organization in AWS Organizations to manage multiple AWS accounts. The company uses an enhanced fan-out data stream in Amazon Kinesis Data Streams to receive strea...
Cross-account access, IAM policies, Kinesis Data Streams, Resource-based policies
- Question #260: Data Store Management
A company needs to use Amazon Athena to analyze data that is in an Amazon S3 bucket. A data engineer needs to configure AWS Glue table partitions for year, month, and day. The data...
AWS Athena, AWS Glue Data Catalog, Partition Projection, Data Partitioning
- Question #261: Data Ingestion and Transformation
A hotel management company receives daily data files from each of its hotels. The company wants to upload its data to AWS. The company plans to use Amazon Athena to access the file...
Data Ingestion, Stream Processing, Object Storage, Serverless
- Question #262: Data Store Management
A company wants to use Apache Spark jobs that run on an Amazon EMR cluster to process streaming data. The Spark jobs will transform and store the data in an Amazon S3 bucket. The c...
Data Formats, AWS Glue Data Catalog, Analytical Query Performance, Amazon Athena
- Question #263: Data Store Management
A company adds new data to a large CSV file in an Amazon S3 bucket every day. The file contains company sales data from the previous 5 years. The file currently includes more than...
S3 Partitioning, Amazon Athena, Cost Optimization, Data Querying
- Question #264: Data Security and Governance
A data engineer must implement a data cataloging solution to track schema changes in an Amazon Redshift table. Which solution will meet these requirements?
AWS Glue, Data Cataloging, Schema Management, Redshift Integration
- Question #265: Data Security and Governance
A data engineer is building a solution to detect sensitive information that is stored in a data lake across multiple Amazon S3 buckets. The solution must detect personally identifi...
PII Detection, Data Security, Data Governance, AWS Glue
- Question #266: Data Store Management
A data engineer is building a new data pipeline that stores metadata in an Amazon DynamoDB table. The data engineer must ensure that all items that are older than a specified age a...
DynamoDB TTL, Data Retention, Automated Cleanup, NoSQL Databases
- Question #267: Data Store Management
A ride-sharing company stores records for all rides in an Amazon DynamoDB table. The table includes the following columns and types of values: The table currently contains billions...
DynamoDB, Global Secondary Index (GSI), Data Modeling, Query Optimization
- Question #268: Data Ingestion and Transformation
A company stores information about its subscribers in an Amazon S3 bucket. The company runs an analysis every time a subscriber ends their subscription. The company uses AWS Lambda...
AWS Lambda, Performance Tuning, Resource Allocation, Serverless Computing
- Question #269: Data Operations and Support
A company uses a data stream in Amazon Kinesis Data Streams to collect transactional data from multiple sources. The company uses an AWS Glue extract, transform, and load (ETL) pip...
AWS Glue ETL, Auto Scaling, Performance Tuning, Data Operations
- Question #270: Data Store Management
A company has a data processing pipeline that runs multiple SQL queries in sequence against an Amazon Redshift cluster. The company merges with a second company. The original compa...
Redshift, Query Optimization, Data Distribution, Performance Tuning
- Question #271: Data Store Management
A gaming company uses AWS Glue to perform read and write operations on Apache Iceberg tables for real-time streaming data. The data in the Iceberg tables is in Apache Parquet forma...
Apache Iceberg, AWS Glue, Query Performance, Data Compaction
- Question #272: Data Ingestion and Transformation
A data engineer at a company is optimizing extract, transform, and load (ETL) workflows. The current architecture uses Amazon EMR and Apache Spark for large-scale transformations a...
Serverless ETL, Data Orchestration, AWS Glue, Spark Transformations
- Question #273: Data Security and Governance
A company that operates globally must follow regulations that require data from an AWS Region to be accessible only within that Region. A data engineer is creating a data pipeline...
IAM Roles, IAM Policies, Identity Federation, Regional Restrictions
- Question #274: Data Security and Governance
A company needs a solution to automatically ingest data from an Amazon S3 bucket to an Amazon Redshift table every day. The company must obfuscate sensitive data in Amazon Redshift...
AWS Glue ETL, Amazon Redshift, Data Masking, Data Security
- Question #275: Data Ingestion and Transformation
A company needs to aggregate and filter a large amount of streaming data in real-time with low latency. The company needs to store the data in Amazon S3 for analysis. Which solutio...
Streaming Data, Real-time Processing, Data Aggregation, Operational Efficiency
- Question #276: Data Ingestion and Transformation
A company is developing machine learning (ML) models. A data engineer needs to apply data quality rules to training data. The company stores the training data in an Amazon S3 bucke...
Data Quality, AWS Glue DataBrew, Managed Services, Operational Overhead
- Question #277: Data Store Management
A data engineer needs to analyze time-sensitive sales data. The company stores the data in an Amazon S3 bucket. The data engineer uses AWS Glue Data Catalog to access the data. Whe...
AWS Glue Data Catalog, S3 Data Lake, Data Partitioning, Data Freshness
- Question #278: Data Ingestion and Transformation
A retail company stores point-of-sale transaction data in an Amazon RDS for MySQL database. The company maintains historical sales analytics in Amazon Redshift. The company needs t...
Data Ingestion, Data Replication, AWS DMS, Redshift Integration
- Question #279: Data Ingestion and Transformation
A company is creating a new data pipeline to populate a data lake. A data analyst needs to prepare and standardize the data before a data engineering team can perform advanced data...
Data Preparation, ETL, AWS Glue Studio, No-Code
- Question #280: Data Ingestion and Transformation
A company uses an Amazon RDS for Oracle database to store data for an ecommerce application. The company needs a solution to make the data in all tables available for future analyt...
AWS DMS, Kinesis Data Streams, Real-time data ingestion, Data lake architecture
- Question #281: Data Store Management
A company maintains a central Amazon Redshift data warehouse that aggregates daily transactional data from Amazon RDS for PostgreSQL and Amazon Aurora MySQL. A data engineer notice...
Redshift, Query Optimization, Materialized Views, Data Warehousing
- Question #282: Data Ingestion and Transformation
A company uses an Amazon S3 bucket to integrate multiple data sources into a central data lake. The company needs to perform multiple transformations and data cleaning processes on...
AWS Glue Flex Job, Amazon Athena, Data Lake ETL, Cost Optimization
- Question #283: Data Ingestion and Transformation
A company needs to build an extract, transform, and load (ETL) pipeline that has separate stages for batch data ingestion, transformation, and storage. The pipeline must store the...
ETL Pipeline, Workflow Orchestration, AWS Step Functions, Serverless Architecture
- Question #284: Data Security and Governance
A company needs to implement a data mesh architecture in which domains for trading, risk, and compliance teams each own their data. The teams need to share specific views with...
AWS Lake Formation, Data Mesh, Fine-grained Access Control, Data Governance
- Question #285: Data Store Management
A company stores data in Amazon S3 and Amazon RDS. The company needs to apply technical controls to meet security requirements. The company must generate backups of the data and re...
AWS Backup, Data Backup, Data Retention, Deletion Protection
- Question #286: Data Ingestion and Transformation
A data engineer uploads confidential documents to an Amazon S3 bucket every day. The data engineer requires a solution to independently verify the integrity of all uploaded data to...
Data Integrity, Checksums, Amazon S3, File Upload Verification
- Question #287: Data Security and Governance
A company needs to optimize storage for an Amazon S3 bucket. Objects older than 1 year must be accessible within 5 hours. All versions of the objects must be retained and immutable...
S3 Object Lock, S3 Intelligent-Tiering, Data Retention, WORM storage
- Question #288: Data Operations and Support
A data engineer needs to deploy a complex pipeline. The stages of the pipeline must be able to run a script. The data engineer must use only fully managed and serverless services i...
Pipeline Orchestration, Managed Services, Serverless, Workflow Management
- Question #289: Data Store Management
A company needs to optimize storage costs for an Amazon S3 bucket. The S3 bucket receives 10 million objects every day. The objects range in size from 2 KB to 5 MB. The objects nee...
S3 Lifecycle Policy, S3 Storage Classes, Cost Optimization, Data Retention
- Question #290: Data Ingestion and Transformation
A company needs to use an AWS Glue PySpark job to read specific data from an Amazon DynamoDB table. The company knows the partition key values for the required records. The existin...
AWS Glue, DynamoDB, Data Querying, Cost Optimization
- Question #291: Data Security and Governance
A company runs a multi-tenant Amazon EMR cluster on Amazon EC2 instances. Multiple teams perform interactive query analyses and data transformations on the data in the EMR cluster....
EMR Security, Runtime Roles, Access Control, Multi-tenant EMR
- Question #292: Data Store Management
A company stores Apache Parquet files in an Amazon S3 data lake. The data lake receives thousands of files from multiple sources every hour. The files range in size from 50 KB to 1...
Apache Iceberg, Data Compaction, Query Optimization, Small File Problem
- Question #293: Data Operations and Support
A data engineer must maintain and monitor a data pipeline on AWS that processes streaming data from Internet of Things (IoT) devices. The pipeline uses Amazon Kinesis Data Streams...
AWS CloudWatch, Kinesis, Pipeline Monitoring, Operational Excellence
- Question #294: Data Store Management
A company needs a solution to store and query product data that has variable attributes. The solution must support unpredictable and high-volume queries with single-digit milliseco...
DynamoDB, NoSQL, Database Selection, Scalability
- Question #295: Data Ingestion and Transformation
A company is uploading log files from on-premises servers to an Amazon S3 bucket. The company needs to validate that the logs from the on-premises server are the same as the logs t...
S3 Data Integrity, Checksum Validation, AWS SDK, Object Metadata
- Question #296: Data Operations and Support
An ecommerce company uses AWS Glue ETL to process and analyze orders. The company wants to build an extract, transform, and load (ETL) pipeline that processes placed, shipped, deli...
EventBridge Scheduler, AWS Glue Workflows, Dead-letter Queues, ETL Troubleshooting
- Question #297: Data Ingestion and Transformation
A company needs to build a data pipeline to process a 1-TB file from an Amazon S3 bucket. The pipeline needs to create three DataFrames based on business logic. The pipeline must s...
AWS Glue, ETL, Serverless Processing, Event-Driven Architecture
- Question #298: Data Ingestion and Transformation
A company needs to transform IoT sensor data in near real time before the company stores the data in an Amazon S3 bucket. The data is available from a data stream in Amazon Kinesis...
Stream Processing, Real-time Data, Managed Services, Stateful Transformations
- Question #299: Data Security and Governance
A company stores time-series data that is collected from streaming services in an Amazon S3 bucket. The company must ensure that only workloads that are deployed within the company...
S3 Bucket Policies, VPC Access Control, Data Security, Resource-Based Policies
- Question #300: Data Operations and Support
A company runs a data platform on AWS. The data platform uses AWS Glue to provide a data catalog and to perform processing. The company notices quality issues in the data. The comp...
AWS Glue Data Quality, Data Validation, Anomaly Detection, Managed Services