DP-300 Question #91: Real Exam Question with Answer & Explanation

Submitted by certguy · Mar 6, 2026 · Optimize query performance

Question

Case Study 3 - Contoso, Ltd 2

Overview

Contoso, Ltd. is a clothing retailer based in Seattle. The company has 2,000 retail stores across the United States and an emerging online presence.

The network contains an Active Directory forest named contoso.com. The forest is integrated with an Azure Active Directory (Azure AD) tenant named contoso.com. Contoso has an Azure subscription associated with the contoso.com Azure AD tenant.

Existing Environment

Transactional Data

Contoso has three years of customer, transaction, operational, sourcing, and supplier data made up of 10 billion records stored across multiple on-premises Microsoft SQL Server servers. The SQL Server instances contain data from various operations systems. The data is loaded into the instances by using SQL Server Integration Services (SSIS) packages.

You estimate that combining all product sales transactions into a company-wide sales transactions dataset will result in a single table that contains 5 billion rows, with one row per transaction.

Most queries targeting the sales transactions data will be used to identify which products were sold in retail stores and which products were sold online during different time periods. Sales transaction data that is older than three years will be removed monthly.

You plan to create a retail store table that will contain the address of each retail store. The table will be approximately 2 MB. Queries for retail store sales will include the retail store addresses.

You plan to create a promotional table that will contain a promotion ID. The promotion ID will be associated with a specific product. The product will be identified by a product ID. The table will be approximately 5 GB.

Streaming Twitter Data

The ecommerce department at Contoso develops an Azure logic app that captures trending Twitter feeds referencing the company's products and pushes the products to Azure Event Hubs.

Planned Changes and Requirements

Planned Changes

Contoso plans to implement the following changes:

  • Load the sales transaction dataset to Azure Synapse Analytics.
  • Integrate on-premises data stores with Azure Synapse Analytics by using SSIS packages.
  • Use Azure Synapse Analytics to analyze Twitter feeds to assess customer sentiments about products.

Sales Transaction Dataset Requirements

Contoso identifies the following requirements for the sales transaction dataset:

  • Partition data that contains sales transaction records. Partitions must be designed to provide efficient loads by month. Boundary values must belong to the partition on the right.
  • Ensure that queries joining and filtering sales transaction records based on product ID complete as quickly as possible.
  • Implement a surrogate key to account for changes to the retail store addresses.
  • Ensure that data storage costs and performance are predictable.
  • Minimize how long it takes to remove old records.

Customer Sentiment Analytics Requirements

Contoso identifies the following requirements for customer sentiment analytics:

  • Allow Contoso users to use PolyBase in an Azure Synapse Analytics dedicated SQL pool to query the content of the data records that host the Twitter feeds. Data must be protected by using row-level security (RLS). The users must be authenticated by using their own Azure AD credentials.
  • Maximize the throughput of ingesting Twitter feeds from Event Hubs to Azure Storage without purchasing additional throughput or capacity units.
  • Store Twitter feeds in Azure Storage by using Event Hubs Capture. The feeds will be converted into Parquet files.
  • Ensure that the data store supports Azure AD-based access control down to the object level.
  • Minimize administrative effort to maintain the Twitter feed data records.
  • Purge Twitter feed data records that are older than two years.
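
The partitioning requirement ("boundary values must belong to the partition on the right") corresponds to RANGE RIGHT partitioning. The Python sketch below models how monthly boundary values assign rows to partitions under RANGE RIGHT versus RANGE LEFT. It is an illustration under assumptions, not Synapse code: the 2025 first-of-month boundaries are hypothetical, and `bisect` merely stands in for the engine's own partition lookup. With first-of-month boundaries under RANGE RIGHT, each calendar month maps to exactly one partition, which is what makes loading a month, or removing the oldest month by switching its partition out, a cheap whole-partition operation.

```python
# Sketch only: models RANGE RIGHT vs RANGE LEFT partition assignment in Python.
# Assumption: monthly boundaries on the first day of each month (hypothetical
# 2025 values); bisect stands in for the engine's own partition lookup.
from bisect import bisect_left, bisect_right
from datetime import date

# Twelve hypothetical boundary values produce 13 partitions.
BOUNDARIES = [date(2025, month, 1) for month in range(1, 13)]


def partition_range_right(value: date, boundaries: list[date]) -> int:
    """1-based partition number under RANGE RIGHT.

    A value equal to a boundary belongs to the partition on its right,
    so we count the boundaries that are <= value.
    """
    return bisect_right(boundaries, value) + 1


def partition_range_left(value: date, boundaries: list[date]) -> int:
    """1-based partition number under RANGE LEFT, shown for contrast.

    A value equal to a boundary stays in the partition on its left,
    so we count only the boundaries that are < value.
    """
    return bisect_left(boundaries, value) + 1


if __name__ == "__main__":
    print(partition_range_right(date(2025, 2, 28), BOUNDARIES))  # 3
    print(partition_range_right(date(2025, 3, 1), BOUNDARIES))   # 4
    print(partition_range_left(date(2025, 3, 1), BOUNDARIES))    # 3
    # Under RANGE RIGHT every day of March shares one partition.
    march = {partition_range_right(date(2025, 3, d), BOUNDARIES) for d in range(1, 32)}
    print(march)  # {4}
```

The boundary date itself (March 1) is the only value whose partition differs between the two modes; that single difference is what the "partition on the right" requirement pins down.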

Data Integration Requirements

Contoso identifies the following requirements for data integration:

  • Use an Azure service that leverages the existing SSIS packages to ingest on-premises data into datasets stored in a dedicated SQL pool of Azure Synapse Analytics and transform the data.
  • Identify a process to ensure that changes to the ingestion and transformation activities can be version-controlled and developed independently by multiple data engineers.

Hotspot Question

You need to design an analytical storage solution for the transactional data. The solution must meet the sales transaction dataset requirements.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Explanation

Azure Synapse Analytics: Table Distribution Types

Dropdown 1: Retail Store Table → Replicated

Why Replicated is correct: The retail store table is approximately 2 MB, an extremely small dimension table. A replicated table caches a full copy on every Compute node (a dedicated SQL pool has at most 60). Because queries for retail store sales include the store addresses, this table is joined to the sales transactions frequently, and replication lets every join read the store rows locally. With no data movement between nodes, the join is as fast as it can be for such a small table.

Why the others are wrong:

  • Hash: Best suited to large tables such as fact tables (Microsoft's guidance is to consider hash distribution once a table's size on disk exceeds about 2 GB). Hash-distributing a 2 MB table scatters roughly 2,000 rows (one per store) across 60 distributions, and joins still require data movement unless the joining table is hash-distributed on the same key.
  • Round-robin: Spreads rows evenly across distributions with no regard to their values. It is fast for bulk loads and staging, but joins against it generally require data movement (a shuffle or broadcast), which defeats the purpose for a frequently joined dimension table.
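
To make the data-movement argument concrete, here is a small Python simulation (a model, not Synapse itself). It assumes 60 distributions, 2,000 store rows (the store count from the case study), fact rows placed without regard to `store_id`, and `zlib.crc32` as a stand-in for the engine's internal distribution hash. The hash case corresponds to the situation described above, where the fact table is not hash-distributed on the same key. For each layout it counts the fact rows whose matching store row sits on a different distribution.

```python
# Simulation sketch, not Synapse itself. Assumptions: 60 distributions,
# 2,000 store rows, fact rows placed without regard to store_id, and
# zlib.crc32 as a stand-in for the engine's internal distribution hash.
import random
import zlib

NUM_DISTRIBUTIONS = 60
NUM_STORES = 2_000

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical fact rows as (store_id, distribution holding the row).
FACT_ROWS = [
    (rng.randrange(NUM_STORES), rng.randrange(NUM_DISTRIBUTIONS))
    for _ in range(10_000)
]

# Round-robin, modeled as random placement of each store row.
ROUND_ROBIN = {s: rng.randrange(NUM_DISTRIBUTIONS) for s in range(NUM_STORES)}


def replicated(store_id: int) -> set[int]:
    # A full copy is readable locally everywhere (modeled per distribution).
    return set(range(NUM_DISTRIBUTIONS))


def round_robin(store_id: int) -> set[int]:
    return {ROUND_ROBIN[store_id]}


def hashed(store_id: int) -> set[int]:
    # Store table hashed on store_id; the fact table is distributed differently.
    return {zlib.crc32(str(store_id).encode()) % NUM_DISTRIBUTIONS}


def rows_needing_movement(placement) -> int:
    """Count fact rows whose matching store row is not on the same distribution."""
    return sum(1 for store_id, dist in FACT_ROWS if dist not in placement(store_id))


if __name__ == "__main__":
    for name, placement in [("replicated", replicated),
                            ("round-robin", round_robin),
                            ("hash(store_id)", hashed)]:
        moved = rows_needing_movement(placement)
        print(f"{name:<15}{moved:>6} of {len(FACT_ROWS)} fact rows need data movement")
```

Under these assumptions the replicated layout needs no movement at all, while the round-robin and mismatched-hash layouts leave roughly 59 of every 60 fact rows without a local match.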

Dropdown 2: Promotional Table → Hash

Why Hash is correct: The promotional table is approximately 5 GB, too large for Replicated. Microsoft's guidance is to replicate tables smaller than about 2 GB; larger replicated tables consume storage on every Compute node and take longer to rebuild after changes. The table maps promotion_id → product_id, and the requirements state that queries joining and filtering on product ID must complete as quickly as possible. Hash-distributing on product_id co-locates promotional rows with the matching sales transaction rows (assuming the sales table is also hash-distributed on product_id), which eliminates the broadcast or shuffle step during joins.
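
Co-location can be sketched in a few lines of Python. This is a model under stated assumptions, not the engine: it uses 60 distributions, hypothetical promotion and sales rows, and `zlib.crc32` as a stand-in for Synapse's internal hash, with the sales rows hash-distributed on product_id. The helper `join_locality` (a hypothetical name) counts how many matching pairs of a product_id join sit on the same distribution, depending on where the promotion rows are placed.

```python
# Sketch under stated assumptions: 60 distributions, hypothetical
# promotion and sales rows, and zlib.crc32 standing in for Synapse's
# internal distribution hash. Sales rows are hash-distributed on product_id.
import zlib
from collections import defaultdict

NUM_DISTRIBUTIONS = 60


def distribution_for(product_id: int) -> int:
    # Stand-in hash; Synapse uses its own deterministic hash function.
    return zlib.crc32(str(product_id).encode()) % NUM_DISTRIBUTIONS


# Hypothetical rows: 500 promotions over 250 products, 4,000 sales over 400 products.
promotions = [(promo_id, 1000 + promo_id % 250) for promo_id in range(500)]
sales = [(txn_id, 1000 + txn_id % 400) for txn_id in range(4_000)]


def join_locality(sales, promotions, promo_dist):
    """Return (local_pairs, remote_pairs) for a join on product_id.

    promo_dist(promo_id, product_id) decides where a promotion row lives;
    a remote pair is one that would need a shuffle or broadcast.
    """
    promo_locations = defaultdict(list)
    for promo_id, product_id in promotions:
        promo_locations[product_id].append(promo_dist(promo_id, product_id))

    local = remote = 0
    for _txn_id, product_id in sales:
        sales_dist = distribution_for(product_id)
        for location in promo_locations.get(product_id, []):
            if location == sales_dist:
                local += 1
            else:
                remote += 1
    return local, remote


if __name__ == "__main__":
    # Both tables hashed on product_id: every matching pair is co-located.
    print(join_locality(sales, promotions, lambda _p, pid: distribution_for(pid)))  # (5000, 0)
    # Promotions placed round-robin style instead: pairs can land on different distributions.
    print(join_locality(sales, promotions, lambda p, _pid: p % NUM_DISTRIBUTIONS))
```

The design point is that co-location comes from both tables using the same hash on the same column; hashing only one side, or hashing on different columns, brings data movement back.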

Why the others are wrong:

  • Replicated: 5 GB exceeds the ~2 GB guideline. Every Compute node would hold its own 5 GB copy, and every insert, update, or delete forces all of those cached copies to be rebuilt.
  • Round-robin: Distributes rows with no awareness of join keys, so nearly every join on product_id would require data movement (a shuffle or broadcast) across the 60 distributions. That is a poor fit for a table whose product ID joins must be fast.
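
A quick back-of-the-envelope check of the replication cost, assuming the pool runs at the maximum of 60 Compute nodes and taking the case study's table sizes at face value (compression ignored):

```latex
\begin{aligned}
\text{total cached} &= S_{\text{table}} \times N_{\text{nodes}} \\
\text{Promotional: } 5\ \text{GB} \times 60 &= 300\ \text{GB} \\
\text{Retail store: } 2\ \text{MB} \times 60 &= 120\ \text{MB}
\end{aligned}
```

The same factor of 60 applies to the rebuild work after each modification, which is why it matters far more for the 5 GB table than for the 2 MB one.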

Key Technical Concept

Azure Synapse dedicated SQL pools distribute data across 60 distributions. The rule of thumb:

| Table profile | Distribution |
|---|---|
| Small, under ~2 GB | Replicated (full copy on each Compute node) |
| Large, frequently joined on a key | Hash (on the join key) |
| Staging, or no clear join pattern | Round-robin |
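
The rule of thumb above can be written down as a tiny decision helper. This is a hypothetical sketch, not a Microsoft tool: the `TableProfile` class, the 2 GB threshold constant, and the table names are illustrative assumptions. The returned strings mirror the T-SQL `DISTRIBUTION` options (`REPLICATE`, `HASH(column)`, `ROUND_ROBIN`) used in a dedicated SQL pool `CREATE TABLE ... WITH (...)` clause.

```python
# Hypothetical helper that encodes the rule-of-thumb table as code. The
# returned strings mirror the T-SQL DISTRIBUTION options (REPLICATE,
# HASH(column), ROUND_ROBIN); the threshold and table names are illustrative.
from dataclasses import dataclass
from typing import Optional

REPLICATE_LIMIT_GB = 2.0  # the "under ~2 GB" rule of thumb


@dataclass
class TableProfile:
    name: str
    size_gb: float
    join_key: Optional[str] = None  # column most joins and filters use
    is_staging: bool = False


def recommend_distribution(table: TableProfile) -> str:
    """Apply the rule of thumb in order: staging, small, keyed, fallback."""
    if table.is_staging:
        return "ROUND_ROBIN"
    if table.size_gb < REPLICATE_LIMIT_GB:
        return "REPLICATE"
    if table.join_key:
        return f"HASH({table.join_key})"
    return "ROUND_ROBIN"  # large, but no dominant join key


def with_clause(table: TableProfile) -> str:
    """Render the distribution part of a CREATE TABLE ... WITH (...) clause."""
    return f"WITH (DISTRIBUTION = {recommend_distribution(table)})"


if __name__ == "__main__":
    for t in [TableProfile("retail_store", 0.002, "store_id"),
              TableProfile("promotion", 5.0, "product_id"),
              TableProfile("stg_sales", 50.0, is_staging=True)]:
        print(f"{t.name:<13}{with_clause(t)}")
```

Applied to the two tables in this question, the helper returns `REPLICATE` for the 2 MB retail store table and `HASH(product_id)` for the 5 GB promotional table, matching the two dropdown answers.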

Topics

#Azure Synapse Analytics · #Table Distribution · #Data Warehousing · #Query Performance