DP-100 Question #260: Real Exam Question with Answer & Explanation

Explore data and run experiments

Question

Drag and Drop Question

You manage an Azure Machine Learning workspace. You plan to import and wrangle data stored in Azure Data Lake Storage Gen2 with Apache Spark. You need to start interactive data wrangling with the user identity passthrough.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Explanation

To start interactive data wrangling on Azure Data Lake Storage Gen2 with Apache Spark and user identity passthrough, first assign the user identity the Contributor and Storage Blob Data Contributor roles on the storage account, then select Serverless Spark Compute, and finally access the data with an abfss:// URI.

Approach. The correct answer, as depicted in the second image, is to drag the following three actions into the answer area in this order:

  1. Assign the user identity to Contributor and Storage Blob Data Contributor roles.

    • Reasoning: With user identity passthrough, Spark accesses the storage account with the signed-in user's own identity, so that identity needs role assignments on the Azure Data Lake Storage Gen2 storage account. Storage Blob Data Contributor is the data plane role: it grants read, write, and delete access to blob data, which is what ADLS Gen2 is built on. Contributor provides broader management permissions on the storage account, but Storage Blob Data Contributor is the role that governs data access. These assignments must be in place before Spark can access the data on behalf of the user.
  2. Select Serverless Spark Compute from the Compute selection menu.

    • Reasoning: The scenario specifies Apache Spark for interactive data wrangling. In Azure Machine Learning, Serverless Spark Compute is a managed, on-demand Spark environment that can back an interactive notebook session. It is a separate option from an attached Azure Synapse Analytics Spark pool. Selecting it from the Compute selection menu provides the Spark session in which the wrangling code runs.
  3. Format the URI as abfss://<FILE_SYSTEM_NAME>@<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net/<PATH_TO_DATA>.

    • Reasoning: Azure Data Lake Storage Gen2 (ADLS Gen2) should be accessed using the abfss:// (Azure Blob File System Secure) URI scheme. This scheme is designed for ADLS Gen2: it addresses the dfs.core.windows.net endpoint, supports the hierarchical namespace, and connects over TLS. With user identity passthrough, requests made through this URI are authenticated with the user's identity in Azure Active Directory (now Microsoft Entra ID).
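
The URI format in step 3 can be sketched in Python as follows. This is a minimal illustration, not code from the exam or from Azure documentation: the helper names `build_abfss_uri` and `read_csv_with_spark`, the file system name, the storage account name, and the file path are all hypothetical. The read step assumes a notebook session attached to Serverless Spark Compute where `pyspark.pandas` is available.

```python
def build_abfss_uri(file_system: str, storage_account: str, path: str) -> str:
    """Return an abfss:// URI for a path in an ADLS Gen2 file system."""
    # Strip a leading slash so there is exactly one separator before the path.
    return (
        f"abfss://{file_system}@{storage_account}.dfs.core.windows.net/"
        f"{path.lstrip('/')}"
    )


def read_csv_with_spark(uri: str):
    """Read a CSV file into a pandas-on-Spark DataFrame.

    Assumes an active Spark session, for example a notebook attached to
    Serverless Spark Compute, and that the signed-in user already holds
    the required roles on the storage account.
    """
    # Imported lazily so the URI helper works without PySpark installed.
    import pyspark.pandas as ps

    return ps.read_csv(uri)


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    uri = build_abfss_uri("wrangling-data", "examplestorageacct", "raw/titanic.csv")
    print(uri)
    # In the Spark-backed notebook session one would then call:
    # df = read_csv_with_spark(uri)
```

Note that the sketch sets no credentials anywhere: with user identity passthrough, the URI alone identifies the data and the session supplies the identity.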

Common mistakes.

Two incorrect choices from the provided list are:
  • 'Format the URI as wasbs://<BLOB_CONTAINER_NAME>@<STORAGE_ACCOUNT_NAME>.blob.core.windows.net/<PATH_TO_DATA>.'

    • Why wrong: The wasbs:// URI scheme addresses standard Azure Blob Storage containers through the blob.core.windows.net endpoint and does not expose the hierarchical namespace features of ADLS Gen2. Although ADLS Gen2 is built on Azure Blob Storage, abfss:// is the correct and recommended scheme for it, particularly for user identity passthrough and for Spark and other HDFS-compatible tools. A wasbs:// URI therefore does not meet the ADLS Gen2 requirement in the question.
  • 'Set the property fs.azure.account.oauth2.client.id.<STORAGE_ACCOUNT_NAME>.dfs.core.windows.net.'

    • Why wrong: This property, along with related ones such as fs.azure.account.oauth2.client.secret and fs.azure.account.oauth2.client.endpoint, is typically set when a Spark application authenticates to ADLS Gen2 with a service principal. The question asks specifically for user identity passthrough. In that case authentication is handled by Azure Active Directory (now Microsoft Entra ID) through the user's logged-in identity, and no client ID or secret properties are set on the Spark session. Setting this property belongs to a different authentication mechanism.
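
Both distractors can be made concrete with a short Python sketch. The function names `is_adls_gen2_uri` and `service_principal_conf` and all argument values are hypothetical. The configuration keys follow the per-account naming pattern used by the Hadoop ABFS connector for client-credential (service principal) authentication, as best I know it; treat the exact set of keys as an assumption to check against the connector documentation.

```python
from urllib.parse import urlsplit


def is_adls_gen2_uri(uri: str) -> bool:
    """Return True if the URI uses abfss:// against the ADLS Gen2 dfs endpoint."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    return parts.scheme == "abfss" and host.endswith(".dfs.core.windows.net")


def service_principal_conf(
    storage_account: str, client_id: str, client_secret: str, tenant_id: str
) -> dict:
    """Build the Spark settings a service principal login would need.

    None of these settings are needed for user identity passthrough; they
    are listed only to show what the distractor property belongs to.
    """
    suffix = f"{storage_account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        # A real secret should come from a secret store, never from source code.
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }


if __name__ == "__main__":
    # Hypothetical URIs: only the abfss:// form targets ADLS Gen2.
    print(is_adls_gen2_uri("abfss://data@exampleacct.dfs.core.windows.net/raw/a.csv"))
    print(is_adls_gen2_uri("wasbs://data@exampleacct.blob.core.windows.net/raw/a.csv"))
```

With a service principal, each entry of the returned dictionary would be applied to the Spark session configuration before reading data. With user identity passthrough, that whole step is skipped.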

Concept tested. Azure Data Lake Storage Gen2 (ADLS Gen2) access and authentication, specifically user identity passthrough with Apache Spark in Azure Machine Learning, including required Azure role-based access control (RBAC) permissions, selection of appropriate compute resources, and correct URI formatting.

Topics

#Azure ML Workspaces  #Data Wrangling  #Apache Spark  #Identity Passthrough
