Microsoft DP-750 Implementing Data Engineering Solutions Using Azure Databricks Exam Practice Test

Page: 1 / 6
Total 58 questions

Implementing Data Engineering Solutions Using Azure Databricks Questions and Answers

Question 1

You have an Azure Databricks workspace that is enabled for Unity Catalog and contains a Delta table named Orders.

You load the Orders table into an Apache Spark DataFrame named df.

You need to create a DataFrame that excludes rows where the order amount is null.

Solution: You run the following expression.

df.filter(df.order_amount.isNotNull())

Does this meet the goal?

Options:

A.

Yes

B.

No
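
The null filter in the proposed expression can be illustrated without a Spark cluster. The following is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for the Orders table; the table layout and sample rows are assumptions made for illustration. The SQL predicate IS NOT NULL is the counterpart of the PySpark Column.isNotNull() call in the expression above.

```python
import sqlite3

# Stand-in for the Orders table. SQLite is used only so the sketch runs
# anywhere; the columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, order_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 19.99), (2, None), (3, 0.0), (4, None)],
)

# SQL counterpart of df.filter(df.order_amount.isNotNull()):
# keep only rows whose order_amount is not NULL. A zero amount is not NULL.
kept = conn.execute(
    "SELECT order_id, order_amount FROM orders "
    "WHERE order_amount IS NOT NULL ORDER BY order_id"
).fetchall()

print(kept)
```

In PySpark, filter returns a new DataFrame and leaves df unchanged, so the result has to be assigned to a variable to be used later.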

Question 2

You have an Azure Databricks workspace that is enabled for Unity Catalog and contains two managed Delta tables named sales.schema1.table1 and sales.schema1.table2.

sales.schema1.table1 contains sales data from the current year.

sales.schema1.table2 contains historical data.

You need to load all the rows from sales.schema1.table1 into sales.schema1.table2. The solution must preserve any existing data in sales.schema1.table2 and minimize processing effort.

Which command should you run?

Options:

A.

INSERT INTO sales.schema1.table2 SELECT * FROM sales.schema1.table1;

B.

CREATE TABLE sales.schema1.table2 AS SELECT * FROM sales.schema1.table1;

C.

INSERT OVERWRITE sales.schema1.table2 SELECT * FROM sales.schema1.table1;

D.

CREATE OR REPLACE TABLE sales.schema1.table2 AS SELECT * FROM sales.schema1.table1;
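
The difference between appending to a table and replacing its contents can be sketched with Python's built-in sqlite3 module. This is an illustration under stated assumptions, not Databricks code: SQLite has no INSERT OVERWRITE and no three-level names, so the tables are called table1 and table2, the sample rows are hypothetical, and the overwrite is emulated with DELETE followed by INSERT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INTEGER, amount REAL)")  # current year
conn.execute("CREATE TABLE table2 (id INTEGER, amount REAL)")  # historical
conn.executemany("INSERT INTO table1 VALUES (?, ?)", [(3, 30.0), (4, 40.0)])
conn.executemany("INSERT INTO table2 VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

# Append: INSERT INTO ... SELECT adds rows and keeps what is already there.
conn.execute("INSERT INTO table2 SELECT * FROM table1")
appended_ids = [r[0] for r in conn.execute("SELECT id FROM table2 ORDER BY id")]

# Emulated overwrite (assumption: stands in for INSERT OVERWRITE, which
# replaces the existing rows of the target table with the query result).
conn.execute("DELETE FROM table2")
conn.execute("INSERT INTO table2 SELECT * FROM table1")
overwritten_ids = [r[0] for r in conn.execute("SELECT id FROM table2 ORDER BY id")]

print(appended_ids)
print(overwritten_ids)
```

After the append, table2 holds both the historical and the current-year rows. After the emulated overwrite, only the rows copied from table1 remain.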

Question 3

You have an Azure Databricks workspace that is enabled for Unity Catalog and contains a managed Delta table named Table1.

Table1 is written by batch jobs every hour and is queried frequently by filtering two columns named CustomerId and EventDate.

You expect Table1 to grow significantly over time.

The rows in Table1 are frequently updated and deleted to support compliance requests.

You need to keep query performance consistent as Table1 grows. The solution must minimize update and deletion effort.

What should you include in the solution? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 4

You have an Azure Databricks workspace that is enabled for Unity Catalog.

You need to recommend a pipeline that ingests files from cloud storage, performs cleansing and enrichment transformations, and writes curated Delta tables for analytics. The solution must minimize development effort and provide built-in monitoring and automatic retries.

What should you include in the recommendation?

Options:

A.

an Apache Spark Structured Streaming job

B.

a Databricks notebook triggered by a scheduled job

C.

a Lakeflow Spark Declarative Pipelines (SDP) pipeline

D.

an Azure Data Factory pipeline that uses data flows

Question 5

You have an Azure Databricks workspace that contains a job in Lakeflow Jobs named Job1.

Job1 contains three tasks named Task1, Task2, and Task3.

If Task1 fails, Task2 and Task3 must be prevented from running. Successfully completed tasks must NOT rerun during recovery.

You need to configure Job1 to support controlled failure handling and recovery.

What should you configure? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:
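
The two requirements in this scenario (downstream tasks are blocked when an upstream task fails, and successful tasks are not rerun during recovery) can be modeled with a toy scheduler. This is a plain Python sketch, not the Lakeflow Jobs API: the run_job function, the status strings, and the chain Task1 -> Task2 -> Task3 are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

Status = Dict[str, str]


def run_job(
    tasks: Dict[str, Callable[[], bool]],
    depends_on: Dict[str, List[str]],
    previous: Optional[Status] = None,
) -> Status:
    """Run tasks in insertion order, assumed to be a valid dependency order."""
    previous = previous or {}
    status: Status = {}
    for name, action in tasks.items():
        if previous.get(name) == "SUCCESS":
            status[name] = "SUCCESS"  # recovery: keep the earlier success, no rerun
        elif any(status[dep] != "SUCCESS" for dep in depends_on.get(name, [])):
            status[name] = "SKIPPED"  # an upstream task did not succeed
        else:
            status[name] = "SUCCESS" if action() else "FAILED"
    return status


calls = {"Task1": 0, "Task2": 0, "Task3": 0}
state = {"task2_ok": False}  # Task2 fails until the underlying problem is fixed


def make_task(name: str, ok: Callable[[], bool]) -> Callable[[], bool]:
    def action() -> bool:
        calls[name] += 1  # count real executions
        return ok()
    return action


tasks = {
    "Task1": make_task("Task1", lambda: True),
    "Task2": make_task("Task2", lambda: state["task2_ok"]),
    "Task3": make_task("Task3", lambda: True),
}
depends_on = {"Task2": ["Task1"], "Task3": ["Task2"]}  # hypothetical chain

first_run = run_job(tasks, depends_on)
state["task2_ok"] = True  # fix the problem, then recover
repair_run = run_job(tasks, depends_on, previous=first_run)

print(first_run, repair_run, calls)
```

In this model the first run stops at Task2 and skips Task3, and the recovery run executes only Task2 and Task3 while Task1 keeps its earlier result. The same dependency check would skip both Task2 and Task3 if Task1 were the task that failed.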

Question 6

You have an Azure Databricks workspace named Workspace1 that is attached to a Unity Catalog metastore named metastore1.

You need to register an Azure Storage account named account1 that has a hierarchical namespace enabled as an external location. The external location must use a managed identity to authenticate to account1, and the solution must follow the principle of least privilege.

Which three actions should you perform in sequence? To answer, move the appropriate actions from the list of actions to the answer area and arrange them in the correct order.

Options:

Question 7

You have an Azure Databricks workspace that is enabled for Unity Catalog and contains a Delta table named Sales.orders. Sales.orders stores historical sales data.

You receive a daily CSV file that contains new sales records only. The file does NOT contain updates to existing rows.

You need to load the daily data into Sales.orders. The solution must meet the following requirements:

• Preserve the existing data.

• Add only the new records.

• Minimize processing effort.

Which command should you include in the loading strategy?

Options:

A.

INSERT OVERWRITE

B.

UPDATE

C.

INSERT INTO

Question 8

You have an Azure Databricks workspace named Workspace1 that contains a lakehouse and is enabled for Unity Catalog.

You have a connection to a Microsoft SQL Server database named DB1.

You need to expose the schemas and tables of DB1 to meet the following requirements:

• The schemas and tables can be queried in Databricks.

• The schemas and tables appear alongside other Unity Catalog objects.

• The data is NOT copied into Databricks-managed storage.

Solution: You create a Lakeflow Connect pipeline and connect it to DB1.

Does this meet the goal?

Options:

A.

Yes

B.

No

Question 9

You need to develop the task logic for a new job in Lakeflow Jobs that processes telemetry data.

Each task must contain only the appropriate logic for its step in the pipeline. The solution must support the planned changes and meet the data ingestion and processing requirements.

What should you do?

Options:

A.

Use a single Databricks notebook task that performs ingestion, cleansing, and curation in one script.

B.

Create three tasks that each contains the identical logic and use task retries.

C.

Use a single SQL task that performs ingestion, cleansing, and curation by running merge commands.

D.

Create separate tasks for ingestion, cleansing, and curation.

Question 10

You need to configure compute for the ingestion of telemetry data. The solution must meet the data ingestion and processing requirements.

What should you do?

Options:

A.

Enable Photon acceleration for a job compute cluster.

B.

Move the ingestion pipelines to shared compute.

C.

Increase an all-purpose cluster to a larger fixed node type.

D.

Disable autoscaling for a job compute cluster.

Question 11

Which SCD type should you use to support the planned data modeling changes? To answer, drag the appropriate types to the correct issues. Each type may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Options:

Question 12

You need to complete the PySpark code for the Spark Structured Streaming pipelines. The solution must meet the data ingestion and processing requirements.

How should you complete the code segment? To answer, select the appropriate options in the answer area.

NOTE: Each correct selection is worth one point.

Options:

Question 13

Which ingestion option should you recommend for each data source? To answer, drag the appropriate options to the correct data sources. Each option may be used once, more than once, or not at all. You may need to drag the split bar between panes or scroll to view content.

NOTE: Each correct selection is worth one point.

Options:
