Start the SQL Pool in your lab environment.

- Open the Synapse Studio workspace and navigate to the **Manage** hub.
- From the center menu, select **SQL pools** beneath the **Analytics pools** heading. Locate **SQLPool01** and select the **Resume** button.
In this exercise, you will learn how to work with Spark DataFrames in Synapse Spark, including:

- Working with schemas and lake databases
- Performing DataFrame operations
- Working with DataFrame partitions
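Before opening the notebook, here is a minimal PySpark sketch of these operations; the columns and sample rows are invented for illustration, the `demo_lake_db` database name is hypothetical, and `spark` is the session that Synapse notebooks provide automatically.

```python
from pyspark.sql.types import StructType, StructField, StringType, IntegerType
from pyspark.sql.functions import col

# Define an explicit schema instead of relying on inference (illustrative columns)
schema = StructType([
    StructField("customer_id", IntegerType(), False),
    StructField("region", StringType(), True),
    StructField("amount", IntegerType(), True),
])

df = spark.createDataFrame(
    [(1, "West", 120), (2, "East", 75), (3, "West", 200)],
    schema,
)

# Common DataFrame operations: filter, aggregate, display
df.filter(col("amount") > 100).groupBy("region").count().show()

# Persist the DataFrame as a table in a lake database (hypothetical name)
spark.sql("CREATE DATABASE IF NOT EXISTS demo_lake_db")
df.write.mode("overwrite").saveAsTable("demo_lake_db.sales")

# Inspect and change how the data is partitioned
print(df.rdd.getNumPartitions())
df_by_region = df.repartition(4, "region")  # hash-partition into 4 partitions by region
```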
- Open Synapse Analytics Studio, and then navigate to the **Develop** hub.
- Under **Notebooks**, select the notebook called **Lab 07 - Part 1 - Spark DataFrames**. Please connect to **SparkPool01** for this notebook.
- Read through the notebook and execute the cells as instructed. When you have finished, you have completed this exercise.
**IMPORTANT!** Once you complete the steps in the notebook, make sure you stop the Spark session when closing the notebook. This frees up the compute resources needed to start the Spark sessions for the other exercises in this lab.
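Besides the **Stop session** button on the notebook toolbar, the session can also be stopped from a code cell. A minimal sketch, assuming the `mssparkutils` utilities that Synapse Spark pools ship with:

```python
from notebookutils import mssparkutils

# Stop the current interactive Spark session and release the pool's compute
mssparkutils.session.stop()
```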
In this exercise, you will learn how to work with Delta Lake and `mssparkutils` in Synapse Spark.
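As a preview of the notebook's content, a minimal sketch of a Delta Lake round trip plus a `mssparkutils` file-system helper; the storage path below is a placeholder, not one from your lab environment:

```python
from notebookutils import mssparkutils

# Placeholder path; the notebook supplies real paths from your environment
delta_path = "abfss://<container>@<account>.dfs.core.windows.net/delta/demo"

# Write a small DataFrame in Delta format, then read it back
df = spark.createDataFrame([(1, "alpha"), (2, "beta")], ["id", "label"])
df.write.format("delta").mode("overwrite").save(delta_path)
spark.read.format("delta").load(delta_path).show()

# Delta keeps a transaction log, so earlier versions stay queryable ("time travel")
spark.read.format("delta").option("versionAsOf", 0).load(delta_path).show()

# mssparkutils file-system helper: list the folder's contents
for f in mssparkutils.fs.ls(delta_path):
    print(f.name, f.size)
```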
- Open Synapse Analytics Studio, and then navigate to the **Develop** hub.
- Under **Notebooks**, select the notebook called **Lab 07 - Part 2 - Spark Delta Lake**.
- Read through the notebook and execute the cells as instructed. When you have finished, you have completed this exercise.
**IMPORTANT!** Once you complete the steps in the notebook, make sure you stop the Spark session when closing the notebook. This frees up the compute resources needed to start the Spark sessions for the other exercises in this lab.
In this exercise you will learn how to work with Hyperspace in Synapse Spark.
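As a preview, a minimal sketch of the Hyperspace index workflow, assuming the `hyperspace` Python package bundled with Synapse Spark pools; the data path and column names are placeholders:

```python
from hyperspace import *

# Placeholder path and columns; the notebook supplies real ones
df = spark.read.parquet("abfss://<container>@<account>.dfs.core.windows.net/data/sales")

hs = Hyperspace(spark)

# Create a covering index: indexed columns drive joins and filters,
# included columns are carried along so the index alone can answer a query
hs.createIndex(df, IndexConfig("sales_ix", ["customer_id"], ["amount"]))

# List the indexes Hyperspace currently manages
hs.indexes().show()

# Opt this Spark session in to using Hyperspace indexes during query optimization
Hyperspace.enable(spark)
```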
- Open Synapse Analytics Studio, and then navigate to the **Develop** hub.
- Under **Notebooks**, select the notebook called **Lab 07 - Part 3 - Spark Hyperspace**.
- Read through the notebook and execute the cells as instructed. When you have finished, you have completed this exercise.
**IMPORTANT!** Once you complete the steps in the notebook, make sure you stop the Spark session when closing the notebook. This frees up the compute resources needed to start the Spark sessions for the other exercises in this lab.
In this exercise, you will learn how to create and run a Spark job in Synapse Spark. The job counts the words in a text file stored in the Synapse workspace data lake storage.
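The `wordcount.py` script is already provided at the data lake path referenced below, so you do not need to write it. For orientation, a minimal sketch of how such a word-count job could be structured, with the input and output paths arriving as the two command-line arguments you will configure:

```python
import sys

from pyspark.sql import SparkSession
from pyspark.sql.functions import col, explode, lower, split

if __name__ == "__main__":
    # First argument: input text file; second argument: output folder
    input_path, output_path = sys.argv[1], sys.argv[2]

    spark = SparkSession.builder.appName("WordCount").getOrCreate()

    # Split each line on whitespace, one word per row, then count occurrences
    counts = (spark.read.text(input_path)
                   .select(explode(split(lower(col("value")), r"\s+")).alias("word"))
                   .where(col("word") != "")
                   .groupBy("word")
                   .count())

    counts.write.mode("overwrite").csv(output_path)
    spark.stop()
```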
- Open Synapse Analytics Studio, and then navigate to the **Develop** hub.
- Select **+**, and then select **Apache Spark job definition** to initiate the creation of a new Spark job.
- In the Spark job definition form, fill in the following properties:
  - Language: **PySpark (Python)**
  - Main definition file: `abfss://wwi-02@<your_data_lake_account_name>.dfs.core.windows.net/spark-job/wordcount.py` (where `<your_data_lake_account_name>` is the name of the Synapse workspace data lake account configured in your environment)
  - Command line arguments: `abfss://wwi-02@<your_data_lake_account_name>.dfs.core.windows.net/spark-job/shakespeare.txt abfss://wwi-02@<your_data_lake_account_name>.dfs.core.windows.net/spark-job/result` (using the same data lake account name)
  - Apache Spark pool: **SparkPool02**

  Once all the properties mentioned above are filled in, select **Publish** to publish the new Spark job.
- When the publishing is finished, select **Submit** to start the new Spark job.
- Navigate to the **Monitor** hub and select the **Apache Spark applications** section. Identify the Spark application corresponding to your Spark job.
- Select the Spark application corresponding to your job and wait until it finishes (you might need to select **Refresh** every minute or so to update the status).
- Once the Spark job finishes successfully, check the `/spark-job/result` folder located in the `wwi-02` container on the Synapse workspace data lake storage account. The files in the folder are text files containing the word count results.
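If you prefer to verify the output from a notebook rather than the Storage browser, a minimal sketch (same account-name placeholder as above; it assumes the job has finished and that the results are plain text):

```python
from notebookutils import mssparkutils

result_path = "abfss://wwi-02@<your_data_lake_account_name>.dfs.core.windows.net/spark-job/result"

# List the part files the job produced, then load and inspect the counts
for f in mssparkutils.fs.ls(result_path):
    print(f.name, f.size)

spark.read.text(result_path).show(20, truncate=False)
```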