Monitoring Azure Databricks in an Azure Log Analytics Workspace

This repository extends the core monitoring functionality of Azure Databricks to send streaming query event information to Azure Log Analytics. It has the following directory structure:

/src
  /spark-jobs
  /spark-listeners-loganalytics
  /spark-listeners
  /pom.xml

The spark-jobs directory contains a sample Spark application that demonstrates how to implement a Spark application metric counter.

The spark-listeners-loganalytics and spark-listeners directories contain the code for building the two JAR files that are deployed to the Databricks cluster. The spark-listeners directory includes a scripts directory that contains a cluster node initialization script to copy the JAR files from a staging directory in the Azure Databricks file system to execution nodes.

The pom.xml file is the main Maven project object model build file for the entire project.
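
To make the role of the cluster node initialization script more concrete, here is a minimal sketch of the kind of copy step such a script performs. It is not the repository's listeners.sh. The function name copy_staged_jars, the STAGE_DIR and JAR_DEST_DIR variables, and the /tmp/monitoring-jars destination are hypothetical, and the sketch assumes that DBFS is visible on each node under /dbfs.

```shell
#!/bin/sh
# Hypothetical sketch of an init-script copy step; NOT the repository's listeners.sh.

# Copy every *.jar file from a staging directory to a local directory on the node.
copy_staged_jars() {
    stage_dir=$1   # staging directory as seen from the node
    dest_dir=$2    # local directory that should receive the JAR files
    if [ ! -d "$stage_dir" ]; then
        echo "staging directory not found: $stage_dir" >&2
        return 1
    fi
    mkdir -p "$dest_dir" || return 1
    for jar in "$stage_dir"/*.jar; do
        [ -e "$jar" ] || continue          # the glob matched nothing
        cp -f "$jar" "$dest_dir"/ || return 1
    done
}

# Assumption: dbfs:/databricks/monitoring-staging appears on the node as
# /dbfs/databricks/monitoring-staging. JAR_DEST_DIR is a placeholder; the real
# script decides where the runtime expects the JAR files.
STAGE_DIR=${STAGE_DIR:-/dbfs/databricks/monitoring-staging}
JAR_DEST_DIR=${JAR_DEST_DIR:-/tmp/monitoring-jars}

# Only attempt the copy when the staging directory exists (that is, on a node).
if [ -d "$STAGE_DIR" ]; then
    copy_staged_jars "$STAGE_DIR" "$JAR_DEST_DIR"
fi
```

Keeping the copy logic in a function that takes both directories as arguments makes it easy to try out locally against temporary directories before it runs on a cluster.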

Build the Azure Databricks monitoring library and configure an Azure Databricks cluster

Before you begin, ensure you have the following prerequisites in place:

- Clone, fork, or download the GitHub repository.
- An active Azure Databricks workspace. For instructions on how to deploy an Azure Databricks workspace, see "Get started with Azure Databricks".
- Install the Azure Databricks CLI.
  - An Azure Databricks personal access token is required to use the CLI. For instructions, see "Token management".
  - You can also use the Azure Databricks CLI from the Azure Cloud Shell.
- A Java IDE that can import and build Maven projects.
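
A quick way to confirm the CLI prerequisite is a small preflight check like the sketch below. The helper name missing_tools is hypothetical, and the tool list assumes the legacy Databricks CLI, which installs both a databricks and a dbfs command.

```shell
#!/bin/sh
# Print the name of each given command that is not found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Assumption: the legacy Databricks CLI provides both of these entry points.
missing=$(missing_tools databricks dbfs)
if [ -n "$missing" ]; then
    echo "Not found on PATH:" $missing
else
    echo "All CLI prerequisites found."
fi
```

If both commands are present, running databricks configure --token prompts for the workspace URL and the personal access token mentioned above.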

Build the Azure Databricks monitoring library

To build the Azure Databricks monitoring library, follow these steps:

Import the Maven project object model file, pom.xml, located in the /src folder into your project. This will import three projects:

spark-jobs
spark-listeners
spark-listeners-loganalytics

Execute the Maven package build phase in your Java IDE to build the JAR files for each of these three projects:

Project	JAR file
spark-jobs	spark-jobs-1.0-SNAPSHOT.jar
spark-listeners	spark-listeners-1.0-SNAPSHOT.jar
spark-listeners-loganalytics	spark-listeners-loganalytics-1.0-SNAPSHOT.jar

Use the Azure Databricks CLI to create a directory named dbfs:/databricks/monitoring-staging:

dbfs mkdirs dbfs:/databricks/monitoring-staging

Use the Azure Databricks CLI to copy /src/spark-listeners/scripts/listeners.sh to the dbfs:/databricks/monitoring-staging directory:

dbfs cp <local path to listeners.sh> dbfs:/databricks/monitoring-staging/listeners.sh

Use the Azure Databricks CLI to copy /src/spark-listeners/scripts/metrics.properties to the dbfs:/databricks/monitoring-staging directory:

dbfs cp <local path to metrics.properties> dbfs:/databricks/monitoring-staging/metrics.properties

Use the Azure Databricks CLI to copy spark-listeners-1.0-SNAPSHOT.jar and spark-listeners-loganalytics-1.0-SNAPSHOT.jar, built by the Maven package phase, to the dbfs:/databricks/monitoring-staging directory:

dbfs cp <local path to spark-listeners-1.0-SNAPSHOT.jar> dbfs:/databricks/monitoring-staging/spark-listeners-1.0-SNAPSHOT.jar
dbfs cp <local path to spark-listeners-loganalytics-1.0-SNAPSHOT.jar> dbfs:/databricks/monitoring-staging/spark-listeners-loganalytics-1.0-SNAPSHOT.jar
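
The build and staging steps above can also be scripted. The following is a sketch under stated assumptions rather than a script shipped with the repository: it builds with the mvn command line instead of an IDE, assumes it runs from the repository root, and assumes Maven's default layout, which places each JAR in the module's target directory. The names run, stage_all, DRY_RUN, and REPO_ROOT are hypothetical. With DRY_RUN=1 (the default here) the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: build the JAR files with Maven, then stage everything in DBFS.
DRY_RUN=${DRY_RUN:-1}          # 1 = print commands only, anything else = execute
REPO_ROOT=${REPO_ROOT:-.}      # assumption: the repository root
STAGE=dbfs:/databricks/monitoring-staging

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

stage_all() {
    # Build all three projects from the top-level pom.xml.
    run mvn -f "$REPO_ROOT/src/pom.xml" package || return 1

    # Create the staging directory.
    run dbfs mkdirs "$STAGE" || return 1

    # Copy the init script and the metrics configuration.
    for f in listeners.sh metrics.properties; do
        run dbfs cp "$REPO_ROOT/src/spark-listeners/scripts/$f" "$STAGE/$f" || return 1
    done

    # Copy the two listener JAR files.
    # Assumption: each JAR is in <module>/target/ (Maven's default output path).
    for module in spark-listeners spark-listeners-loganalytics; do
        jar="$module-1.0-SNAPSHOT.jar"
        run dbfs cp "$REPO_ROOT/src/$module/target/$jar" "$STAGE/$jar" || return 1
    done
}

stage_all
```

Review the printed commands first, then rerun with DRY_RUN=0 to execute them. If the files already exist in the staging directory, dbfs cp needs the --overwrite option to replace them.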

Create and configure the Azure Databricks cluster

To create and configure the Azure Databricks cluster, follow these steps:

Navigate to your Azure Databricks workspace in the Azure Portal.
On the home page, click "new cluster".
Choose a name for your cluster and enter it in the "cluster name" text box.
In the "Databricks Runtime Version" dropdown, select 4.3 (includes Apache Spark 2.3.1, Scala 2.11).
Under "Advanced Options", click on the "Spark" tab. Enter the following name-value pairs in the "Spark Config" text box:

Name	Value
spark.extraListeners	com.databricks.backend.daemon.driver.DBCEventLoggingListener,org.apache.spark.listeners.UnifiedSparkListener
spark.unifiedListener.sink	org.apache.spark.listeners.sink.loganalytics.LogAnalyticsListenerSink
spark.unifiedListener.logBlockUpdates	false

While still under the "Spark" tab, enter the following in the "Environment Variables" text box:

LOG_ANALYTICS_WORKSPACE_ID=<your Azure Log Analytics workspace ID>
LOG_ANALYTICS_WORKSPACE_KEY=<your Azure Log Analytics workspace key (primary or secondary)>

While still under the "Advanced Options" section, click on the "Init Scripts" tab. Go to the last line under the "Init Scripts" section. Under the "destination" dropdown, select "DBFS". Enter "dbfs:/databricks/monitoring-staging/listeners.sh" in the text box. Click the "add" button.
Click the "create cluster" button to create the cluster. Next, click on the "start" button to start the cluster.
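
For repeatable setups, the same settings can be captured in a cluster specification file instead of being typed into the portal. The sketch below is an illustration, not part of the repository. It assumes the field names of the Databricks Clusters API 2.0 (spark_conf, spark_env_vars, init_scripts) and the legacy Databricks CLI. The cluster name, node type, worker count, and the "4.3.x-scala2.11" runtime key are assumed values that you should check against your workspace, and write_cluster_spec and SPEC_FILE are hypothetical names.

```shell
#!/bin/sh
# Sketch: write the cluster settings from the steps above to a JSON spec file.
SPEC_FILE=${SPEC_FILE:-monitoring-cluster.json}

write_cluster_spec() {
    # $1 = path of the JSON file to write
    cat > "$1" <<'EOF'
{
  "cluster_name": "monitoring-demo",
  "spark_version": "4.3.x-scala2.11",
  "node_type_id": "Standard_DS3_v2",
  "num_workers": 2,
  "spark_conf": {
    "spark.extraListeners": "com.databricks.backend.daemon.driver.DBCEventLoggingListener,org.apache.spark.listeners.UnifiedSparkListener",
    "spark.unifiedListener.sink": "org.apache.spark.listeners.sink.loganalytics.LogAnalyticsListenerSink",
    "spark.unifiedListener.logBlockUpdates": "false"
  },
  "spark_env_vars": {
    "LOG_ANALYTICS_WORKSPACE_ID": "<your workspace ID>",
    "LOG_ANALYTICS_WORKSPACE_KEY": "<your workspace key>"
  },
  "init_scripts": [
    { "dbfs": { "destination": "dbfs:/databricks/monitoring-staging/listeners.sh" } }
  ]
}
EOF
}

write_cluster_spec "$SPEC_FILE"
echo "Wrote $SPEC_FILE. To create the cluster, run:"
echo "  databricks clusters create --json-file $SPEC_FILE"
```

The script only writes the file and prints the create command, so the placeholders can be replaced and the spec reviewed before anything is created. Avoid committing the file once it contains a real workspace key.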
