From 6ada07d54a16f7d95ff0993b366328cd8520b1af Mon Sep 17 00:00:00 2001
From: mathiaspet
Date: Tue, 22 Nov 2022 08:34:13 +0100
Subject: [PATCH 1/3] feat: add documentation for direct access to Azure Storage accounts

---
 .../setting-up/on_azure_databricks.md         | 77 +++++++++++++++++++
 1 file changed, 77 insertions(+)

diff --git a/docs/using-anovos/setting-up/on_azure_databricks.md b/docs/using-anovos/setting-up/on_azure_databricks.md
index 6c9a7341..2332a2f8 100644
--- a/docs/using-anovos/setting-up/on_azure_databricks.md
+++ b/docs/using-anovos/setting-up/on_azure_databricks.md
@@ -500,3 +500,80 @@ in [Step 2.2](#step-22-copy-the-dataset-to-an-azure-blob-storage-container).
 
 The remaining steps are the same as above, so please continue with
 [Step 1.4](#step-14-configure-and-launch-an-anovos-workflow-as-a-databricks-job)
+
+## 3. Anovos on Azure Databricks Using an Azure Blob Storage Container using
+
+### Step 3.1: Installing/Downloading Anovos
+
+This step is identical to
+[Step 1.1: Installing _Anovos_ on Azure Databricks](#step-11-installing-anovos-on-azure-databricks).
+
+### Step 3.2: Copy the dataset to an Azure Blob Storage container
+
+This step is identical to
+[Step 2.2: Copy the dataset to an Azure Blob Storage container](#step-22-copy-the-dataset-to-an-azure-blob-storage-container).
+
+### Step 2.3: Mount an Azure Blob Storage Container as a DBFS path in Azure Databricks
+
+To access files in an Azure Blob Storage container for running _Anovos_ in Azure Databricks platform,
+you need to mount that container in the DBFS path.
+
+```spark.conf.set("fs.azure.account.key.<storage-account-name>.dfs.core.windows.net", "<storage-account-access-key>")```
+
+TODO: CHECKOUT IF SAS-TOKEN DOES WORK TOO
+Here,
+- `<storage-account-name>` is the name of your Azure Blob Storage account
+- `<container-name>` is the name of a container in your Azure Blob Storage account
+- `<storage-account-access-key>` is the value of the storage account key (TODO: this is bad practise and should be solved with a secret)
+- `<sas-token>` is the SAS token for that storage account
+
+
+To learn more about accessing Azure Blob Storage containers using the abfss protocoll, please refer to
+[the Azure Blob Storage documentation](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage).
+
+💡 _Note that you only need to mount the container once._
+ _The container will remain mounted at the given mount point._
+ _To unmount a container, you can run `dbutils.fs.unmount("/mnt/<mount-point>")` in an Azure Databricks notebook._
+
+### Step 2.4: Update the workflow configuration for all input and output paths according to the DBFS mount point
+
+Once mounting is completed, the data is present in DBFS at the path specified as the mount point.
+All operations performed by _Anovos_ when running a workflow will result in changes in the data stored in the
+Azure Blob Storage container.
+
+The example configuration file we use in this tutorial can be found at `config/configs_income_azure_blob_mount.yaml`
+in the _Anovos_ repository.
+It will need to be updated to reflect the path of the mount point set above.
+
+In order for _Anovos_ to be able to find the input data and write the output to the correct location,
+update all paths to contain the path of the mount point:
+
+```yaml
+file_path: "dbfs:/mnt/<mount-point>/..."
+```
+
+🤓 _Example:_
+
+```yaml
+  read_dataset:
+    file_path: "dbfs:/mnt/anovos1/income_dataset/csv/"
+    file_type: csv
+```
+
+Here, the mount points is `dbfs:/mnt/anovos1` and the input dataset is stored in a folder called `income_dataset/csv`
+within the Azure Blob Storage container.
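+
+💡 _The mount point referenced above (e.g. `/mnt/anovos1`) has to be created once before the workflow can use it._
+ _One common way to create it from a Databricks notebook is sketched below; the `wasbs` driver, the secret scope `anovos`,_
+ _the secret key `storage-account-key`, and the bracketed names are placeholder assumptions, not fixed requirements:_
+
+```python
+# Minimal sketch: mount an Azure Blob Storage container so that it becomes
+# reachable under dbfs:/mnt/anovos1. Run once in a Databricks notebook, where
+# `dbutils` is predefined. Replace all placeholder names with your own values.
+dbutils.fs.mount(
+    source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
+    mount_point="/mnt/anovos1",
+    extra_configs={
+        "fs.azure.account.key.<storage-account-name>.blob.core.windows.net":
+            dbutils.secrets.get(scope="anovos", key="storage-account-key")
+    },
+)
+```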
+
+To learn more about the _Anovos_ workflow configuration file and specifying paths for input and output data,
+have a look at the [Configuring Workloads](../config_file.md) page.
+
+### Step 2.5: Copy the updated configuration file from the local machine to the Azure Blob Storage container
+
+Once you have updated the configuration file, copy it to Azure Databricks using the same command that was used
+in [Step 2.2](#step-22-copy-the-dataset-to-an-azure-blob-storage-container).
+
+### Remaining Steps
+
+The remaining steps are the same as above, so please continue with
+[Step 1.4](#step-14-configure-and-launch-an-anovos-workflow-as-a-databricks-job)
+

From 17381f5ad70883b15a961505f1b5ec3e74cebcf9 Mon Sep 17 00:00:00 2001
From: mathiaspet
Date: Tue, 22 Nov 2022 08:43:21 +0100
Subject: [PATCH 2/3] feat: removed secret related text

---
 .../setting-up/on_azure_databricks.md         | 42 ++++++++++---------
 1 file changed, 23 insertions(+), 19 deletions(-)

diff --git a/docs/using-anovos/setting-up/on_azure_databricks.md b/docs/using-anovos/setting-up/on_azure_databricks.md
index 2332a2f8..49e93224 100644
--- a/docs/using-anovos/setting-up/on_azure_databricks.md
+++ b/docs/using-anovos/setting-up/on_azure_databricks.md
@@ -501,7 +501,7 @@ in [Step 2.2](#step-22-copy-the-dataset-to-an-azure-blob-storage-container).
 The remaining steps are the same as above, so please continue with
 [Step 1.4](#step-14-configure-and-launch-an-anovos-workflow-as-a-databricks-job)
 
-## 3. Anovos on Azure Databricks Using an Azure Blob Storage Container using
+## 3. Anovos on Azure Databricks using direct access to Azure Blob Storage Container
 
 ### Step 3.1: Installing/Downloading Anovos
 
@@ -513,33 +513,36 @@ This step is identical to
 [Step 2.2: Copy the dataset to an Azure Blob Storage container](#step-22-copy-the-dataset-to-an-azure-blob-storage-container).
 
-### Step 2.3: Mount an Azure Blob Storage Container as a DBFS path in Azure Databricks
+### Step 3.3: Add the secret to the spark configuration
 
 To access files in an Azure Blob Storage container for running _Anovos_ in Azure Databricks platform,
-you need to mount that container in the DBFS path.
+you need to either add the storage account key or an SAS token to the spark cluster config.
+The following command adds the storage account key to the spark config:
 
 ```spark.conf.set("fs.azure.account.key.<storage-account-name>.dfs.core.windows.net", "<storage-account-access-key>")```
 
-TODO: CHECKOUT IF SAS-TOKEN DOES WORK TOO
 Here,
 - `<storage-account-name>` is the name of your Azure Blob Storage account
-- `<container-name>` is the name of a container in your Azure Blob Storage account
 - `<storage-account-access-key>` is the value of the storage account key (TODO: this is bad practise and should be solved with a secret)
-- `<sas-token>` is the SAS token for that storage account
 
+
+You can access the contents of a storage account using an SAS token as well. The following commands add the generated SAS token to the spark cluster config:
+```spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "SAS")```
+```spark.conf.set("fs.azure.sas.token.provider.type.<storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")```
+```spark.conf.set("fs.azure.sas.fixed.token.<storage-account-name>.dfs.core.windows.net", "<sas-token>")```
 
 To learn more about accessing Azure Blob Storage containers using the abfss protocoll, please refer to
 [the Azure Blob Storage documentation](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage).
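+
+💡 _Instead of pasting the storage account key into the cluster configuration (see the TODO above), it can be kept in a_
+ _Databricks secret and read at runtime. The snippet below is only a minimal sketch of that approach; the secret scope_
+ _`anovos` and the key name `storage-account-key` are hypothetical placeholders:_
+
+```python
+# Minimal sketch: read the storage account key from a Databricks secret scope
+# and hand it to the Spark configuration instead of hard-coding it.
+# Run in a Databricks notebook, where `spark` and `dbutils` are predefined.
+storage_account = "<storage-account-name>"  # placeholder
+account_key = dbutils.secrets.get(scope="anovos", key="storage-account-key")  # hypothetical scope/key
+spark.conf.set(f"fs.azure.account.key.{storage_account}.dfs.core.windows.net", account_key)
+```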
-💡 _Note that you only need to mount the container once._
- _The container will remain mounted at the given mount point._
- _To unmount a container, you can run `dbutils.fs.unmount("/mnt/<mount-point>")` in an Azure Databricks notebook._
 
-### Step 2.4: Update the workflow configuration for all input and output paths according to the DBFS mount point
+### Step 3.4: Update the workflow configuration for all input and output paths according to the DBFS mount point
 
-Once mounting is completed, the data is present in DBFS at the path specified as the mount point.
-All operations performed by _Anovos_ when running a workflow will result in changes in the data stored in the
-Azure Blob Storage container.
+The input and output paths need to be prefixed with the following value:
+
+```abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/```
+
+Here,
+- `<storage-account-name>` is the name of your Azure Blob Storage account
+- `<container-name>` is the value of the storage account key (TODO: this is bad practise and should be solved with a secret)
 
 The example configuration file we use in this tutorial can be found at `config/configs_income_azure_blob_mount.yaml`
 in the _Anovos_ repository.
@@ -549,27 +552,28 @@ In order for _Anovos_ to be able to find the input data and write the output to
 update all paths to contain the path of the mount point:
 
 ```yaml
-file_path: "dbfs:/mnt/<mount-point>/..."
+file_path: "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/..."
 ```
 
 🤓 _Example:_
 
 ```yaml
   read_dataset:
-    file_path: "dbfs:/mnt/anovos1/income_dataset/csv/"
+    file_path: "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/income_dataset/csv/"
     file_type: csv
 ```
 
-Here, the mount points is `dbfs:/mnt/anovos1` and the input dataset is stored in a folder called `income_dataset/csv`
-within the Azure Blob Storage container.
+Here, the URL points to the storage container and account `abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/` and the input dataset is stored in a folder called `income_dataset/csv` within the Azure Blob Storage container.
 
 To learn more about the _Anovos_ workflow configuration file and specifying paths for input and output data,
 have a look at the [Configuring Workloads](../config_file.md) page.
 
-### Step 2.5: Copy the updated configuration file from the local machine to the Azure Blob Storage container
+### Step 3.5: Copy the updated configuration file to Databricks DBFS
 
 Once you have updated the configuration file, copy it to Azure Databricks using the same command that was used
-in [Step 2.2](#step-22-copy-the-dataset-to-an-azure-blob-storage-container).
+in [Step 1.2](#step-12-prepare-and-copy-the-workflow-configuration-and-data-to-dbfs).
+
+You can configure the file_path now to that location.
 
 ### Remaining Steps
 

From 4f53977b860b6adfca158114a7424303d408c44b Mon Sep 17 00:00:00 2001
From: Kilian Kluge <32523967+ionicsolutions@users.noreply.github.com>
Date: Tue, 22 Nov 2022 14:09:14 +0100
Subject: [PATCH 3/3] Update on_azure_databricks.md

---
 .../setting-up/on_azure_databricks.md         | 43 +++++++++++--------
 1 file changed, 25 insertions(+), 18 deletions(-)

diff --git a/docs/using-anovos/setting-up/on_azure_databricks.md b/docs/using-anovos/setting-up/on_azure_databricks.md
index 49e93224..e5886884 100644
--- a/docs/using-anovos/setting-up/on_azure_databricks.md
+++ b/docs/using-anovos/setting-up/on_azure_databricks.md
@@ -501,7 +501,7 @@ in [Step 2.2](#step-22-copy-the-dataset-to-an-azure-blob-storage-container).
 The remaining steps are the same as above, so please continue with
 [Step 1.4](#step-14-configure-and-launch-an-anovos-workflow-as-a-databricks-job)
 
-## 3. Anovos on Azure Databricks using direct access to Azure Blob Storage Container
+## 3. Anovos on Azure Databricks using direct access to Azure Blob Storage containers
 
 ### Step 3.1: Installing/Downloading Anovos
 
@@ -513,28 +513,33 @@ This step is identical to
 [Step 2.2: Copy the dataset to an Azure Blob Storage container](#step-22-copy-the-dataset-to-an-azure-blob-storage-container).
 
-### Step 3.3: Add the secret to the spark configuration
+### Step 3.3: Add the secret to the Spark configuration
 
-To access files in an Azure Blob Storage container for running _Anovos_ in Azure Databricks platform,
-you need to either add the storage account key or an SAS token to the spark cluster config.
-The following command adds the storage account key to the spark config:
+To access files in an Azure Blob Storage container for running _Anovos_ on the Azure Databricks platform,
+you need to either add the Storage account key or an SAS token to the Spark cluster configuration.
+
+The following command adds the Storage account key to the Spark cluster configuration:
 
 ```spark.conf.set("fs.azure.account.key.<storage-account-name>.dfs.core.windows.net", "<storage-account-access-key>")```
 
 Here,
 - `<storage-account-name>` is the name of your Azure Blob Storage account
-- `<storage-account-access-key>` is the value of the storage account key (TODO: this is bad practise and should be solved with a secret)
+- `<storage-account-access-key>` is the value of the Storage account key (note that putting the key here in plain text is bad practice; prefer reading it from a Databricks secret, e.g. via `dbutils.secrets.get`)
 
-You can access the contents of a storage account using an SAS token as well. The following commands add the generated SAS token to the spark cluster config:
-```spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "SAS")```
-```spark.conf.set("fs.azure.sas.token.provider.type.<storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")```
-```spark.conf.set("fs.azure.sas.fixed.token.<storage-account-name>.dfs.core.windows.net", "<sas-token>")```
+You can access the contents of a storage account using an SAS token as well.
+
+The following commands add the generated SAS token to the Spark cluster configuration:
+```
+spark.conf.set("fs.azure.account.auth.type.<storage-account-name>.dfs.core.windows.net", "SAS")
+spark.conf.set("fs.azure.sas.token.provider.type.<storage-account-name>.dfs.core.windows.net", "org.apache.hadoop.fs.azurebfs.sas.FixedSASTokenProvider")
+spark.conf.set("fs.azure.sas.fixed.token.<storage-account-name>.dfs.core.windows.net", "<sas-token>")
+```
 
-To learn more about accessing Azure Blob Storage containers using the abfss protocoll, please refer to
-[the Azure Blob Storage documentation](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage).
+To learn more about accessing Azure Blob Storage containers using the `abfss` protocol, please refer to
+[the Azure Blob Storage documentation](https://docs.microsoft.com/en-us/azure/databricks/data/data-sources/azure/azure-storage).
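+
+💡 _Before launching an Anovos job, it can be worth checking that the configuration above actually grants access to the_
+ _container. A minimal sketch of such a check (the bracketed names are placeholders) is:_
+
+```python
+# Minimal sketch: list the container root to confirm that the account key or
+# SAS token configured above works. Run in a Databricks notebook, where
+# `dbutils` is predefined; replace the bracketed names with your own values.
+for entry in dbutils.fs.ls("abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/"):
+    print(entry.path)
+```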
 
-### Step 3.4: Update the workflow configuration for all input and output paths according to the DBFS mount point
+### Step 3.4: Update the input and output paths in the _Anovos_ workflow configuration
 
 The input and output paths need to be prefixed with the following value:
 
 ```abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/```
@@ -542,11 +547,11 @@ The input and output paths need to be prefixed with the following value:
 
 Here,
 - `<storage-account-name>` is the name of your Azure Blob Storage account
-- `<container-name>` is the value of the storage account key (TODO: this is bad practise and should be solved with a secret)
+- `<container-name>` is the name of a container in that Storage account
 
 The example configuration file we use in this tutorial can be found at `config/configs_income_azure_blob_mount.yaml`
-in the _Anovos_ repository.
-It will need to be updated to reflect the path of the mount point set above.
+in the [_Anovos_ GitHub repository](https://github.com/anovos/anovos).
+It will need to be updated to reflect the path of the Azure Blob Storage container's mount point set above.
 
 In order for _Anovos_ to be able to find the input data and write the output to the correct location,
 update all paths to contain the path of the mount point:
 
 ```yaml
 file_path: "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/..."
 ```
 
 🤓 _Example:_
 
 ```yaml
   read_dataset:
@@ -563,7 +568,9 @@ file_path: "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net
     file_type: csv
 ```
 
-Here, the URL points to the storage container and account `abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/` and the input dataset is stored in a folder called `income_dataset/csv` within the Azure Blob Storage container.
+Here, the URL points to the Storage container and account
+`abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/`
+and the input dataset is stored in a folder called `income_dataset/csv` within the Azure Blob Storage container.
 
 To learn more about the _Anovos_ workflow configuration file and specifying paths for input and output data,
 have a look at the [Configuring Workloads](../config_file.md) page.
 
@@ -573,7 +580,7 @@ have a look at the [Configuring Workloads](../config_file.md) page.
 
 ### Step 3.5: Copy the updated configuration file to Databricks DBFS
 
 Once you have updated the configuration file, copy it to Azure Databricks using the same command that was used
 in [Step 1.2](#step-12-prepare-and-copy-the-workflow-configuration-and-data-to-dbfs).
 
-You can configure the file_path now to that location.
+You can now configure the `file_path` to point to that location.
 
 ### Remaining Steps