
Conversation

@mathiaspet

No description provided.

To access files in an Azure Blob Storage container when running _Anovos_ on the Azure Databricks platform,
you need to mount that container to a DBFS path.

```python
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    "<storage-account-key>",
)
```

Where do I use/add this line?
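
For context, a line like this would typically go in a notebook cell (or the cluster's Spark configuration) before anything is read from the container. A rough sketch, assuming the data is then addressed via an `abfss://` URI (all names and paths below are placeholders):

```python
# Make the storage account key available to Spark before reading from the container.
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    "<storage-account-key>",
)

# Files in the container can then be read directly via an abfss:// path, e.g.:
df = spark.read.csv(
    "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/path/to/dataset.csv",
    header=True,
)
```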

@ionicsolutions left a comment:

Please also augment/update the intro of the page:

1. Processing datasets stored directly on DBFS
2. Processing datasets stored on [Azure Blob Storage](https://azure.microsoft.com/services/storage/blobs/)

Generally, we recommend the first option, as it requires slightly less configuration.
However, if you're already storing your datasets on Azure Blob Storage, mounting the respective containers
to DBFS allows you to directly process them with _Anovos_.
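
For reference, such a DBFS mount is usually created with `dbutils.fs.mount`. A minimal sketch, assuming a classic `wasbs` Blob mount (an ADLS Gen2 `abfss` mount is configured differently) and with all names as placeholders:

```python
# Mount the container so its contents appear under /mnt/<mount-name> in DBFS.
dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point="/mnt/<mount-name>",
    extra_configs={
        # Illustration only: in practice, the key should come from a secret scope.
        "fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<storage-account-key>"
    },
)
```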

To access files in an Azure Blob Storage container when running _Anovos_ on the Azure Databricks platform,
you need to mount that container to a DBFS path.

```python
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    "<storage-account-key>",
)
```

Here,
- `<storage-account-name>` is the name of your Azure Blob Storage account
- `<storage-account-key>` is the value of the Storage account key (TODO: this is bad practice and should be solved with a secret)

@mathiaspet Please provide information on how to do this properly. Don't mention a bad practice in the docs :)
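
One way this could look, sketched under the assumption that the key is stored in a Databricks secret scope (scope and key names are placeholders):

```python
# Fetch the storage account key from a secret scope instead of hard-coding it.
storage_account_key = dbutils.secrets.get(scope="<scope-name>", key="<key-name>")

spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    storage_account_key,
)
```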

Also, I would assume that this is fine in certain situations and not others. Would be great to tell people when/why this is (not) recommended.

Once you have updated the configuration file, copy it to Azure Databricks using the same command that was used
in [Step 1.2](#step-12-prepare-and-copy-the-workflow-configuration-and-data-to-dbfs).

You can now configure the `file_path` to point to that location.

Where/how?

The remaining steps are the same as above, so please continue with
[Step 1.4](#step-14-configure-and-launch-an-anovos-workflow-as-a-databricks-job).

## 3. Anovos on Azure Databricks using direct access to Azure Blob Storage containers

Can we add some information on why one would want to do that?
