feat: add documentation for direct access to Azure Storage accounts #133
base: main
Conversation
> To access files in an Azure Blob Storage container when running _Anovos_ on the Azure Databricks platform,
> you need to mount that container to a DBFS path.
>
> ```python
> spark.conf.set("fs.azure.account.key.<storage-account-name>.dfs.core.windows.net", "<storage-account-key>")
> ```
Where do I use/add this line?
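Presumably this goes in a notebook cell before the data is read. For context, a minimal sketch of the mount step the paragraph refers to, assuming the classic account-key mount via the `wasbs://` driver; the mount point name and all angle-bracket values are placeholders:

```python
# Sketch: mount the Blob Storage container into DBFS from a notebook cell.
# All angle-bracket values are placeholders; "/mnt/anovos-data" is illustrative.
dbutils.fs.mount(
    source="wasbs://<container-name>@<storage-account-name>.blob.core.windows.net",
    mount_point="/mnt/anovos-data",
    extra_configs={
        "fs.azure.account.key.<storage-account-name>.blob.core.windows.net": "<storage-account-key>"
    },
)

# Files in the container then appear under the mount point:
display(dbutils.fs.ls("/mnt/anovos-data"))
```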
ionicsolutions left a comment:
Please also augment/update the intro of the page:
1. Processing datasets stored directly on DBFS
2. Processing datasets stored on [Azure Blob Storage](https://azure.microsoft.com/services/storage/blobs/)
Generally, we recommend the first option, as it requires slightly less configuration.
However, if you're already storing your datasets on Azure Blob Storage, mounting the respective containers
to DBFS allows you to directly process them with _Anovos_.
> To access files in an Azure Blob Storage container when running _Anovos_ on the Azure Databricks platform,
> you need to mount that container to a DBFS path.
>
> ```python
> spark.conf.set("fs.azure.account.key.<storage-account-name>.dfs.core.windows.net", "<storage-account-key>")
> ```
>
> Here,
>
> - `<storage-account-name>` is the name of your Azure Blob Storage account
> - `<storage-account-key>` is the value of the Storage account key (TODO: this is bad practice and should be solved with a secret)
@mathiaspet Please provide information on how to do this properly. Don't mention a bad practice in the docs :)
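A sketch of what the secret-based variant could look like, assuming Databricks secret scopes are available; the scope name `anovos` and the key name `storage-account-key` are illustrative:

```python
# Sketch: read the storage account key from a Databricks secret scope instead
# of hard-coding it. Scope and key names are illustrative; they would be
# created beforehand, e.g. via the Databricks CLI:
#   databricks secrets create-scope --scope anovos
#   databricks secrets put --scope anovos --key storage-account-key
storage_account_key = dbutils.secrets.get(scope="anovos", key="storage-account-key")

spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    storage_account_key,
)
```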
Also, I would assume that this is fine in certain situations and not others. It would be great to tell people when/why this is (not) recommended.
> Once you have updated the configuration file, copy it to Azure Databricks using the same command that was used
> in [Step 1.2](#step-12-prepare-and-copy-the-workflow-configuration-and-data-to-dbfs).
>
> You can now configure the `file_path` to point to that location.
Where/how?
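Presumably in the workflow configuration YAML. A hypothetical excerpt, assuming the container is mounted at `/mnt/anovos-data`; the key names and nesting are assumptions based on the surrounding guide and may differ in the actual Anovos config schema:

```yaml
# Hypothetical excerpt of the Anovos workflow configuration; the mount point
# and dataset path are placeholders.
input_dataset:
  read_dataset:
    file_path: "dbfs:/mnt/anovos-data/path/to/dataset.csv"
    file_type: csv
```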
> The remaining steps are the same as above, so please continue with
> [Step 1.4](#step-14-configure-and-launch-an-anovos-workflow-as-a-databricks-job).
>
> ## 3. Anovos on Azure Databricks using direct access to Azure Blob Storage containers
Can we add some information why one would want to do that?
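For illustration, a sketch of what the direct-access variant might boil down to: no mount, the account key is set on the Spark session, and data is read via an `abfss://` URI. All angle-bracket values are placeholders:

```python
# Sketch of direct access without a DBFS mount: configure the account key on
# the Spark session, then read straight from the container.
spark.conf.set(
    "fs.azure.account.key.<storage-account-name>.dfs.core.windows.net",
    "<storage-account-key>",
)

df = spark.read.csv(
    "abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/path/to/dataset.csv",
    header=True,
)
```

One plausible reason to prefer this: a DBFS mount is visible workspace-wide, while session-scoped configuration keeps access limited to the cluster it is set on.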