Generate report fails on Databricks Shared cluster #1584

Open
thijs-nijhuis opened this issue Jul 3, 2024 · 5 comments
Labels: Bug (Something isn't working), Triage 👀

Comments

@thijs-nijhuis

Describe the bug
When you run the 'edr report' command from a notebook on a cluster that has elementary installed as a cluster library (so it is installed on startup and persisted across sessions), report generation fails with a permission error while running 'dbt deps' if the cluster is in 'shared' access mode. If the cluster is in 'single user' access mode, the command succeeds.

To Reproduce

  1. Create an all purpose compute cluster with access mode 'shared'
  2. Install "elementary-data==0.15.1" from PyPI on it
  3. Connect to a GitHub repo that contains a dbt project, or upload one to your workspace
  4. Create a new Notebook with only one Python cell that contains this command:
%sh
edr report --profiles-dir "/Workspace/Repos/<username>/<repo_name>/<path_to_project_folder>" --project-dir "/Workspace/Repos/<username>/<repo_name>/<path_to_project_folder>" --target-path "/Workspace/Repos/<username>/<repo_name>/<path_to_a_folder>" --update-dbt-package false
  5. Attach the notebook to the created cluster and run the cell

Expected behavior
I expected the report to be generated at the provided location, just like it is when using a cluster in 'single user' mode.

Screenshots

    ________                          __                  
   / ____/ /__  ____ ___  ___  ____  / /_____ ________  __
  / __/ / / _ \/ __ `__ \/ _ \/ __ \/ __/ __ `/ ___/ / / /
 / /___/ /  __/ / / / / /  __/ / / / /_/ /_/ / /  / /_/ / 
/_____/_/\___/_/ /_/ /_/\___/_/ /_/\__/\__,_/_/   \__, /  
                                                 /____/   

Any feedback and suggestions are welcomed! join our community here - https://bit.ly/slack-elementary

2024-07-03 15:09:33 — INFO — Running with edr=0.15.1
2024-07-03 15:09:34 — INFO — Installing packages for edr internal dbt package...
2024-07-03 15:09:34 — INFO — Running dbt --log-format json deps --project-dir /local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/dbt_project --profiles-dir /Workspace/Repos/<username>/<repo_name>/<path_to_project_folder>
2024-07-03 15:09:40 — INFO — Running with dbt=1.8.3
2024-07-03 15:09:40 — INFO — Encountered an error:
[Errno 13] Permission denied: '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/dbt_project/package-lock.yml'
2024-07-03 15:09:40 — INFO — Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/requires.py", line 201, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/requires.py", line 247, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/main.py", line 447, in deps
    results = task.run()
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/task/deps.py", line 217, in run
    self.lock()
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/task/deps.py", line 204, in lock
    with open(lock_filepath, "w") as lock_obj:
PermissionError: [Errno 13] Permission denied: '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/dbt_project/package-lock.yml'

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/clients/dbt/dbt_runner.py", line 88, in _run_command
    result = subprocess.run(
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['dbt', '--log-format', 'json', 'deps', '--project-dir', '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/dbt_project', '--profiles-dir', '/Workspace/Repos/<username>/<repo_name>/<path_to_project_folder>']' returned non-zero exit status 2.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/bin/edr", line 8, in <module>
    sys.exit(cli())
  File "/databricks/python/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/databricks/python/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/cli/cli.py", line 67, in invoke
    return super().invoke(ctx)
  File "/databricks/python/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/databricks/python/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/databricks/python/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/databricks/python/lib/python3.10/site-packages/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/cli.py", line 442, in report
    data_monitoring = DataMonitoringReport(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/data_monitoring/report/data_monitoring_report.py", line 42, in __init__
    super().__init__(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/data_monitoring/data_monitoring.py", line 35, in __init__
    self.internal_dbt_runner = self._init_internal_dbt_runner()
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/data_monitoring/data_monitoring.py", line 61, in _init_internal_dbt_runner
    internal_dbt_runner = DbtRunner(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/clients/dbt/dbt_runner.py", line 48, in __init__
    self._run_deps_if_needed()
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/clients/dbt/dbt_runner.py", line 318, in _run_deps_if_needed
    self.deps()
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/clients/dbt/dbt_runner.py", line 116, in deps
    success, _ = self._run_command(
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/clients/dbt/dbt_runner.py", line 99, in _run_command
    raise DbtCommandError(err, command_args, logs=logs)
elementary.exceptions.exceptions.DbtCommandError: Failed to run dbt command.
Encountered an error:
[Errno 13] Permission denied: '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/dbt_project/package-lock.yml'
Traceback (most recent call last):
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/requires.py", line 138, in wrapper
    result, success = func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/requires.py", line 101, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/requires.py", line 201, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/requires.py", line 247, in wrapper
    return func(*args, **kwargs)
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/cli/main.py", line 447, in deps
    results = task.run()
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/task/deps.py", line 217, in run
    self.lock()
  File "/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/dbt/task/deps.py", line 204, in lock
    with open(lock_filepath, "w") as lock_obj:
PermissionError: [Errno 13] Permission denied: '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/elementary/monitor/dbt_project/package-lock.yml'

Environment (please complete the following information):

  • Elementary CLI (edr) version: [e.g. 0.5.3], can be found by running pip show elementary-data
    • 0.15.1
  • Elementary dbt package version: [e.g. 0.4.1], can be found in packages.yml file
    • 0.15.2
  • dbt version you're using [e.g. 1.8.1]
    • 1.8.3
  • Data warehouse [e.g. snowflake]
    • Databricks
  • Infrastructure details (e.g. operating system, prod / dev / staging, deployment infra, CI system, etc)
    • Running on Shared all purpose compute on Azure Databricks

Additional context
I did a bit of debugging and testing. When running the command, it seems to check whether the dbt packages are installed for both the user's project and the internal dbt project. The first check succeeds because the project is in a writable location. The second fails because it tries to create or write a file called package-lock.yml in the internal dbt project inside the elementary package folder. This folder is not writable on a shared cluster (I am actually surprised that it IS writable on a single user cluster).
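
To confirm this diagnosis on a given cluster, a small check like the one below may help. It is only a sketch: the `monitor/dbt_project` sub-path and the `package-lock.yml` file name are taken from the traceback above, while the helper names `internal_dbt_project_dir` and `can_write_lock_file` are made up for illustration and are not part of elementary.

```python
import importlib.util
import os
from pathlib import Path
from typing import Optional


def internal_dbt_project_dir(package: str = "elementary") -> Optional[Path]:
    """Return <package>/monitor/dbt_project, or None if the package is not installed."""
    spec = importlib.util.find_spec(package)
    if spec is None or not spec.submodule_search_locations:
        return None
    package_root = Path(list(spec.submodule_search_locations)[0])
    # Sub-path as seen in the traceback; it may differ in other versions.
    return package_root / "monitor" / "dbt_project"


def can_write_lock_file(project_dir: Path, lock_name: str = "package-lock.yml") -> bool:
    """Return True if the current user could create or overwrite the lock file."""
    lock_path = project_dir / lock_name
    if lock_path.exists():
        # Overwriting an existing file needs write permission on the file itself.
        return os.access(lock_path, os.W_OK)
    # Creating a new file needs write and search permission on the directory.
    return project_dir.is_dir() and os.access(project_dir, os.W_OK | os.X_OK)


if __name__ == "__main__":
    project_dir = internal_dbt_project_dir()
    if project_dir is None:
        print("elementary is not installed in this environment")
    else:
        print(project_dir, "writable:", can_write_lock_file(project_dir))
```

Run from a Python cell on the shared cluster and again on the single user cluster, this should show whether the difference in behavior really comes down to write access on that folder.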

I also tried installing elementary as part of the notebook instead of on cluster startup, like so: %pip install elementary-data==0.15.1. After you restart the Python kernel and run the same command, it DOES succeed. This is because the elementary package is then installed in a location that is writable for the logged-in user. Unfortunately this is not an option for us, as we run our project as a wheel and both elementary and dbt-databricks are installed as part of that wheel.

Maybe it is an idea to have the dbt_packages pre-installed when installing elementary? That way dbt deps won't need to write anything and it would also speed up the process a bit. This might fail when it tries to create a target folder though.
Alternatively, perhaps we can configure the location of all writeable locations (target and dbt_packages) as part of the edr command? Just like we can configure the location of the report output.
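
The second idea could look roughly like the sketch below: copy the read-only internal project to a writable temporary directory and run `dbt deps` against the copy. This only illustrates the proposal. The function name `stage_writable_project` is hypothetical, and nothing in this thread establishes that edr can currently be pointed at such a copy. The `--project-dir` and `--profiles-dir` arguments are the ones shown in the log above.

```python
import shutil
import stat
import tempfile
from pathlib import Path
from typing import List, Tuple


def stage_writable_project(internal_project: Path, profiles_dir: str) -> Tuple[Path, List[str]]:
    """Copy a read-only dbt project to a temp dir and build a `dbt deps` command for the copy."""
    staging_root = Path(tempfile.mkdtemp(prefix="edr_internal_"))
    staged_project = staging_root / internal_project.name
    shutil.copytree(internal_project, staged_project)

    # copytree preserves permission bits, so make sure the owner can write everywhere.
    for path in [staged_project, *staged_project.rglob("*")]:
        path.chmod(path.stat().st_mode | stat.S_IWUSR)

    # The command is only built here, not executed.
    command = [
        "dbt", "deps",
        "--project-dir", str(staged_project),
        "--profiles-dir", profiles_dir,
    ]
    return staged_project, command
```

With a copy like this, `package-lock.yml`, `dbt_packages` and `target` would all end up in a location owned by the current user, at the cost of re-copying the project for each run.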

Would you be willing to contribute a fix for this issue?
Sure.

@thijs-nijhuis thijs-nijhuis added Bug Something isn't working Triage 👀 labels Jul 3, 2024

noel commented Aug 12, 2024

Any update on this? Is there a workaround?

alxsbn commented Oct 28, 2024

@thijs-nijhuis @noel I see the same behavior with send-report on dbt 1.8.7 (Databricks too) when I try to execute the command within a container (no elevation). Did you find a workaround?

noel commented Oct 28, 2024

No, I forked the repo and added the missing file. I wish they would add it so we don't need a fork.

alxsbn commented Nov 4, 2024

@noel Can you share your fork? :)

noel commented Nov 4, 2024

here it is, but it has not been updated recently
https://github.com/datacoves/elementary

this is the branch with this fix
https://github.com/datacoves/elementary/tree/ng_fix_packages_lock
