Add cleanup functions to fix recursive loop and no such file raised by broken symlinks #1115

Closed
wants to merge 4 commits into from

Collaborator

@tatiana tatiana commented Jul 23, 2024

Introduce tools that help users facing two issues raised by problematic Airflow or dbt project directories:

  1. File not found (as reported in Bug on Cosmos 1.5 No such file or directory "/usr/local/aiflow/dags/dbt/dbt_venv/bin/python" #1096)
  2. Recursive loop (as reported in [Bug] RuntimeError: Detected recursive loop for /usr/local/airflow/dags/dbt/dbt_venv/lib #1076)

Disclaimer

Cosmos does not cause either of these issues. They stem from misconfigured Airflow and/or dbt directories that users create and then set to be used in Cosmos DAGs.

This PR proposes methods to mitigate these problems, but users need to introduce this or similar tooling into their deployment processes to prevent the issues from recurring.

Details on the problems addressed

The first of these issues relates to the dbt project folder having a symbolic link to a file that no longer exists:

No such file or directory "/usr/local/aiflow/dags/dbt/dbt_venv/bin/python"

The second of these issues relates to the dbt project folder having a symbolic link to a parent folder, which prevents Airflow from parsing dbt DAGs at all:

RuntimeError: Detected recursive loop for /usr/local/airflow/dags/dbt/dbt_venv/lib

The second issue can be particularly destructive, since it can block the scheduler from parsing any DAGs in the Airflow project, so critical DAGs do not run on their expected schedules.
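As a rough illustration of the two conditions (this is not the code in this PR), the sketch below classifies a path using only the Python standard library. The function name `classify_symlink` and the temporary directory layout are assumptions made for the example. It treats a link as recursive when its resolved target is the directory that contains the link, or one of that directory's ancestors.

```python
import os
import tempfile
from pathlib import Path


def classify_symlink(path: Path) -> str:
    """Return 'not_a_link', 'broken', 'recursive', or 'ok' for the given path."""
    if not path.is_symlink():
        return "not_a_link"
    # os.path.exists follows the link, so it is False when the target is gone.
    if not os.path.exists(path):
        return "broken"
    target = Path(os.path.realpath(path))
    link_parent = Path(os.path.realpath(path.parent))
    # A link that resolves to its own folder, or to an ancestor of it, creates a cycle.
    if target == link_parent or target in link_parent.parents:
        return "recursive"
    return "ok"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        project = root / "dags" / "dbt"
        project.mkdir(parents=True)

        # Problem (1): a link whose target has been removed.
        target = root / "sample-file"
        target.touch()
        (project / "bla2").symlink_to(target)
        target.unlink()

        # Problem (2): a link that points back to a parent folder.
        (project / "bla").symlink_to(root, target_is_directory=True)

        print(classify_symlink(project / "bla2"))  # broken
        print(classify_symlink(project / "bla"))  # recursive
```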

How to use this PR

For problem (1), this PR has an example DAG, example_cosmos_cleanup_dir_dag, which can be scheduled as any other DAG in Airflow.

It was tested by using the following steps:

(a) Create symlinks and make them invalid

touch sample-file
ln -s `pwd`/sample-file dev/dags/dbt/bla2 
ln -s `pwd`/sample-file dev/dags/dbt/bla3 
rm sample-file

(b) Trigger the DAG

airflow dags test example_cosmos_cleanup_dir_dag  `date -Iseconds`

In its logs, we can see:

[2024-07-23T02:15:00.995+0100] {cleanup.py:25} INFO - Inspecting the directory /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt for broken symbolic links.
[2024-07-23T02:15:00.997+0100] {cleanup.py:38} WARNING - The folder /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt contains a symbolic link to a non-existent file: /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt/bla3
[2024-07-23T02:15:00.997+0100] {cleanup.py:40} INFO - Deleting the invalid symbolic link: /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt/bla3
[2024-07-23T02:15:00.997+0100] {cleanup.py:38} WARNING - The folder /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt contains a symbolic link to a non-existent file: /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt/bla2
[2024-07-23T02:15:00.997+0100] {cleanup.py:40} INFO - Deleting the invalid symbolic link: /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt/bla2
[2024-07-23T02:15:00.998+0100] {cleanup.py:44} INFO - After inspecting /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt, identified 2 broken links and deleted 2 of them.

Running the DAG again shows that no broken symlinks remain:

After inspecting /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt, identified 0 broken links and deleted 0 of them.
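The behaviour seen in these logs can be approximated with a short standard-library function. This is a minimal sketch, not the `cosmos/cleanup.py` code from this PR: the name `remove_broken_symlinks` and its `delete` parameter are hypothetical, and the log messages only loosely mirror the ones above.

```python
import logging
import os
from typing import Tuple

logger = logging.getLogger(__name__)


def remove_broken_symlinks(dir_path: str, delete: bool = True) -> Tuple[int, int]:
    """Walk dir_path and return (broken links found, broken links deleted)."""
    found = 0
    deleted = 0
    for root_dir, _dirs, files in os.walk(dir_path):
        # os.walk lists a dangling link under `files`, because its missing
        # target cannot be identified as a directory.
        for name in files:
            path = os.path.join(root_dir, name)
            # islink inspects the link itself; exists follows it to the target.
            if os.path.islink(path) and not os.path.exists(path):
                found += 1
                logger.warning("Symbolic link to a non-existent file: %s", path)
                if delete:
                    os.unlink(path)
                    deleted += 1
    logger.info(
        "After inspecting %s, identified %d broken links and deleted %d of them.",
        dir_path,
        found,
        deleted,
    )
    return found, deleted
```

Calling it with `delete=False` gives a dry run that only reports the broken links.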

Problem (2) occurs during Airflow DAG parsing, so it has to be solved outside of Airflow DAGs and tasks.

We are exposing a command-line interface that allows users to list and delete recursive loops. The following example does both:

python -m cosmos.cleanup -p dags/dbt -d 

The expectation is that end-users can run this command before deploying to Airflow, ensuring the deployed dbt project folder is valid and does not contain recursive links. This same command also checks for broken symbolic links.

To validate this feature, we repeated the steps that reproduce the broken symlinks and also created a link to a parent folder:

ln -s `pwd`/astronomer-cosmos `pwd`/astronomer-cosmos/dags/dbt/bla

Example of logs created by running this command line:

[2024-07-23T01:50:04.264+0100] {cleanup.py:49} INFO - Inspecting the directory dags/dbt for recursive loops.
[2024-07-23T01:50:04.267+0100] {cleanup.py:64} WARNING - Detected recursive loop from /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt/bla to /Users/tati/Code/cosmos-clean/astronomer-cosmos
[2024-07-23T01:50:04.267+0100] {cleanup.py:67} INFO - Deleting symbolic link: /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt/bla
[2024-07-23T01:50:04.267+0100] {cleanup.py:71} INFO - After inspecting /Users/tati/Code/cosmos-clean/astronomer-cosmos/dags/dbt, identified 1 recursive loops and deleted 1 of them.


[2024-07-23T01:50:04.267+0100] {cleanup.py:17} INFO - Inspecting the directory dags/dbt for broken symbolic links.
[2024-07-23T01:50:04.268+0100] {cleanup.py:30} WARNING - The folder dags/dbt contains a symbolic link to a non-existent file: dags/dbt/bla3
[2024-07-23T01:50:04.268+0100] {cleanup.py:32} INFO - Deleting the invalid symbolic link: dags/dbt/bla3
[2024-07-23T01:50:04.269+0100] {cleanup.py:30} WARNING - The folder dags/dbt contains a symbolic link to a non-existent file: dags/dbt/bla2
[2024-07-23T01:50:04.269+0100] {cleanup.py:32} INFO - Deleting the invalid symbolic link: dags/dbt/bla2
[2024-07-23T01:50:04.269+0100] {cleanup.py:36} INFO - After inspecting dags/dbt, identified 2 broken links and deleted 2 of them.
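For reference, one way to detect this kind of loop with the standard library is sketched below. It is an assumption about how such a check could work, not the implementation in this PR, and `find_recursive_links` is a hypothetical name. A link counts as recursive when the folder that contains it sits at or below the link's resolved target. The sketch assumes POSIX-style paths on a single filesystem root.

```python
import os
from typing import List, Tuple


def find_recursive_links(dir_path: str) -> List[Tuple[str, str]]:
    """Return (link, target) pairs for links that point to one of their own ancestors."""
    loops = []
    # followlinks defaults to False, so the walk itself cannot get trapped in a loop.
    for root_dir, dirs, _files in os.walk(dir_path):
        real_root = os.path.realpath(root_dir)
        # Links to existing directories are listed under `dirs`.
        for name in dirs:
            link = os.path.join(root_dir, name)
            if not os.path.islink(link):
                continue
            target = os.path.realpath(link)
            # The link is recursive when its containing folder is at or below the target.
            if os.path.commonpath([real_root, target]) == target:
                loops.append((link, target))
    return loops
```

Each returned link could then be removed with `os.unlink(link)`, which deletes the link itself and leaves the target folder untouched.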

From an end-user perspective, the script that solves problem (2) can be run automatically in a few places, including:

  • As part of the CI/CD pipelines
  • As a pre-commit hook
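Both options rely on the check failing with a non-zero exit code. The sketch below shows what such a gate could look like as a standalone script. It is hypothetical and separate from `cosmos.cleanup`: the function names and the `dags/dbt` default are assumptions, and only the `-p` flag is borrowed from the command shown earlier. Unlike that command, this sketch only reports problems and never deletes anything.

```python
import argparse
import os
import sys
from typing import List, Optional


def problematic_links(project_dir: str) -> List[str]:
    """Collect symlinks that are broken or that point to an ancestor directory."""
    problems = []
    for root_dir, dirs, files in os.walk(project_dir):
        real_root = os.path.realpath(root_dir)
        for name in dirs + files:
            path = os.path.join(root_dir, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if not os.path.exists(path):
                problems.append(f"broken: {path}")
            elif os.path.commonpath([real_root, target]) == target:
                problems.append(f"recursive: {path} -> {target}")
    return problems


def main(argv: Optional[List[str]] = None) -> int:
    parser = argparse.ArgumentParser(
        description="Fail when a dbt project folder contains problematic symlinks."
    )
    parser.add_argument("-p", "--path", default="dags/dbt", help="dbt project folder to inspect")
    args = parser.parse_args(argv)
    problems = problematic_links(args.path)
    for problem in problems:
        print(problem, file=sys.stderr)
    # A non-zero return value lets a CI job or a pre-commit hook fail.
    return 1 if problems else 0
```

In a CI job or a local pre-commit hook, the script could end with `sys.exit(main())` so that a non-zero exit blocks the deployment or the commit.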

Closes: #1096
Closes: #1076

netlify bot commented Jul 23, 2024

Deploy Preview for sunny-pastelito-5ecb04 canceled.

Name Link
🔨 Latest commit 4ad380d
🔍 Latest deploy log https://app.netlify.com/sites/sunny-pastelito-5ecb04/deploys/66a04d12c7a84a0008523fa5

@tatiana tatiana force-pushed the cleanup-broken-symlinks-example branch from 5fdda73 to f282a6b Compare July 23, 2024 01:24

codecov bot commented Jul 23, 2024

Codecov Report

Attention: Patch coverage is 37.25490% with 32 lines in your changes missing coverage. Please review.

Project coverage is 95.56%. Comparing base (0cdf2f3) to head (f282a6b).

Files               Patch %   Lines
cosmos/cleanup.py   37.25%    32 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1115      +/-   ##
==========================================
- Coverage   96.47%   95.56%   -0.91%     
==========================================
  Files          64       65       +1     
  Lines        3287     3338      +51     
==========================================
+ Hits         3171     3190      +19     
- Misses        116      148      +32     

@tatiana tatiana marked this pull request as ready for review July 23, 2024 01:58
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc area:parsing Related to parsing DAG/DBT improvement, issues, or fixes labels Jul 23, 2024
Contributor

@pankajkoti pankajkoti left a comment

LGTM mostly. Minor questions inline

broken_symlinks_count = 0
deleted_symlinks_count = 0
for root_dir, dirs, files in os.walk(dir_path):
    paths = [os.path.join(root_dir, filepath) for filepath in files]
Contributor

Should we check here whether the filepath is a symlink?

Collaborator Author

It was being checked afterwards, but checking it here would be a good improvement!
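A minimal sketch of that suggestion, filtering for symlinks while the paths are collected rather than later. The helper name `iter_symlinks` is hypothetical and is not part of this PR.

```python
import os
from typing import Iterator


def iter_symlinks(dir_path: str) -> Iterator[str]:
    """Yield only the entries under dir_path that are symbolic links."""
    for root_dir, dirs, files in os.walk(dir_path):
        for name in dirs + files:
            path = os.path.join(root_dir, name)
            # islink uses lstat, so it is True even when the link target is missing.
            if os.path.islink(path):
                yield path
```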

)


# Airflow DAG parsing fails if recursive loops are found, so this method cannot be used from within an Airflow task
Contributor

Should we also add an example here of how and where to call this, as mentioned in the PR description?

Or should we also create a public docs page listing the steps that we can share with users?

Contributor

I think a short docs page would be great, and we could also render the example in the docs.

Contributor

Should we add some tests for this module, or exclude it from Codecov for the time being?

Contributor

+1

)


# Airflow DAG parsing fails if recursive loops are found, so this method cannot be used from within an Airflow task
Contributor

It sounds like an operator or decorator. I'm ok with keeping it in the project's root path, but what are your thoughts on placing it in the operator directory instead?

Collaborator Author

The challenge is that while identify_broken_symbolic_links could be exposed as an operator, ideally, users would check this before it is deployed to a remote instance of Airflow. Additionally, identify_recursive_loops cannot be exposed as an operator since it would be useless (the exception is raised at Airflow DAG parsing). Therefore, we'd need a non-operator place to expose the second use case, so it felt we could benefit from having both methods together - and it'd be up to the user on how to use them.

Assuming the cause of the issue no longer exists, this DAG can be run only once.
"""

# [START dirty_dir_example]
Contributor

It would be great if we could use the tag dirty_dir_example to render this example in the docs.

Contributor

+1

Collaborator Author

tatiana commented Jul 24, 2024

Thanks a lot for the reviews, @pankajastro and @pankajkoti! I started addressing them, but we had a call with the team who initially raised this error, and we found a more straightforward workaround for the problem, as described in #1076 (comment)

I believe we can close this PR for now, and if this feature becomes useful in the future, we can reopen/recreate it.

@tatiana tatiana closed this Jul 24, 2024