
Do not produce reingestion workflow documentation if it matches regular workflow documentation #3207

Closed
AetherUnbound opened this issue Oct 16, 2023 · 1 comment · Fixed by #4072
Assignees
Labels
📄 aspect: text Concerns the textual material in the repository ✨ goal: improvement Improvement to an existing user-facing feature good first issue New-contributor friendly help wanted Open to participation from the community 🟩 priority: low Low priority and doesn't need to be rushed 🧱 stack: documentation Related to Sphinx documentation

Comments

@AetherUnbound
Collaborator

Description

Presently, our DAG documentation generation script generates documentation for every DAG that defines a doc_md.

For some DAGs (namely reingestion workflows, where we reuse the same DAG machinery over different windows), this produces the exact same documentation multiple times. This is redundant; the Flickr workflows are one example.

We should record the DAG doc text while iterating over the DAGs and skip producing documentation for any DAGs that have already matched a previous example exactly.

The easiest way to do this might be to record the doc markdown in a mapping of doc markdown to DAG ID, then check for the doc markdown's presence in that mapping before producing the string. This is complicated by the fact that we want the original workflow, not the reingestion one, to be the DAG that receives the documentation.
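A minimal sketch of the mapping idea, not the actual generation script: `DagInfo`, `dedupe_by_doc`, and the DAG IDs used in the usage note are hypothetical names chosen for illustration. It assumes a reingestion workflow can be recognized by the substring "reingestion" in its DAG ID, and it lets the original workflow take ownership of the text regardless of iteration order.

```python
from typing import NamedTuple


class DagInfo(NamedTuple):
    # Hypothetical stand-in for whatever the generation script collects per DAG.
    dag_id: str
    doc_md: str


def dedupe_by_doc(dags: list[DagInfo]) -> dict[str, str]:
    """Map each distinct doc markdown to the one DAG ID that should own it."""
    owner_by_doc: dict[str, str] = {}
    for dag in dags:
        current_owner = owner_by_doc.get(dag.doc_md)
        if current_owner is None:
            # First time this exact text is seen: record its owner.
            owner_by_doc[dag.doc_md] = dag.dag_id
        elif "reingestion" in current_owner and "reingestion" not in dag.dag_id:
            # Assumption: reingestion DAG IDs contain "reingestion".
            # Prefer the original workflow as the documented DAG.
            owner_by_doc[dag.doc_md] = dag.dag_id
        # Otherwise the text is a duplicate and is skipped.
    return owner_by_doc
```

With illustrative inputs such as `DagInfo("flickr_reingestion_workflow", text)` followed by `DagInfo("flickr_workflow", text)`, the shared text ends up owned by `flickr_workflow` even though the reingestion DAG was seen first.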

Alternatively, when we come across a DAG with reingestion in its DAG ID, we could find the original DAG and check whether the docstrings match. If they don't match, continue processing as normal; otherwise, skip.
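The reingestion-based check could look roughly like the sketch below. It rests on an assumption not stated in this issue: that the original DAG ID can be derived by removing "_reingestion" from the reingestion DAG ID. `should_skip` and `docs_by_id` are hypothetical names.

```python
def should_skip(dag_id: str, docs_by_id: dict[str, str]) -> bool:
    """Return True when a reingestion DAG's docs duplicate its original's docs."""
    if "reingestion" not in dag_id:
        # Original workflows are always documented.
        return False
    # Assumption: the original DAG ID is the reingestion ID minus "_reingestion".
    original_id = dag_id.replace("_reingestion", "")
    original_doc = docs_by_id.get(original_id)
    # Skip only if the original exists and its docs match exactly.
    return original_doc is not None and original_doc == docs_by_id.get(dag_id)
```

Unlike the mapping approach, this does not depend on the order in which DAGs are visited, but it does depend on the naming convention holding for every reingestion workflow.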

@AetherUnbound AetherUnbound added the 🟩 priority: low Low priority and doesn't need to be rushed label Oct 16, 2023
@github-project-automation github-project-automation bot moved this to 📋 Backlog in Openverse Backlog Oct 16, 2023
@AetherUnbound AetherUnbound added good first issue New-contributor friendly help wanted Open to participation from the community ✨ goal: improvement Improvement to an existing user-facing feature 📄 aspect: text Concerns the textual material in the repository 🧱 stack: documentation Related to Sphinx documentation labels Oct 16, 2023
@mattfergoda
Contributor

Please assign this one to me 😊

@openverse-bot openverse-bot moved this from 📋 Backlog to 📅 To Do in Openverse Backlog Apr 8, 2024
@openverse-bot openverse-bot moved this from 📅 To Do to 🏗 In Progress in Openverse Backlog Apr 8, 2024
@openverse-bot openverse-bot moved this from 🏗 In Progress to ✅ Done in Openverse Backlog Apr 19, 2024
Projects
Archived in project

2 participants