Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add check to skip toggling CloudWatch alarms #3682

Merged
merged 14 commits into from
Feb 14, 2024
Merged

Conversation

krysal
Copy link
Member

@krysal krysal commented Jan 19, 2024

Fixes

Follow up to #3652.

Description

@AetherUnbound mentioned in the PR above a problem running the data refresh locally when AWS credentials aren't set (as is usual locally, since they are not necessary): #3652 (comment)

This PR adds an Airflow variable, AIRFLOW_VAR_TOGGLE_CLOUDWATCH_ALARMS, to decide when to actually modify Cloudwatch alarms, and change the connection to CloudWatch to use a more idiomatic Airflow AWS connection with a hook.

Testing Instructions

Connection

  1. To test the AWS connection, set the AIRFLOW_CONN_AWS_CLOUDWATCH variable with the required credentials as shown in the env.example file. You could also generate it using this snippet to create Connection and convert to URI.
  2. Run the catalog and ingestion server stack.
just up
  1. Run a data_refresh DAG (audio or image), wait for it to finish and then observe in CloudWatch the updates of the alarm in the History tab: ES Production CPU utilization above 50%

CleanShot 2024-02-02 at 14 32 53@2x

Config for whether to toggle the alarm

  1. Add the new variable through the Airflow UI, Header menu > Admin > Variables, with the value false.
TOGGLE_CLOUDWATCH_ALARMS
  1. Run a data refresh again and observe the tasks for enabling/disabling the alarm are skipped.

Checklist

  • My pull request has a descriptive title (not a vague title likeUpdate index.md).
  • My pull request targets the default branch of the repository (main) or a parent feature branch.
  • My commit messages follow best practices.
  • My code follows the established code style of the repository.
  • I added or updated tests for the changes I made (if applicable).
  • I added or updated documentation (if applicable).
  • I tried running the project locally and verified that there are no visible errors.
  • I ran the DAG documentation generator (if applicable).

Developer Certificate of Origin

Developer Certificate of Origin
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
1 Letterman Drive
Suite D4700
San Francisco, CA, 94129

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.


Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@krysal krysal added 🟨 priority: medium Not blocking but should be addressed soon 🛠 goal: fix Bug fix 💻 aspect: code Concerns the software code in the repository 🧱 stack: catalog Related to the catalog and Airflow DAGs labels Jan 19, 2024
@krysal krysal requested review from a team as code owners January 19, 2024 15:45
@krysal krysal requested review from fcoveram, stacimc and obulat January 19, 2024 15:45
@krysal krysal force-pushed the fix/local_data_refresh branch from 8467b5d to 484de3c Compare January 19, 2024 15:49
Copy link

Full-stack documentation: https://docs.openverse.org/_preview/3682

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

Changed files 🔄:

@krysal krysal removed the request for review from fcoveram January 19, 2024 18:11
@krysal krysal force-pushed the fix/local_data_refresh branch from b6e7c7f to 0b26dff Compare January 19, 2024 18:11
Copy link
Collaborator

@stacimc stacimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving to unblock local testing of the data refreshes -- good catch @AetherUnbound, I did not think of that! I do think it would be preferable to actually skip the tasks if possible.

catalog/dags/common/cloudwatch.py Outdated Show resolved Hide resolved
documentation/catalog/reference/DAGs.md Outdated Show resolved Hide resolved
Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the drive-by review but I find the name SKIP_TOGGLING... a bit confusing like a multiple negative. I feel like ENABLE_TOGGLING... or TOGGLE... would be clearer. This would default to true so that the outcome is the same as SKIP_TOGGLING... that defaults to false.

@krysal krysal marked this pull request as draft January 24, 2024 00:07
@krysal krysal force-pushed the fix/local_data_refresh branch 7 times, most recently from 559c231 to f2f08b3 Compare February 2, 2024 18:20
@krysal krysal marked this pull request as ready for review February 2, 2024 18:44
@krysal krysal requested review from dhruvkb and AetherUnbound and removed request for obulat February 2, 2024 18:44
@krysal
Copy link
Member Author

krysal commented Feb 2, 2024

This is good to go now! Thanks for the suggestions; they are excellent! I am asking for a review from @AetherUnbound, as you may want to check this too.

@AetherUnbound AetherUnbound requested a review from stacimc February 7, 2024 02:02
Copy link
Collaborator

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a few small, suggested changes. Thanks for modifying this to use the idiomatic approach! 😄

catalog/dags/common/cloudwatch.py Outdated Show resolved Hide resolved
catalog/dags/data_refresh/data_refresh_task_factory.py Outdated Show resolved Hide resolved
catalog/dags/data_refresh/data_refresh_task_factory.py Outdated Show resolved Hide resolved
catalog/env.template Outdated Show resolved Hide resolved
@krysal krysal force-pushed the fix/local_data_refresh branch from 9d49c65 to a00ae33 Compare February 7, 2024 16:46
@krysal krysal requested a review from AetherUnbound February 7, 2024 17:26
Copy link
Collaborator

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
I tried running this locally, but it looks like if the disable alarms tasks is skipped, it skips the trigger ingest upstream step too. I think we'll need to change the trigger rule on that one as well!

@krysal krysal force-pushed the fix/local_data_refresh branch from a00ae33 to 88b45ea Compare February 8, 2024 20:09
@krysal
Copy link
Member Author

krysal commented Feb 8, 2024

@AetherUnbound Ah, good catch! They were supposed to inherit the TriggerRule from the group task 🤔 Anyway, I changed it.

@krysal krysal force-pushed the fix/local_data_refresh branch from 88b45ea to 86ae58a Compare February 8, 2024 20:24
@krysal krysal requested a review from AetherUnbound February 8, 2024 20:25
@openverse-bot
Copy link
Collaborator

Based on the medium urgency of this PR, the following reviewers are being gently reminded to review this PR:

@dhruvkb
@stacimc
@AetherUnbound
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was ready for review 4 day(s) ago. PRs labelled with medium urgency are expected to be reviewed within 4 weekday(s)2.

@krysal, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

Copy link
Collaborator

@stacimc stacimc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updates look great, I'm really excited to get this back in!

Copy link
Collaborator

@AetherUnbound AetherUnbound left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Local testing now works well for me with skipping the alarms steps, thanks for fixing that! This is good to go IMO!

@krysal krysal merged commit 002066f into main Feb 14, 2024
41 checks passed
@krysal krysal deleted the fix/local_data_refresh branch February 14, 2024 13:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
💻 aspect: code Concerns the software code in the repository 🛠 goal: fix Bug fix 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: catalog Related to the catalog and Airflow DAGs
Projects
Archived in project
Development

Successfully merging this pull request may close these issues.

5 participants