Add default behaviour for jobs that don't have some keys and handle the non-existence of some json_vals #1017

Merged
merged 6 commits into from
Jan 17, 2025


Contributor

@EmanElsaban EmanElsaban commented Jan 8, 2025

In this PR, we use .get for some keys that we have seen error out when running the migration script in stage. That happened because these jobs were too old and didn't have some of these keys specified. If a key errors out, no json_val is written for that action run, so this PR sets a default value for those keys. It also adds a try/except to catch the case where a json_val doesn't exist for a key; otherwise I think it would fail silently and reset jobs to 0.

Note: I didn't change all keys to use .get. I only changed what was necessary, since I think keys that are not optional should still error out when they are missing. Thoughts on this?
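A minimal sketch of the two ideas in this description, not the actual Tron code. The names `to_json_sketch`, `read_json_val`, and `MissingJsonValError` are hypothetical, the choice of which keys count as required is an assumption, and the stored item is assumed to be a dict with an optional "json_val" field.

```python
import json
from typing import Optional


class MissingJsonValError(KeyError):
    """Hypothetical error: a stored item has no json_val field."""


def to_json_sketch(state_data: dict) -> Optional[str]:
    """Serialize state, using defaults only for keys old records may lack."""
    return json.dumps(
        {
            # Assumed-required keys: a KeyError here is intentional.
            "job_run_id": state_data["job_run_id"],
            "action_name": state_data["action_name"],
            "state": state_data["state"],
            # Keys seen missing on old records: fall back to a default.
            "original_command": state_data.get("original_command"),
            "cap_add": state_data.get("cap_add", []),
            "cap_drop": state_data.get("cap_drop", []),
        }
    )


def read_json_val(item: dict, key: str) -> dict:
    """Fail loudly when json_val is absent instead of returning empty state."""
    try:
        raw = item["json_val"]
    except KeyError:
        raise MissingJsonValError(f"no json_val stored for key {key!r}") from None
    return json.loads(raw)
```

Raising a dedicated error is only one option. The point is that a missing json_val becomes visible to the caller rather than being treated as empty state.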

@EmanElsaban EmanElsaban requested a review from a team as a code owner January 8, 2025 14:55
"mem": state_data.get("mem"),
"disk": state_data.get("disk"),
"cap_add": state_data.get("cap_add", []),
"cap_drop": state_data.get("cap_drop", []),
Member

we have a default if cap_drop isn't provided in task_proc - but i think to take advantage of that we'd have to not pass through a value if we didn't get one, which would probably be a bit tricky

i sorta wonder if we should instead default to re-adding the default cap drops here

although: i'm kinda surprised that there's stuff missing these to begin with
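The "don't pass a value through if we didn't get one" idea can be sketched as below. This is only an illustration: `launch_task` is a hypothetical stand-in for the downstream call, `HYPOTHETICAL_DEFAULT_CAP_DROP` is a placeholder (the real default in task_proc is not shown in this thread), and `build_kwargs` is an invented helper.

```python
from typing import Sequence

# Placeholder value only; NOT the real task_proc default.
HYPOTHETICAL_DEFAULT_CAP_DROP = ("EXAMPLE_CAP",)


def launch_task(
    command: str,
    cap_add: Sequence[str] = (),
    cap_drop: Sequence[str] = HYPOTHETICAL_DEFAULT_CAP_DROP,
) -> dict:
    """Hypothetical downstream function that has its own cap_drop default."""
    return {"command": command, "cap_add": list(cap_add), "cap_drop": list(cap_drop)}


def build_kwargs(state_data: dict) -> dict:
    """Forward only the keys that were actually stored with a value."""
    kwargs = {"command": state_data["command"]}
    for key in ("cap_add", "cap_drop"):
        if state_data.get(key) is not None:
            kwargs[key] = state_data[key]
    return kwargs


# A record with no cap_drop picks up the downstream default.
old_record = {"command": "true"}
task = launch_task(**build_kwargs(old_record))
```

An explicitly stored empty list is still forwarded, so "stored as empty" and "never stored" stay distinguishable, which a blanket `[]` default would not allow.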

Contributor Author

@EmanElsaban EmanElsaban Jan 8, 2025

IIRC I have seen a job that was missing either cap_add or cap_drop. There is a lot of ancient data in the db that is missing many of these configs; it could have been testing data, though I'm not sure. I see that in actioncommandconfig they're set to the config value if it exists, or to an empty list otherwise:

            cap_add=config.cap_add or [],
            cap_drop=config.cap_drop or [],
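
One subtlety worth noting when comparing the `or []` pattern above with `.get(key, [])`: the two differ when the key is present but holds None. A small sketch (the helper names `get_with_default` and `get_or_empty` are illustrative, not from the codebase):

```python
def get_with_default(state_data: dict, key: str):
    """dict.get default: applies only when the key is absent."""
    return state_data.get(key, [])


def get_or_empty(state_data: dict, key: str):
    """'or []' pattern: also replaces None (and any other falsy value)."""
    return state_data.get(key) or []


missing = {}
explicit_none = {"cap_add": None}

get_with_default(missing, "cap_add")        # -> []
get_with_default(explicit_none, "cap_add")  # -> None
get_or_empty(missing, "cap_add")            # -> []
get_or_empty(explicit_none, "cap_add")      # -> []
```

If old records can contain an explicit None for these keys, the `or []` form is the one that guarantees a list.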

"disk": state_data["disk"],
"cap_add": state_data["cap_add"],
"cap_drop": state_data["cap_drop"],
"command": state_data.get("command"),
Member

everything should have a command - sounds like maybe we have some invalid data stored in dynamo?

@EmanElsaban EmanElsaban requested a review from nemacysts January 8, 2025 15:20
KaspariK previously approved these changes Jan 15, 2025
nemacysts previously approved these changes Jan 16, 2025
Member

@nemacysts nemacysts left a comment

imo, it'd be nice to add some explanatory comments here since this seems likely to surprise future yelpers that aren't us (or even us in the future after we forget everything :p) without some breadcrumbs :)

tron/core/action.py (resolved)
tron/core/actionrun.py (resolved)
@@ -811,12 +811,12 @@ def to_json(state_data: dict) -> Optional[str]:
 "job_run_id": state_data["job_run_id"],
 "action_name": state_data["action_name"],
 "state": state_data["state"],
-"original_command": state_data["original_command"],
+"original_command": state_data.get("original_command"),
Member

i'm kinda surprised that original_command and attempts don't exist for everything - i would have assumed that these are pretty standard features for all the ActionRuns and would always exist

Contributor Author

From what I recall, I've seen both fail for a job before.

Member

if we're not too pressed for time, it'd be nice to look at those jobs and see if they're junk data - but otherwise a "# TODO: figure out why we've seen cases of this - this shouldn't happen" comment seems fine to me as well :)
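
One way to "look at those jobs" could be a small audit pass over stored state that reports which expected keys each record lacks. This is a sketch under assumptions: `EXPECTED_KEYS` is an illustrative list built from the keys named in this thread, and `missing_keys` and `summarize_missing` are hypothetical helpers operating on plain dicts keyed by a record id.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Illustrative list only, taken from keys mentioned in this review.
EXPECTED_KEYS = ("command", "original_command", "attempts", "cap_add", "cap_drop")


def missing_keys(state_data: dict, expected: Sequence[str] = EXPECTED_KEYS) -> List[str]:
    """Return the expected keys that are absent from one record."""
    return [key for key in expected if key not in state_data]


def summarize_missing(records: Dict[str, dict]) -> Counter:
    """Count, per key, how many records are missing it."""
    counts: Counter = Counter()
    for state_data in records.values():
        counts.update(missing_keys(state_data))
    return counts


records = {
    "job.1.action": {"command": "true", "original_command": "true", "attempts": [],
                     "cap_add": [], "cap_drop": []},
    "job.2.action": {"command": "true"},
}
summary = summarize_missing(records)
```

A summary like this would help tell apart a handful of junk test records from a systematic gap in old data.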

@EmanElsaban EmanElsaban dismissed stale reviews from nemacysts and KaspariK via 27938cb January 17, 2025 18:46
@EmanElsaban EmanElsaban merged commit 9586d44 into master Jan 17, 2025
4 checks passed
3 participants