Add default behaviour for jobs that don't have some keys and handle the non-existence of some json_vals #1017
tron/core/action.py
Outdated
"mem": state_data.get("mem"),
"disk": state_data.get("disk"),
"cap_add": state_data.get("cap_add", []),
"cap_drop": state_data.get("cap_drop", []),
we have a default if cap_drop isn't provided in task_proc, but i think to take advantage of that we'd have to avoid passing a value through when we didn't get one, which would probably be a bit tricky
i sorta wonder if we should instead default to re-adding the default cap drops here
although: i'm kinda surprised that there's stuff missing these to begin with
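To make the "don't pass a value through if we didn't get one" option concrete, here is a rough sketch. Everything in it is hypothetical: `launch_task`, `DEFAULT_CAP_DROP`, and `build_kwargs` are stand-ins for illustration and are not the real task_proc API or its actual defaults.

```python
from typing import Any, Dict, List, Optional

# Hypothetical stand-in for the downstream default; the real default may differ.
DEFAULT_CAP_DROP = ["EXAMPLE_CAP"]


def launch_task(
    command: str,
    cap_add: Optional[List[str]] = None,
    cap_drop: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Hypothetical callee that applies its own default only when cap_drop is not passed."""
    return {
        "command": command,
        "cap_add": [] if cap_add is None else cap_add,
        "cap_drop": DEFAULT_CAP_DROP if cap_drop is None else cap_drop,
    }


def build_kwargs(state_data: Dict[str, Any]) -> Dict[str, Any]:
    """Forward only the optional keys that are present and non-null."""
    optional_keys = ("cap_add", "cap_drop")
    return {k: state_data[k] for k in optional_keys if state_data.get(k) is not None}


old_record = {"command": "echo hi"}  # made-up record with no cap_* keys
task = launch_task(old_record["command"], **build_kwargs(old_record))
print(task["cap_drop"])
```

The tricky part is that every call site has to build its arguments this way; defaulting to `[]` in the serializer is simpler, but it silently overrides whatever default the callee would have applied.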
IIRC I have seen a job that was missing either cap_add or cap_drop. There is a lot of ancient data in the db with many of these configs missing; it could've been testing data, though, I'm not sure. I see that in actioncommandconfig they're set to the config value if it exists, or to an empty list otherwise:
cap_add=config.cap_add or [],
cap_drop=config.cap_drop or [],
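One subtlety that may matter here: `.get(key, [])` and `value or []` are not equivalent when the key exists but holds `None`. A small sketch, using a made-up dict rather than a real state record:

```python
# Hypothetical stored state: "cap_add" is absent, "cap_drop" is present but null.
state_data = {"cap_drop": None}

# .get() only falls back to the default when the key is missing entirely.
cap_add = state_data.get("cap_add", [])
cap_drop = state_data.get("cap_drop", [])  # stays None, the default is not used

# "or []" also normalizes an explicit None (or any other falsy value) to [].
cap_drop_normalized = state_data.get("cap_drop") or []

print(cap_add, cap_drop, cap_drop_normalized)
```

If old records can contain explicit nulls, the `or []` form is the one that matches the behaviour quoted above.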
tron/core/action.py
Outdated
"disk": state_data["disk"],
"cap_add": state_data["cap_add"],
"cap_drop": state_data["cap_drop"],
"command": state_data.get("command"),
everything should have a command - sounds like maybe we have some invalid data stored in dynamo?
imo, it'd be nice to add some explanatory comments here since this seems likely to surprise future yelpers that aren't us (or even us in the future after we forget everything :p) without some breadcrumbs :)
@@ -811,12 +811,12 @@ def to_json(state_data: dict) -> Optional[str]:
 "job_run_id": state_data["job_run_id"],
 "action_name": state_data["action_name"],
 "state": state_data["state"],
-"original_command": state_data["original_command"],
+"original_command": state_data.get("original_command"),
i'm kinda surprised that original_command and attempts don't exist for everything - i would have assumed that these are pretty standard features for all the ActionRuns and would always exist
From what I recall, I've seen both fail for a job before
if we're not too pressed for time, it'd be nice to look at those jobs and see if they're junk data - but otherwise a "# TODO: figure out why we've seen cases of this - this shouldn't happen" seems fine to me as well :)
In this PR, we add .get to some keys that we have seen error out when running the migration script in stage. That happens because these jobs were too old and didn't have some of these keys specified. If a key errors out, no json val is written for that action run, so this PR sets a default value for those keys. It also adds a try/except to catch the scenario where a json_val doesn't exist for a key; otherwise I think it would fail silently and reset jobs to 0.
Note: I didn't change all keys to use .get. I only changed what was necessary, since I think some keys should still raise on a missing key if they are not optional. Thoughts on this?
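For discussion, a minimal sketch of the required-versus-optional split described above. This is not the code in this PR: `to_json_sketch`, `read_json_val`, `REQUIRED_KEYS`, and `OPTIONAL_DEFAULTS` are hypothetical names, and which keys belong in each group is exactly the open question.

```python
import json
import logging
from typing import Optional

log = logging.getLogger(__name__)

# Hypothetical split; the real set of required keys is up for review.
REQUIRED_KEYS = ("job_run_id", "action_name", "state")
OPTIONAL_DEFAULTS = {
    "original_command": None,
    "command": None,
    "cap_add": [],
    "cap_drop": [],
}


def to_json_sketch(state_data: dict) -> Optional[str]:
    """Serialize state; required keys must exist, optional keys get a default."""
    try:
        payload = {key: state_data[key] for key in REQUIRED_KEYS}
    except KeyError:
        # TODO: figure out why we've seen cases of this - this shouldn't happen
        log.exception("State data is missing a required key; no json_val written")
        return None
    for key, default in OPTIONAL_DEFAULTS.items():
        payload[key] = state_data.get(key, default)
    return json.dumps(payload)


def read_json_val(item: dict) -> Optional[dict]:
    """Return the parsed json_val, or None (with a logged error) if it is absent."""
    try:
        return json.loads(item["json_val"])
    except KeyError:
        log.error("No json_val stored for this key")
        return None
```

Returning `None` and logging keeps the failure visible to the caller instead of silently producing an empty state; what the caller should do next (skip, fall back, or abort) is left open here.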