-
-
Notifications
You must be signed in to change notification settings - Fork 11
scripts/retire.sh: use commits endpoint rather than activity #73
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,7 @@ | ||
| set -euo pipefail | ||
|
|
||
| export TZ=UTC | ||
|
|
||
| log() { | ||
| echo "$@" >&2 | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -59,9 +59,16 @@ cd "$DIR" | |
| for login in *; do | ||
|
|
||
| # Figure out when this person received the commit bit | ||
| # Get the unix epoch of the first commit that touched this file | ||
| # Get the unix epoch of the last commit that added this file | ||
| # --first-parent is important to get the time of when the main branch was changed | ||
| fileCommitEpoch=$(git log --reverse --first-parent --format=%cd --date=unix -- "$login" | head -1) | ||
| fileCommitEpoch=$(git log \ | ||
| --first-parent \ | ||
| --no-follow \ | ||
| --diff-filter=A \ | ||
| --max-count=1 \ | ||
| --format=%cd \ | ||
| --date=unix \ | ||
| -- "$login") | ||
| if (( fileCommitEpoch < createdOnReceptionEpoch )); then | ||
| # If it was created before creation actually matched the reception date | ||
| # This branch can be removed after 2026-04-23 | ||
|
|
@@ -101,22 +108,20 @@ for login in *; do | |
| continue | ||
| fi | ||
|
|
||
| trace gh api -X GET /repos/"$ORG"/"$ACTIVITY_REPO"/activity \ | ||
| -f time_period=year \ | ||
| -f actor="$login" \ | ||
| PR_PREFIX="$ORG/$ACTIVITY_REPO" \ | ||
| trace gh api -X GET /repos/"$ORG"/"$ACTIVITY_REPO"/commits \ | ||
| -f since="$(date --date='1 year ago' --iso-8601=seconds)" \ | ||
| -f author="$login" \ | ||
| -f committer=web-flow \ | ||
| -f per_page=100 \ | ||
|
Comment on lines
+112
to
116
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We can avoid the issues with #!/usr/bin/env bash
set -euo pipefail
user=$1
id=$(gh api graphql -f query='query($user: String!) { user(login: $user) { id } })' -f user="$user" --jq '.data.user.id')
if ! result=$(gh api graphql --paginate -f query="$(<commits.graphql)" -f id="$id" -f since="$(date -Iseconds --date='1 year ago')"); then
echo "$result"
else
jq . > result.json <<< "$result"
fi
gh api graphql --paginate -f query='
query($id: ID!, $since: GitTimestamp!, $endCursor: String) {
repository(owner: "NixOS", name: "nixpkgs") {
ref(qualifiedName: "master") {
target {
... on Commit {
history(since: $since, first: 100, author: { id: $id }, after: $endCursor) {
nodes {
parents(first: 2) {
nodes {
oid
}
}
signature {
wasSignedByGitHub
}
committedDate
url
}
pageInfo {
hasNextPage
endCursor
}
}
}
}
}
}
}' -f id="$id" -f since="$(date -Iseconds --date='1 year ago')" \
--jq '.data.repository.ref.target.history.nodes[] | select((.parents.nodes | length > 1) and .signature.wasSignedByGitHub) | pick(.url, .committedDate)'I'm out of time myself for now, but I can integrate this into the PR later. Fyi: The
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since this isn’t the first time automation has had to special‐case I deliberately skipped pagination, since it’s extremely unlikely for someone to have 100 @web-flow commits through after their last PR merge, and it’d make the script run much slower in the presence of highly‐active committers. This GraphQL version makes that case much more likely, since it will pick up every commit authored by a committer.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah, to be clear, when Emily iterated on this, we discussed things - and she already wrote a very similar GraphQL query, but the problem there is that it is impossible to filter by committer via GraphQL.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This does filter after the fact by the GitHub signature, which should correctly handle that, FWIW – but it also means having to enumerate every authored commit in the past year (since 100 non‐web commits authored after the last merge is much more plausible), which for active committers I think would significantly slow the script down. The commits endpoint is already somewhat slower than the activity one. So I don’t think this is a good idea just to avoid a bug we can report to GitHub that adds 1 PR/year additional triage load, even before code complexity is taken into account.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I reported this bug to GitHub and they told me they don't care to fix it. It's that they're using the current username regex while evaluating searches rather than one which matches their historical username requirements, and it's probably a 1-line fix. Their reply didn't really make sense to me as to why they didn't want to fix it. You can try again if you'd like.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We can stop the pagination once any valid merge has been found, which should always just need one page for active committers. While I'm happy to maintain this repo even with added complexity to get it work nicely, it really does seem like this is a one-off case, especially considering what @lf- mentioned, so I guess this is fine.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @lf- Aha, interesting. I’m not sure if that’s the only thing going on here, though – Wolfgang pointed out that he has seen it happen with I certainly think it’s worth asking GitHub about, since we have another call with them scheduled in a week anyway. I tried the GraphQL snippet here with What would probably be more viable is doing something like |
||
| --jq '.[] | | ||
| "- \(.timestamp) [\(.activity_type) on \(.ref | ltrimstr("refs/heads/"))](https://github.com/'"$ORG/$ACTIVITY_REPO"'/\( | ||
| if .activity_type == "branch_creation" then | ||
| "commit/\(.after)" | ||
| elif .activity_type == "branch_deletion" then | ||
| "commit/\(.before)" | ||
| else | ||
| "compare/\(.before)...\(.after)" | ||
| end | ||
| ))"' \ | ||
| # PR merge commits have two parents. We also check it’s an | ||
| # authentic GitHub commit, because… why not? | ||
| select((.parents | length) == 2 and .commit.verification.verified) | | ||
wolfgangwalther marked this conversation as resolved.
Show resolved
Hide resolved
|
||
| (.commit.message | capture(" \\((?<pr>#[0-9]+)\\)$").pr) as $pr | | ||
| "- `\(.commit.committer.date)` – \(env.PR_PREFIX)\($pr)"' \ | ||
| > "$tmp/$login" | ||
| activityCount=$(wc -l <"$tmp/$login") | ||
| mergeCount=$(wc -l <"$tmp/$login") | ||
|
|
||
| if [[ "$prState" == open ]]; then | ||
| # If there is an open PR already | ||
|
|
@@ -126,7 +131,7 @@ for login in *; do | |
| log "$login has a retirement PR due, unmarking PR as draft and commenting with next steps" | ||
| effect gh pr ready --repo "$ORG/$MEMBER_REPO" "$prNumber" | ||
| { | ||
| if (( activityCount > 0 )); then | ||
| if (( mergeCount > 0 )); then | ||
| echo "One month has passed, and @$login has been active again:" | ||
| cat "$tmp/$login" | ||
| echo "" | ||
|
|
@@ -151,7 +156,7 @@ for login in *; do | |
| else | ||
| log "$login has a retirement PR pending" | ||
| fi | ||
| elif (( activityCount <= 0 )); then | ||
| elif (( mergeCount <= 0 )); then | ||
| log "$login has become inactive, opening a PR" | ||
| # If there is no PR yet, but they have become inactive | ||
| ( | ||
|
|
@@ -185,7 +190,7 @@ for login in *; do | |
| -f "labels[]=retirement" >/dev/null | ||
| ) | ||
| else | ||
| log "$login is active with $activityCount activities" | ||
| log "$login is active with $mergeCount merges" | ||
| fi | ||
| log "" | ||
| done | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Regarding feedback for GitHub, it seems like what's really missing is that the activity endpoint should keep track of merge queue additions. If we had that, we wouldn't need any changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Merge queue counting as activity would be good too and probably something we should raise, but I’m guessing it’d be a less trivial fix (it seems like much of the merge queue stuff happens through an internal bot account) – and we’d still need to make adjustments here to filter out non‐PR‐merging activity, so I’m not sure it’d be a net reduction in complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't know how GitHub's backend works, so I don't think we can make realistic estimates for complexities of changes, neither for the username regex nor for the activity endpoint. What we can do is suggest multiple potential solutions to them, and see if they can do any of them.
For filtering the non-PR-merging activity, it would just be
select(.activity_type == ...)in thejq.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the merge queue events are visible in the activity results - just for the web-flow or other bot users. So it'd be a change that doesn't make any semantic sense for them: The user that pushes the head of the temporary merge queue branch to the target branch is the bot.
The right thing to do would be to look at the events where the PR is added to the merge queue. But these are not repository events, they are PR events. We can already fetch them for each PR, but we won't get them via the activity endpoint.
TLDR: I don't even think there is a reasonable change for GitHub to make - everything works as it should.