Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 2 additions & 2 deletions scripts/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ whose identifiers will be used here.

- [infinisil-test-org](https://github.com/infinisil-test-org): A GitHub organisation you're part of
- Repositories:
- [infinisil-test-org/empty](https://github.com/infinisil-test-org/empty): An empty one with no activity on it
- [infinisil-test-org/empty](https://github.com/infinisil-test-org/empty): A repository with only one commit, not attributed to your GitHub user
- [infinisil-test-org/active](https://github.com/infinisil-test-org/active): One where you have some activity
- [infinisil-test-org/nixpkgs-committers](https://github.com/infinisil-test-org/nixpkgs-committers): A fork of the upstream repo

Expand All @@ -22,7 +22,7 @@ whose identifiers will be used here.
Once you have the above setup (or got @infinisil to add yourself to his), you have to prepare the following:

- Add some activity of yours to the `active` repo.
To cover all code branches it's recommended to create, push to and delete a branch.
To match Nixpkgs, it’s recommended to do this by setting the “Default commit message” for merge commits to “Pull request title”, then creating and merging a PR.
You can do this from the web interface.
- Get the GitHub CLI available (`pkgs.github-cli`) and authenticate it using `gh auth login`
- A local Git clone of this repository with the `origin` remote set to the test repository:
Expand Down
2 changes: 2 additions & 0 deletions scripts/common.sh
Original file line number Diff line number Diff line change
@@ -1,5 +1,7 @@
set -euo pipefail

export TZ=UTC

log() {
echo "$@" >&2
}
Expand Down
41 changes: 23 additions & 18 deletions scripts/retire.sh
Original file line number Diff line number Diff line change
Expand Up @@ -59,9 +59,16 @@ cd "$DIR"
for login in *; do

# Figure out when this person received the commit bit
# Get the unix epoch of the first commit that touched this file
# Get the unix epoch of the last commit that added this file
# --first-parent is important to get the time of when the main branch was changed
fileCommitEpoch=$(git log --reverse --first-parent --format=%cd --date=unix -- "$login" | head -1)
fileCommitEpoch=$(git log \
--first-parent \
--no-follow \
--diff-filter=A \
--max-count=1 \
--format=%cd \
--date=unix \
-- "$login")
if (( fileCommitEpoch < createdOnReceptionEpoch )); then
# If it was created before creation actually matched the reception date
# This branch can be removed after 2026-04-23
Expand Down Expand Up @@ -101,22 +108,20 @@ for login in *; do
continue
fi

trace gh api -X GET /repos/"$ORG"/"$ACTIVITY_REPO"/activity \
-f time_period=year \
-f actor="$login" \
Comment on lines -104 to -106
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding feedback for GitHub, it seems like what's really missing is that the activity endpoint should keep track of merge queue additions. If we had that, we wouldn't need any changes.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Merge queue counting as activity would be good too and probably something we should raise, but I’m guessing it’d be a less trivial fix (it seems like much of the merge queue stuff happens through an internal bot account) – and we’d still need to make adjustments here to filter out non‐PR‐merging activity, so I’m not sure it’d be a net reduction in complexity.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't know how GitHub's backend works, so I don't think we can make realistic estimates for complexities of changes, neither for the username regex nor for the activity endpoint. What we can do is suggest multiple potential solutions to them, and see if they can do any of them.

For filtering the non-PR-merging activity, it would just be select(.activity_type == ...) in the jq.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the merge queue events are visible in the activity results - just for the web-flow or other bot users. So it'd be a change that doesn't make any semantic sense for them: The user that pushes the head of the temporary merge queue branch to the target branch is the bot.

The right thing to do would be to look at the events where the PR is added to the merge queue. But these are not repository events, they are PR events. We can already fetch them for each PR, but we won't get them via the activity endpoint.

TLDR: I don't even think there is a reasonable change for GitHub to make - everything works as it should.

PR_PREFIX="$ORG/$ACTIVITY_REPO" \
trace gh api -X GET /repos/"$ORG"/"$ACTIVITY_REPO"/commits \
-f since="$(date --date='1 year ago' --iso-8601=seconds)" \
-f author="$login" \
-f committer=web-flow \
-f per_page=100 \
Comment on lines +112 to 116
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can avoid the issues with @lf- by using the GraphQL API:

#!/usr/bin/env bash
set -euo pipefail

user=$1

id=$(gh api graphql -f query='query($user: String!) { user(login: $user) { id } })' -f user="$user" --jq '.data.user.id')

if ! result=$(gh api graphql --paginate -f query="$(<commits.graphql)" -f id="$id" -f since="$(date -Iseconds --date='1 year ago')"); then
  echo "$result"
else
  jq . > result.json <<< "$result"
fi

gh api graphql --paginate -f query='
query($id: ID!, $since: GitTimestamp!, $endCursor: String) {
  repository(owner: "NixOS", name: "nixpkgs") {
    ref(qualifiedName: "master") {
      target {
        ... on Commit {
          history(since: $since, first: 100, author: { id: $id }, after: $endCursor) {
            nodes {
              parents(first: 2) {
                nodes {
                  oid
                }
              }
              signature {
                wasSignedByGitHub
              }
              committedDate
              url
            }
            pageInfo {
              hasNextPage
              endCursor
            }
          }
        }
      }
    }
  }
}' -f id="$id" -f since="$(date -Iseconds --date='1 year ago')" \
  --jq '.data.repository.ref.target.history.nodes[] | select((.parents.nodes | length > 1) and .signature.wasSignedByGitHub) | pick(.url, .committedDate)'

I'm out of time myself for now, but I can integrate this into the PR later.

Fyi: The gh CLI supports pagination both for the REST and GraphQL API's ;)

Copy link
Member Author

@emilazy emilazy Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this isn’t the first time automation has had to special‐case @lf-, and some things are exposed only through the REST API, I’d much rather use our connection to GitHub to try and get this API bug fixed before considering increasing the code complexity like this.

I deliberately skipped pagination, since it’s extremely unlikely for someone to have 100 @web-flow commits through after their last PR merge, and it’d make the script run much slower in the presence of highly‐active committers. This GraphQL version makes that case much more likely, since it will pick up every commit authored by a committer.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, to be clear, when Emily iterated on this, we discussed things - and she already wrote a very similar GraphQL query, but the problem there is that it is impossible to filter by committer via GraphQL.

Copy link
Member Author

@emilazy emilazy Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This does filter after the fact by the GitHub signature, which should correctly handle that, FWIW – but it also means having to enumerate every authored commit in the past year (since 100 non‐web commits authored after the last merge is much more plausible), which for active committers I think would significantly slow the script down. The commits endpoint is already somewhat slower than the activity one. So I don’t think this is a good idea just to avoid a bug we can report to GitHub that adds 1 PR/year additional triage load, even before code complexity is taken into account.

Copy link
Member

@lf- lf- Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I reported this bug to GitHub and they told me they don't care to fix it. It's that they're using the current username regex while evaluating searches rather than one which matches their historical username requirements, and it's probably a 1-line fix. Their reply didn't really make sense to me as to why they didn't want to fix it.

You can try again if you'd like.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it’d make the script run much slower in the presence of highly‐active committers

We can stop the pagination once any valid merge has been found, which should always just need one page for active committers.

While I'm happy to maintain this repo even with added complexity to get it work nicely, it really does seem like this is a one-off case, especially considering what @lf- mentioned, so I guess this is fine.

Copy link
Member Author

@emilazy emilazy Dec 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lf- Aha, interesting. I’m not sure if that’s the only thing going on here, though – Wolfgang pointed out that he has seen it happen with @mbrgm too, which presumably doesn’t have any name validation issues. Maybe the regex is for search endpoints and whatever is going on with the API here is something else – or maybe the issue with @mbrgm is unrelated (it seems they have some commits in master, but they don’t show up in /commits?author=… in the web interface either, whereas yours do).

I certainly think it’s worth asking GitHub about, since we have another call with them scheduled in a week anyway.

@infinisil

I tried the GraphQL snippet here with @fabaff, and it took almost 5 minutes for that one committer, so I think this approach is certainly a non‐starter without some kind of short‐circuiting. However there are subtleties to that too, when considering e.g. the “user has become inactive again” logic.

What would probably be more viable is doing something like git clone --bare --single-branch --no-tags --filter=tree:0 https://github.com/NixOS/nixpkgs.git, git log --first-parent to aggregate commits from the last year and filter by author (if the committer is GitHub) and otherwise committer, group by email address, take the first commit for each, and then use the API to map those commits to usernames; whatever is left are inactive committers. But I believe the current state of the PR is really fine and think it would not be a good return‐on‐investment to work on that.

--jq '.[] |
"- \(.timestamp) [\(.activity_type) on \(.ref | ltrimstr("refs/heads/"))](https://github.com/'"$ORG/$ACTIVITY_REPO"'/\(
if .activity_type == "branch_creation" then
"commit/\(.after)"
elif .activity_type == "branch_deletion" then
"commit/\(.before)"
else
"compare/\(.before)...\(.after)"
end
))"' \
# PR merge commits have two parents. We also check it’s an
# authentic GitHub commit, because… why not?
select((.parents | length) == 2 and .commit.verification.verified) |
(.commit.message | capture(" \\((?<pr>#[0-9]+)\\)$").pr) as $pr |
"- `\(.commit.committer.date)` – \(env.PR_PREFIX)\($pr)"' \
> "$tmp/$login"
activityCount=$(wc -l <"$tmp/$login")
mergeCount=$(wc -l <"$tmp/$login")

if [[ "$prState" == open ]]; then
# If there is an open PR already
Expand All @@ -126,7 +131,7 @@ for login in *; do
log "$login has a retirement PR due, unmarking PR as draft and commenting with next steps"
effect gh pr ready --repo "$ORG/$MEMBER_REPO" "$prNumber"
{
if (( activityCount > 0 )); then
if (( mergeCount > 0 )); then
echo "One month has passed, and @$login has been active again:"
cat "$tmp/$login"
echo ""
Expand All @@ -151,7 +156,7 @@ for login in *; do
else
log "$login has a retirement PR pending"
fi
elif (( activityCount <= 0 )); then
elif (( mergeCount <= 0 )); then
log "$login has become inactive, opening a PR"
# If there is no PR yet, but they have become inactive
(
Expand Down Expand Up @@ -185,7 +190,7 @@ for login in *; do
-f "labels[]=retirement" >/dev/null
)
else
log "$login is active with $activityCount activities"
log "$login is active with $mergeCount merges"
fi
log ""
done