How To Use Pagination With GitHub's API #69826

loujr · 2023-10-11T18:10:27Z


Navigating large datasets can be a challenge when making some API calls. To make navigation and parsing easier, GitHub uses pagination to split large datasets into more manageable pieces. Pagination divides a large dataset into smaller chunks, called pages, that you can move through one at a time, much like the pages of a website. This guide covers the two methods of pagination within GitHub:

Cursor Based Pagination

Page Based Pagination

To start with, it's important to know a few facts about receiving paginated items:

Different API calls respond with different defaults. For example, a call to List public repositories provides paginated items in sets of 30, whereas a call to the GitHub Search API provides items in sets of 100.
You can specify how many items to receive (up to a maximum of 100), but for technical reasons not every endpoint behaves the same. For example, events won't let you set a maximum number of items to receive. Be sure to read the documentation on how each specific endpoint handles paginated results.

Pagination information is returned in the headers of the response. The following is an example of an authenticated curl request to view the audit log of our enterprise:

$ curl -I -H "Accept: application/vnd.github+json" -H "Authorization: Bearer ghp_*****j8fq"   https://api.github.com/enterprises/advacado-corp/audit-log

The -I parameter tells curl to return only the response headers, not the response body. The output is a standard set of HTTP headers, and the link line is the Link header of the API call:

HTTP/2 200 
server: GitHub.com
date: Mon, 17 Oct 2022 15:15:37 GMT
content-type: application/json; charset=utf-8
content-length: 20854
cache-control: private, max-age=60, s-maxage=60
vary: Accept, Authorization, Cookie, X-GitHub-OTP
etag: "fc5b15308c775934ca63719ff22d9fe623e7e8226235181203424347cec50130"
x-oauth-scopes: admin:enterprise, admin:gpg_key, admin:org, admin:org_hook, admin:public_key, admin:repo_hook, admin:ssh_signing_key, codespace, delete:packages, delete_repo, gist, notifications, project, repo, user, workflow, write:discussion, write:packages
x-accepted-oauth-scopes: admin:enterprise
github-authentication-token-expiration: 2022-12-28 16:47:19 UTC
x-github-media-type: github.v3; format=json
link: <https://api.github.com/enterprises/13827/audit-log?after=MS42NjQzODM5MTkzNDdlKzEyfDM0MkI6NDdBNDo4RTFGMEM6NUIyQkZCMzo2MzM0N0JBRg%3D%3D&before=>; rel="next"
x-github-api-version-selected: 2022-08-09
x-ratelimit-limit: 5000
x-ratelimit-remaining: 4998
x-ratelimit-reset: 1666023299
x-ratelimit-used: 2
x-ratelimit-resource: core
access-control-expose-headers: ETag, Link, Location, Retry-After, X-GitHub-OTP, X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Used, X-RateLimit-Resource, X-RateLimit-Reset, X-OAuth-Scopes, X-Accepted-OAuth-Scopes, X-Poll-Interval, X-GitHub-Media-Type, X-GitHub-SSO, X-GitHub-Request-Id, Deprecation, Sunset
access-control-allow-origin: *
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 0
referrer-policy: origin-when-cross-origin, strict-origin-when-cross-origin
content-security-policy: default-src 'none'
vary: Accept-Encoding, Accept, X-Requested-With
x-github-request-id: C985:4933:1054293:2175681:634D7198

Within these headers, the Link header is this line:

link: <https://api.github.com/enterprises/13827/audit-log?after=MS42NjQzODM5MTkzNDdlKzEyfDM0MkI6NDdBNDo4RTFGMEM6NUIyQkZCMzo2MzM0N0JBRg%3D%3D&before=>; rel="next"
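If you want to pull that line out of the headers in a script rather than by eye, a minimal sketch follows. The function name get_link_header is hypothetical. It strips carriage returns and matches the header name case-insensitively, since HTTP/2 responses (like the one above) use link: while HTTP/1.1 responses often use Link:.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of the Link header from raw HTTP
# headers read on stdin (for example, the output of curl -I).
get_link_header() {
  # Drop the \r that ends each HTTP header line, keep the link line
  # regardless of capitalization, then remove the "link:" prefix.
  tr -d '\r' | grep -i '^link:' | sed -E 's/^[^:]*:[[:space:]]*//'
}
```

Usage, assuming your token is stored in TOKEN: curl -sI -H "Authorization: Bearer $TOKEN" "https://api.github.com/enterprises/advacado-corp/audit-log" | get_link_header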

Let's break down this Link header. The audit log uses the pagination parameters before and after, which are explained in Before and After below. rel="next" says that the next page is located at the URL with after=MS42NjQzODM5MTkzNDdlKzEyfDM0MkI6NDdBNDo4RTFGMEM6NUIyQkZCMzo2MzM0N0JBRg%3D%3D&before=.
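Each entry in a Link header has the form <URL>; rel="NAME", with entries separated by commas. A minimal sketch of picking out the URL for one relation follows; the function name get_rel_url is hypothetical, and it assumes a grep that supports -o (GNU and BSD grep both do).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the URL for one relation (next, prev, first,
# last) from a Link header value. Prints nothing if the relation is absent.
get_rel_url() {
  local link_header=$1 rel=$2
  # Extract each <URL>; rel="..." entry on its own line, keep the one whose
  # rel matches, then strip everything except the URL between < and >.
  printf '%s\n' "$link_header" \
    | grep -o '<[^>]*>; *rel="[^"]*"' \
    | grep "rel=\"$rel\"\$" \
    | sed -E 's/^<([^>]*)>.*/\1/'
}
```

Matching whole entries, rather than splitting on commas, keeps the sketch working even if a URL happens to contain a comma.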

The following is an example of a Link header that uses page numbers. Instead of cursor links, you are given page numbers to reference. In this example, rel="next" shows that the next page is page 2 (page=2), while rel="last" shows that the last page is page 34 (page=34). Cursor based before and after links do not contain these references. Because pagination starts at the first page by default, you are currently on page one, and there are 33 more pages of results for addClass.

Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=2>; rel="next",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last"
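For page based headers like this one, the page number in the rel="last" link tells you the total number of pages. A minimal sketch, using a hypothetical function name get_last_page, reads it from the header value. It matches only a page parameter that follows ? or &, so a per_page value in the same URL is not mistaken for the page number.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the page number from the rel="last" link of a
# page based Link header. Prints nothing if there is no rel="last" link.
get_last_page() {
  local link_header=$1
  # Keep only the rel="last" entry, then take the value of its page parameter.
  printf '%s\n' "$link_header" \
    | grep -o '<[^>]*>; *rel="last"' \
    | grep -o '[?&]page=[0-9]*' \
    | cut -d= -f2
}
```

This only reads the number; to move between pages, still follow the links the API gives you.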

Note: Always rely on these link relations provided to you. Don't try to guess or construct your own URL.

Using Cursor Based Pagination

There are two ways to navigate paginated results, and which one applies depends on the Link header you receive. If the links contain before= (and after=), the endpoint uses cursor based pagination with the before and after parameters.
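The check described above can be sketched as a small classifier. The function name pagination_style and its three outputs (cursor, page, none) are hypothetical; none simply means no pagination parameters were found, for example when there is no Link header at all.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a Link header value as cursor based
# (before/after parameters), page based (page parameter), or none.
pagination_style() {
  local link_header=$1
  if printf '%s\n' "$link_header" | grep -Eq '[?&](before|after)='; then
    echo cursor
  elif printf '%s\n' "$link_header" | grep -Eq '[?&]page='; then
    echo page
  else
    echo none
  fi
}
```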

Before and After

To navigate using before and after, copy the URL from the rel="next" link above into your curl request:

$ curl -i -H "Accept: application/vnd.github+json" -H "Authorization: Bearer ghp_*****j8fq" "https://api.github.com/enterprises/13827/audit-log?after=MS42NjQzODM5MTkzNDdlKzEyfDM0MkI6NDdBNDo4RTFGMEM6NUIyQkZCMzo2MzM0N0JBRg%3D%3D&before="

This returns the next page of 100 items along with new header information that you can use to make the next request. The important point here is that the Link header is generated by the API, so use its links rather than entering cursor values manually. The new response contains the following Link header:

link: <https://api.github.com/enterprises/13827/audit-log?after=MS42NjQzODMzMzk2MzZlKzEyfFdxSzIxdGU0MlBWNUp5UzhBWDF6LWc%3D&before=>; rel="next", <https://api.github.com/enterprises/13827/audit-log?after=&before=>; rel="first", <https://api.github.com/enterprises/13827/audit-log?after=&before=MS42NjQzODM5MTcyMjllKzEyfDI4NDE6NEVFNDoxODBDRkM5OjY5REE0MzI6NjMzNDdCQUQ%3D>; rel="prev"

rel="next" provides the next 100 items of results.
rel="prev" provides the previous 100 items of results.
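Repeating that step until no rel="next" link comes back retrieves every page. Below is a minimal sketch, assuming TOKEN holds an access token; the function name fetch_all_pages is hypothetical. It uses curl's -D option to save each response's headers to a temporary file, so the body and the Link header come from a single request, and it appends each page's JSON body to the output file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: follow rel="next" links until the API stops sending one.
# Usage: fetch_all_pages START_URL OUTPUT_FILE  (assumes TOKEN is set)
fetch_all_pages() {
  local url=$1 out=$2 headers link
  headers=$(mktemp)
  : > "$out"
  while [ -n "$url" ]; do
    # -D saves the response headers to a file; the body is appended to $out.
    curl -s -D "$headers" \
      -H "Accept: application/vnd.github+json" \
      -H "Authorization: Bearer $TOKEN" \
      "$url" >> "$out"
    printf '\n' >> "$out"
    # Read the Link header; header names are case-insensitive.
    link=$(tr -d '\r' < "$headers" | grep -i '^link:')
    # Use the generated rel="next" URL as-is; it is empty on the last page.
    url=$(printf '%s\n' "$link" | grep -o '<[^>]*>; *rel="next"' | sed -E 's/^<([^>]*)>.*/\1/')
  done
  rm -f "$headers"
}
```

Usage: fetch_all_pages "https://api.github.com/enterprises/advacado-corp/audit-log" audit-log.json. The file then holds one JSON document per page; for endpoints that return JSON arrays, such as the audit log, jq -s 'add' audit-log.json merges them into a single array.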

Using Page Based Pagination

Now that you know how many pages there are to receive, you can start navigating through the pages to consume the results. You do this by passing in a page parameter, which starts at 1 by default. Let's jump ahead to page 14 and see what happens:

$ curl -I "https://api.github.com/search/code?q=addClass+user:mozilla&page=14"

Here's the link header once more:

Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=15>; rel="next",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=34>; rel="last",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=1>; rel="first",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&page=13>; rel="prev"

As expected, rel="next" is at 15, and rel="last" is still 34. But now we've got some more information: rel="first" indicates the URL for the first page, and more importantly, rel="prev" gives you the page number of the previous page. Using this information, you could build a UI that lets users jump between the first, previous, next, or last page of results in an API call.
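As a starting point for that kind of navigation, a minimal sketch can list every relation in a Link header as a "rel URL" pair, one per line. The function name list_links is hypothetical, and the sketch assumes a grep that supports -o.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each relation in a Link header value as
# "REL URL", one per line, e.g. "next https://...&page=15".
list_links() {
  # Pull out every <URL>; rel="..." entry, then reorder it as "rel url".
  printf '%s\n' "$1" \
    | grep -o '<[^>]*>; *rel="[^"]*"' \
    | sed -E 's/^<([^>]*)>; *rel="([^"]*)"$/\2 \1/'
}
```

A UI could then enable a first, previous, next, or last button only when the matching line is present.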

Changing the number of items received

By passing the per_page parameter, you can specify how many items you want each page to return, up to 100 items. Let's try asking for 50 items about addClass:

$ curl -I "https://api.github.com/search/code?q=addClass+user:mozilla&per_page=50"

Notice what it does to the header response:

Link: <https://api.github.com/search/code?q=addClass+user%3Amozilla&per_page=50&page=2>; rel="next",
  <https://api.github.com/search/code?q=addClass+user%3Amozilla&per_page=50&page=20>; rel="last"

As you might have guessed, the rel="last" information says that the last page is now 20. This is because we are asking for more results per page.

Conclusion

Pagination is a critical tool for navigating API queries on large datasets. In this article, we outlined the two pagination methods within GitHub's REST API: cursor based and page based pagination. When navigating with cursor based pagination, it is important to use the before and after links generated in the Link header. Page based pagination uses page numbers to navigate those datasets. Some REST API endpoints use page based pagination, some only respond to cursor based pagination, and others might respond to both. If you are unsure which pagination method to use, check the Link header of the REST API call you just made.

appatalks · 2024-01-20T16:46:09Z


👋 @loujr - Thank you for this amazing write-up! 🎉

Different API calls respond with different defaults. For example, a call to List public repositories provides paginated items in sets of 30, whereas a call to the GitHub Search API provides items in sets of 100.

This was the inspiration I needed to write an automated pre-script to do just that ⬆️ for all repositories using the knowledge learned from your post.

My hope is others find this next piece useful 🔽

run_discovery.sh

#!/bin/bash
#
# Example Usage:
# $ bash run_discovery.sh
#     Please enter the organization name:
#     My-Super-Cool-ORG

if [ -z "$TOKEN" ]; then
  echo "Error: Please set the GitHub API token in the TOKEN environment variable."
  echo "Example: $ export TOKEN=ghp_****"
  exit 1
fi

if ! command -v jq >/dev/null 2>&1; then
  echo "Error: jq is required to parse the API responses."
  exit 1
fi

echo "Please enter the organization name: "
read -r orgName

url="https://api.github.com/orgs/$orgName/repos?per_page=100"
repos=()
page=1

# Pick the output file once so every repository is written to the same file
date=$(date +"%Y-%m-%d_%H-%M")
filename="/tmp/${date}_discovered_repositories.tmp"
: > "$filename"

while true; do
  # Each repository contributes two lines: its name, then its private flag
  page_repos=$(curl -s -H "Authorization: token $TOKEN" "$url&page=$page" | jq -r '.[] | .name, .private')
  while IFS= read -r repo; do
    # Skip the blank line produced when a page has no repositories
    [ -n "$repo" ] && repos+=("$repo")
  done <<< "$page_repos"
  # Request the same page's headers and keep only the Link header
  headers=$(curl -s -I -H "Authorization: token $TOKEN" "$url&page=$page")
  link_header=$(echo "$headers" | grep -i '^link:')
  echo ""
  echo "Discovering Repo Listing Standby: "
  echo ""
  echo "$link_header"
  # No rel="next" link means this was the last page
  if ! echo "$link_header" | grep -q 'rel="next"'; then
    break
  fi
  ((page++))
  sleep 6
done

# repos alternates name, private flag: even indexes are names
for ((i=0; i<${#repos[@]}; i+=2)); do
  visibility="Public"
  if [ "${repos[$i+1]}" = "true" ]; then
    visibility="Private"
  fi
  # Only the name is written; add $visibility to the line if you need it
  repoName="${repos[$i]}"
  echo "$repoName" >> "$filename"
done

echo ""
echo "Discovered Repository Count: $(wc -l < "$filename")"
echo "Repo Listing located in: $filename"

For full inline notes please see here
