Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Merge pull request #13 from kingdonb/test-prerelease
Testing link-checker-gpt 0.1.0-beta
Showing 29 changed files with 403 additions and 175 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
Dockerfile | ||
action.yml |
This file was deleted.
This file was deleted.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
name: Docker Image CI | ||
|
||
on: | ||
push: | ||
tags: | ||
- '*' | ||
|
||
jobs: | ||
build: | ||
runs-on: ubuntu-latest | ||
|
||
permissions: | ||
packages: write | ||
|
||
steps: | ||
- | ||
name: Checkout | ||
uses: actions/checkout@v3 | ||
- | ||
name: Set up Docker Buildx | ||
uses: docker/setup-buildx-action@v2 | ||
- | ||
name: Login to GitHub Container Registry | ||
uses: docker/login-action@v2 | ||
with: | ||
registry: ghcr.io | ||
username: ${{ github.actor }} | ||
password: ${{ secrets.GITHUB_TOKEN }} | ||
- | ||
name: Build and push | ||
uses: docker/build-push-action@v4 | ||
with: | ||
context: . | ||
push: true | ||
tags: ghcr.io/kingdonb/link-checker-gpt:${{ github.ref_name }} | ||
cache-from: type=gha | ||
cache-to: type=gha,mode=max |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,27 @@ | ||
FROM ruby:3.0 | ||
|
||
# Install the gh cli (TODO: make the action comment on the PR) | ||
RUN curl -fsSL https://cli.github.com/packages/githubcli-archive-keyring.gpg | dd of=/usr/share/keyrings/githubcli-archive-keyring.gpg \ | ||
&& chmod go+r /usr/share/keyrings/githubcli-archive-keyring.gpg \ | ||
&& echo 'deb [signed-by=/usr/share/keyrings/githubcli-archive-keyring.gpg] https://cli.github.com/packages stable main' | tee /etc/apt/sources.list.d/github-cli.list > /dev/null && \ | ||
apt-get update && \ | ||
apt-get install -y gh | ||
|
||
# Do not: Set the working directory in the container | ||
# per https://docs.github.com/en/actions/creating-actions/dockerfile-support-for-github-actions#workdir | ||
# WORKDIR /linkchecker | ||
|
||
# Copy over your application | ||
WORKDIR /opt/link-checker | ||
COPY Gemfile Gemfile.lock /opt/link-checker/ | ||
|
||
# Install Ruby dependencies | ||
RUN gem install bundler -v 2.4.10 && bundle install | ||
|
||
COPY . /opt/link-checker/ | ||
|
||
# Copies your code file from your action repository to the filesystem path `/` of the container | ||
# COPY entrypoint.sh /entrypoint.sh | ||
|
||
# Executes `entrypoint.sh` when the Docker container starts up | ||
ENTRYPOINT ["/opt/link-checker/entrypoint.sh"] |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,31 +1,100 @@ | ||
# Link-Checker GPT | ||
|
||
This link checker is so-named because it was mostly written by ChatGPT. | ||
Welcome to the Link-Checker GPT! Crafted with the assistance of ChatGPT, this link checker ensures the integrity of links in your website's content. Although primarily designed for the FluxCD website's preview environments, it's versatile enough to work with most platforms, including Netlify. | ||
|
||
It is designed for use with the FluxCD website preview environments: | ||
## Integration as a CI Check | ||
|
||
```ruby | ||
Link-Checker GPT is ready to be integrated as a CI check within the fluxcd/website repository. When a PR check flags an error, it's an invitation to refine your links. An associated report is available as a downloadable CSV to guide the necessary corrections. In the future, our bot might also add a comment to your PR, a gentle nag that aims to cajole us into eventually reducing the number of bad links in the repo all the way down to zero. | ||
|
||
## Integration Guide for `fluxcd/website` | ||
|
||
Integrating the Link-Checker GPT into your existing workflow is straightforward. Here's how you can integrate it into the `fluxcd/website` repository: | ||
|
||
### Step 1: Add the Action | ||
|
||
In your `.github/workflows/` directory (create it if it doesn't exist), add a new workflow file, for instance, `link-check.yml`. You can also add this in an existing workflow. | ||
|
||
Within this file, add the following content: | ||
|
||
```yaml | ||
name: Link Checker | ||
|
||
on: | ||
pull_request: | ||
branches: | ||
- main | ||
|
||
jobs: | ||
check-links: | ||
runs-on: ubuntu-latest | ||
|
||
steps: | ||
- name: Checkout code | ||
uses: actions/checkout@v3 | ||
|
||
- name: Link Checker GPT | ||
uses: kingdonb/link-checker-gpt@v1-beta # (the v1 tag is still unreleased, we need to test) | ||
with: | ||
productionDomain: fluxcd.io | ||
previewDomain: deploy-preview-${{ github.event.pull_request.number }}--fluxcd.netlify.app | ||
prNumber: ${{ github.event.pull_request.number }} | ||
githubToken: ${{ github.token }} | ||
``` | ||
WIP - **TODO**: make this work for other consumers besides fluxcd.io. We have yet to test this on any other site. It should work anywhere that publishes a `sitemap.xml` (which should be pretty much every important CMS, including Jekyll, Hugo, Docsy, Bartholomew, ...). | ||
|
||
### Step 2: Configuration | ||
|
||
The required parameters are `productionDomain`, the target domain for production (used to create a baseline report), and `previewDomain`, the target domain for the PR's preview environment. By convention, the preview domain can usually be inferred from the PR number. The `prNumber` and `githubToken` inputs are also marked as required in `action.yml`. | ||
|
||
Both sites must publish a populated `sitemap.xml`. | ||
|
||
### Step 3: Commit and Test | ||
|
||
Commit the new workflow file and create a new pull request. The Link Checker GPT action should automatically run and validate the links within the website content associated with the PR. | ||
|
||
If there are any bad links in the production site, they will be captured in a baseline report for follow-up later. Those links are not counted against a PR. If there are any new bad links in the PR then the check will fail. | ||
|
||
(Create a link to an invalid anchor in your PR to test this works, then revert the change before merging it!) | ||
|
||
## How it Works | ||
|
||
Familiarize yourself with the moving parts in a local clone. This action is Dockerized, but it was not designed to run in Docker; it is a Ruby program and can run on your local workstation. Just run `bundle install` first, then type `make`! | ||
|
||
(By default this runs against PR#1573. To check a different PR, edit the Makefile, or keep reading to learn how to use this as a GitHub Action.) | ||
|
||
To check the links of a preview environment on Netlify, simply run: | ||
|
||
```bash | ||
ruby main.rb deploy-preview-1573--fluxcd.netlify.app | ||
``` | ||
|
||
It may behave differently when run against `fluxcd.io` and the preview site, | ||
but any differences are bugs. We either fix it here, or we fix the reason in | ||
the website itself (probably by replacing an absolute link with a hard domain | ||
reference to fluxcd.io in it.) | ||
This checks for bad links in your PR. But this is only half a check. We don't want you to get blamed for bad links that already were on the site, just because you opened a PR. | ||
|
||
So the tool needs to check `fluxcd.io` first, count up those bad links, then discount them from the PR so we can get a valid check output. This way we should guarantee that no new PR ever adds bad links to the FluxCD.io website. Any discrepancies between the reports are considered bugs—either they represent an error in this tool or they can be addressed directly in the website by modifying the links. | ||
|
||
There is a baseline report as well as a PR review report. Together they list the bad links that were found and whether each one already existed on the site or was created by your PR. The pre-existing ones should be fixed eventually as well, but they will not count against your PR. | ||
|
||
After a single successful run, a report detailing the link statuses is generated in `report.csv`. You can import this CSV into tools like Google Drive for further analysis and action. The `make summary` process takes the normalized output of the two checks described above, and the `check_summary.sh` script returns an error when the build should fail. | ||
|
||
## Note on UX: Report Download | ||
|
||
In the event of a PR check failure, you can read the report in the failed job output. Initially this workflow was designed to let the user download a detailed report as a zipped CSV. It was originally built as a composite workflow; you can still find remnants of this in the commented section of `action.yml`. | ||
|
||
Instead, the report now goes out to the workflow/action job log. You can read all the bad links created by your PR there. Links from the baseline site are not included in the report unless your PR is spotless. A later version might emit the baseline report when the PR creates no issues, to encourage tidying. The report would then show the baseline issues, but since they were not caused by your PR they would not fail the check. | ||
|
||
The primary goal is to maximize the signal-to-noise ratio so that users are not tempted to uninstall this workflow. It should be easy to adopt, and it should never fail the workflow to nag the contributor about issues that their PR didn't create. | ||
|
||
**TODO**: We still need to figure out a way to expose those baseline errors. | ||
|
||
Assuming it runs to completion, it will produce a report in report.csv | ||
## Cache Management | ||
|
||
I can import this report into Google Drive and mark it up as I fix the links. | ||
The tool incorporates caching initially intended to expedite repeated runs. This could be particularly useful for iterative development. Most runtime errors, especially those from the validate method and anchor checkers, can be debugged efficiently using cached data without re-fetching anything. | ||
|
||
This nearly works as a CI check, but we will need to fix many of the links | ||
first, and find a way to make exceptions for any more that cannot be fixed. | ||
However, there's a known issue: the cache isn't always reliable. To ensure accuracy, always run `make clean-cache` between separate executions. The cache is still used to prevent repeated calls out and to avoid the repeated loading of HTML files into memory. As a result, a lot of memory can be used. | ||
|
||
### Broken feature: Sitemap Caching | ||
**TODO**: We're considering refining the cache management system. The cache should always be invalidated unless its validity is assured. This feature's primary purpose is for one-time use and might be phased out or redesigned in future versions. | ||
|
||
There is a cache, so if you have run the script before the "Visiting links" | ||
step will not be repeated unless you run `make clean` first. This is to help | ||
with iterative development, since most of the runtime errors come from the | ||
validate method and anchor checker, they can be debugged easily from a cache. | ||
The primary issue to grapple with now is that we can wait once for the preview environment's deploy to become ready, but we cannot guarantee that subsequent runs of the checker are always looking at the latest version. There is no synchronization or coordination between independent jobs, and there is no job configuration for the Netlify preview build (it is an externally provided action, and we are not yet sure how it works). | ||
|
||
However, it doesn't work. So make sure if you are running this more than one | ||
time, you always run at least `make clean-cache` between separate executions. | ||
Perhaps we can read the check statuses and wait to proceed with the scan of the preview domain until the Netlify deploy check shows itself as ready. |
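
The README describes checking `fluxcd.io` first and then discounting those pre-existing bad links from the PR's result. The comparison code itself is not part of this diff, so the following is only a minimal sketch of that idea. The helper names `strip_domain`, `link_key`, and `new_bad_links`, and the `[page, target]` pair layout, are assumptions made for illustration and are not taken from the repository.

```ruby
require "set"

# Hypothetical helper: drop the scheme and the site's own hostname so that the
# same broken link compares equal in the production and preview reports.
def strip_domain(url, domain)
  url.sub(%r{\Ahttps?://#{Regexp.escape(domain)}}, "")
end

# Hypothetical helper: normalize one [page, target] pair for comparison.
def link_key(pair, domain)
  pair.map { |url| strip_domain(url, domain) }
end

# Returns the bad links found in the preview that were not already bad on the
# production site. Only these would count against the PR.
def new_bad_links(baseline, preview, production_domain:, preview_domain:)
  known = baseline.map { |pair| link_key(pair, production_domain) }.to_set
  preview.reject { |pair| known.include?(link_key(pair, preview_domain)) }
end

# Made-up sample data: one pre-existing bad link and one introduced by the PR.
preview_host = "deploy-preview-1573--fluxcd.netlify.app"
baseline = [["https://fluxcd.io/docs/", "https://fluxcd.io/docs/#gone"]]
preview = [
  ["https://#{preview_host}/docs/", "https://#{preview_host}/docs/#gone"],
  ["https://#{preview_host}/blog/", "https://#{preview_host}/blog/#typo"],
]

fresh = new_bad_links(baseline, preview,
                      production_domain: "fluxcd.io",
                      preview_domain: preview_host)
fresh.each { |page, target| puts "new bad link on #{page}: #{target}" }
```

In this sketch the check would fail only when `fresh` is non-empty, which matches the stated goal that links already broken on the production site never count against a PR.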
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,28 @@ | ||
name: 'Link Checker Action' | ||
description: 'Checks the integrity of links in the PR' | ||
inputs: | ||
# TODO: make this action comment on the PR | ||
# token: | ||
# description: 'GitHub Token' | ||
# required: true | ||
# TODO: take a preview URL as input instead | ||
prNumber: | ||
description: 'Pull Request Number' | ||
required: true | ||
productionDomain: | ||
description: 'Live production site hostname' | ||
required: true | ||
previewDomain: | ||
description: 'Preview site deployment hostname' | ||
required: true | ||
githubToken: | ||
description: 'The gh cli checks preview build deploy status' | ||
required: true | ||
outputs: | ||
pr-summary: | ||
description: 'Summary CSV for problematic links' | ||
baseline-unresolved: | ||
description: 'Baseline unresolved links CSV' | ||
runs: | ||
using: 'docker' | ||
image: 'docker://ghcr.io/kingdonb/link-checker-gpt:v1-beta' |
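
For a Docker container action, GitHub exposes each declared input to the container as an environment variable named `INPUT_<NAME>`, with the input name uppercased. The entrypoint script is not shown in this diff, so how this action actually consumes its inputs is unknown. The sketch below only illustrates how a Ruby program could read them, and the helper name `action_input` and the stand-in environment hash are assumptions.

```ruby
# Hypothetical helper: look up an action input the way GitHub exposes it to a
# Docker container action, as an INPUT_<NAME> environment variable.
def action_input(name, env: ENV)
  key = "INPUT_#{name.upcase.tr(' ', '_')}"
  value = env.fetch(key, "").strip
  # All four inputs in action.yml are marked required, so treat empty as an error.
  raise ArgumentError, "missing required input: #{name}" if value.empty?
  value
end

# Example with a stand-in environment hash instead of the real ENV.
fake_env = {
  "INPUT_PRODUCTIONDOMAIN" => "fluxcd.io",
  "INPUT_PREVIEWDOMAIN" => "deploy-preview-1573--fluxcd.netlify.app",
}

production = action_input("productionDomain", env: fake_env)
preview = action_input("previewDomain", env: fake_env)
puts "baseline: #{production}, preview: #{preview}"
```

Passing the environment in as a parameter keeps the lookup easy to exercise locally without setting real environment variables.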