Branch-Pruner | GitHub Action Workflow

CAUTION: IT IS A POWERFUL TOOL AND YOU USE IT AT YOUR OWN RISK. CUTS CAN'T BE UNDONE.



As Sacha Willems posted: "Shrinking a git(hub) repository isn’t just about deleting locally present files but requires cleaning up the history as files that have been removed are still present in the repository’s history and therefore still contribute to it’s size."
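You can check this point in a throwaway repository. The following is a minimal sketch, assuming bash and a recent Git; the file name big.bin and the demo repository are hypothetical. It commits a 1 MiB file, deletes it in a second commit and shows that the file's content object is still stored, because the first commit still references it.

```shell
#!/usr/bin/env bash
# Sketch: deleting a file in a new commit does not remove it from the repository,
# because the older commit still references its content.
# Assumptions: a throwaway repository; "big.bin" is a hypothetical 1 MiB file.
set -euo pipefail

demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git config user.name "Demo"
git config user.email "demo@example.com"

head -c 1048576 /dev/urandom > big.bin
git add big.bin
git commit --quiet -m "add big file"
blob="$(git rev-parse HEAD:big.bin)"   # object id of the file's content

git rm --quiet big.bin
git commit --quiet -m "delete big file"

# The latest commit no longer contains the file ...
git ls-tree -r --name-only HEAD        # prints nothing
# ... but the object is still stored and reachable through the older commit.
git cat-file -e "$blob" && echo "blob $blob is still stored"
git rev-list --objects --all | grep "$blob"
```

Only once no remaining commit references such an object (e.g. because the history before the deletion has been truncated) can Git's garbage collection eventually drop it.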

With the GitHub action Branch-Pruner, you can easily reduce the size of one or more GitHub repositories by manually and/or automatically truncating the old commit history of one or more selected branches. This means that you can delete all commits containing older, unused file versions up to an arbitrarily selected point in your Git history, without losing the newer commits (and newer file versions) of the selected branch.

Normally, YOU SHOULD NEVER DO THIS, as there are huge drawbacks. However, in some cases it is really useful to get rid of the old stuff on a regular basis, e. g., if your repository size is growing continuously and you only ever need the latest commit history, or if Git commands like push and pull have become generally slow. Then it's time for the Branch-Pruner. It can speed you up again 😉.


| Credits

I, Sitdisch, created the Branch-Pruner because I needed a GitHub action that would periodically auto-crop my repository size, and no such action existed at the time. My approach is based on this blog post by Thomas Sutton and this blog post by Alin Ruscior. Thanks to both.

P.S. My Branch-Pruner GIF is based on the Git logo by Jason Long [License: CC BY 3.0] and the scissors icon from the googlefonts/noto-emoji repository [License: Apache-2.0].


| Drawbacks

The Branch-Pruner rewrites the entire commit history of the branch being pruned. The new history starts from the branch tree of the selected NEW-FIRST-COMMIT. All subsequent commits keep their original order and their original authors.

The drawbacks are:

  • the files are marked as created in the NEW-FIRST-COMMIT
  • all commits get new timestamps and commit hashes
  • all commits are committed by the selected user (default: github-actions[bot])
  • forks and other branches no longer share a common history with the pruned branch, so there is nothing left to compare
  • cuts can't be undone.
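To make these mechanics more concrete, here is a minimal local sketch of one common way to truncate a branch's history: an orphan branch plus git rebase --onto. It is not the action's actual script; the helper prune_branch, the temporary branch pruner-temp and the demo values are hypothetical, and it assumes bash, a recent Git and a linear history.

```shell
#!/usr/bin/env bash
# Minimal local sketch of truncating one branch's history with an orphan
# branch plus "git rebase --onto". Assumptions: a throwaway repository, a
# linear history, and the example values "main" and "HEAD~2".
set -euo pipefail

# prune_branch BRANCH NEW_FIRST_COMMIT  (hypothetical helper, not the action's script)
prune_branch() {
  local branch="$1" new_first="$2" cut
  git checkout --quiet "$branch"
  # Resolve the cut point to a fixed commit hash before rewriting anything.
  cut="$(git rev-parse --verify "${new_first}^{commit}")"
  # 1. Start a parentless branch whose index and working tree match the cut commit.
  git checkout --quiet --orphan pruner-temp "$cut"
  # 2. Commit that snapshot as the new root commit (the NEW-FIRST-COMMIT).
  git commit --quiet -m "Truncated history"
  # 3. Replay every commit after the cut point onto the new root (new hashes).
  git rebase --quiet --onto pruner-temp "$cut" "$branch"
  # 4. Delete the helper branch; BRANCH now starts at the new root.
  git branch --quiet -D pruner-temp
}

# Demo: build a five-commit history in a temporary repository.
demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"
for i in 1 2 3 4 5; do
  echo "version $i" > file.txt
  git add file.txt
  git commit --quiet -m "commit $i"
done

prune_branch main HEAD~2        # keep the snapshot of "commit 3" as the new root
git log --format=%s main        # commit 5, commit 4, Truncated history
```

A plain git rebase drops merge commits, so this sketch only fits a linear history. On a real remote, the rewritten branch would additionally have to be force-pushed (e.g. git push --force origin main) to replace the old history, and that is the step that cannot be undone.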

| Setups

Oh, you're still here? Then let's do it. First, choose a workflow file:

branch-pruner-easy.yml: truncates the old commit history of the current main branch with minimal settings.

Set it up:

1. add the branch-pruner-easy.yml workflow file to a repository
  • get the file

  • it has to be the target repository where you want to prune the main branch (this is not the case with the other workflow files)

  • the path has to be .github/workflows/branch-pruner-easy.yml

2. create a new encrypted repository secret
  • see how to do this in general

  • give the secret a name e. g. BRANCH_PRUNER_TOKEN

  • the value of the secret must be the value of the Personal Access Token (PAT) for the repository where you want to prune the main branch

    • procedure for creating a PAT (fine-grained) or a PAT (classic)

    • select only the minimum scopes and permissions required

      • PAT (fine-grained): repository permissions

        • contents => access: read and write

        • metadata => access: read-only

      • PAT (classic): e. g. repo and workflow

    • CONSIDER: PAT expiration requires you to regenerate the PAT and set it as the secret's value again

  • add the secret to the same repository where you added this workflow file

3. adapt your branch-pruner-easy.yml file

 3.1 for manual triggers
 3.2 for all other triggers
  • adapt this section:

     ##############################################################
     # DEFINE YOUR INPUTS AND TRIGGERS IN THE FOLLOWING
     ##############################################################

     # INPUTS as environment variables (env)
     env:
       NEW_FIRST_COMMIT: # e.g. commit-hash or HEAD~N etc.
       TOKEN_NAME: # target token name e.g. 'BRANCH_PRUNER_TOKEN'

     # TRIGGERS
     on:
     #  push:
     #  schedule:
     #    - cron: '00 23 28 * *'

    CONSIDER:

    • INPUTS:

      • you only have to define NEW_FIRST_COMMIT and TOKEN_NAME.

      • NEW_FIRST_COMMIT: choose it carefully. E. g., HEAD~N is really useful for automatically truncating commits on a regular basis. However, know what you are doing: HEAD~N or HEAD^N may not be the commits you're targeting. For more information about HEAD~N and HEAD^N, look e. g. here.

      • TOKEN_NAME: enter only the name of the secret; never enter the actual value of the personal access token

    • TRIGGERS:

      • schedule:
        • e. g. cron: '00 23 28 * *' executes the Branch-Pruner on the 28th day of every month at 23:00 (UTC)

        • you can check your inputs here

    • hidden defaults (changeable with the other workflow files):

      • target repository & branch: repository with this workflow file and main branch

      • user settings:

        • committing user: github-actions[bot]

        • user e-mail address: 41898282+github-actions[bot]@users.noreply.github.com
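Because HEAD~N and HEAD^N are easy to mix up, the following sketch builds a small history with one merge commit and resolves both forms. It assumes bash and Git; the commit names A, B, C, F and M and the demo repository are hypothetical. HEAD~N walks N generations back along first parents, whereas HEAD^N selects the N-th parent of the commit itself, so on a linear history HEAD^2 and higher do not exist.

```shell
#!/usr/bin/env bash
# Sketch: how HEAD~N and HEAD^N resolve in a history with one merge commit.
# Assumptions: a throwaway repository; commit subjects A, B, C, F, M are examples.
set -euo pipefail

demo="$(mktemp -d)"
cd "$demo"
git init --quiet
git symbolic-ref HEAD refs/heads/main
git config user.name "Demo"
git config user.email "demo@example.com"

mkcommit() { echo "$1" > "$2"; git add "$2"; git commit --quiet -m "$1"; }
subject() { git log -1 --format=%s "$1"; }

mkcommit A main.txt
mkcommit B main.txt
git checkout --quiet -b feature
mkcommit F feature.txt
git checkout --quiet main
mkcommit C main.txt
git merge --quiet --no-ff --no-edit -m M feature   # HEAD is now merge commit M

#   A---B---C---M    main (M is the merge commit)
#        \     /
#         F---'      feature
echo "HEAD~1 -> $(subject HEAD~1)"   # C: first parent of M
echo "HEAD~2 -> $(subject HEAD~2)"   # B: first parent of the first parent
echo "HEAD^2 -> $(subject HEAD^2)"   # F: second parent of M
# M has only two parents, so HEAD^3 (and e.g. HEAD^20) does not resolve:
if ! git rev-parse --verify --quiet "HEAD^3" >/dev/null; then
  echo "HEAD^3 -> no such commit"
fi
```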

That's it. Happy pruning.

branch-pruner-default.yml: truncates the old commit history of a selected target branch.

Set it up:

1. add the branch-pruner-default.yml workflow file to a repository
  • get the file

  • it doesn't have to be the repository you want to prune; e. g., you can simply fork the myactionway/branch-pruner-workflows repository

    • CONSIDER: with a forked repository, you need to confirm that you want to use the workflows before you can actually run them (repository menu > Actions tab > click the confirmation button)
  • the path has to be .github/workflows/branch-pruner-default.yml

2. create a new encrypted repository secret
  • see how to do this in general

  • give the secret a name e. g. BRANCH_PRUNER_TOKEN

  • the value of the secret must be the value of the Personal Access Token (PAT) for the repository where you want to prune a branch

    • procedure for creating a PAT (fine-grained) or a PAT (classic)

    • select only the minimum scopes and permissions required

      • PAT (fine-grained): repository permissions

        • contents => access: read and write

        • metadata => access: read-only

      • PAT (classic): e. g. repo and workflow

    • CONSIDER: PAT expiration requires you to regenerate the PAT and set it as the secret's value again

  • add the secret to the same repository where you added this workflow file

3. adapt your branch-pruner-default.yml file

 3.1 for manual triggers
 3.2 for all other triggers
  • adapt this section:

     ##############################################################
     # DEFINE YOUR INPUTS AND TRIGGERS IN THE FOLLOWING
     ##############################################################

     # INPUTS as environment variables (env)
     env:
       NEW_FIRST_COMMIT: # e.g. commit-hash or HEAD~N etc.
       TOKEN_NAME: # target token name e.g. 'BRANCH_PRUNER_TOKEN'
       REPOSITORY: # target repository e.g. 'dummy/mytargetrepo'
       BRANCH: # branch to be pruned e.g. 'main'
       USER_NAME: # user who should commit e.g. 'dummy'
       USER_EMAIL: # e.g. 'dummy@gmail.com'

     # TRIGGERS
     on:
     #  push:
     #  schedule:
     #    - cron: '00 23 28 * *'

    CONSIDER:

    • INPUTS:

      • you only have to define NEW_FIRST_COMMIT and TOKEN_NAME; if any other input is left blank, the corresponding default value below is used instead:

         DEFAULT_REPOSITORY: ${{ github.repository }} # repo with this file
         DEFAULT_BRANCH: 'main'
         DEFAULT_USER_NAME: 'github-actions[bot]'
         DEFAULT_USER_EMAIL: '41898282+github-actions[bot]@users.noreply.github.com'
      • NEW_FIRST_COMMIT: choose it carefully. E. g., HEAD~N is really useful for automatically truncating commits on a regular basis. However, know what you are doing: HEAD~N or HEAD^N may not be the commits you're targeting. For more information about HEAD~N and HEAD^N, look e. g. here.

      • TOKEN_NAME: enter only the name of the secret; never enter the actual value of the personal access token

    • TRIGGERS:

      • schedule:
        • e. g. cron: '00 23 28 * *' executes the Branch-Pruner on the 28th day of every month at 23:00 (UTC)

        • you can check your inputs here
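The "blank input falls back to a default" rule for the default workflow can be illustrated with shell parameter expansion. This is a hypothetical sketch: resolve_inputs and the demo values are made up, the workflow file itself may implement the rule differently (e.g. with GitHub expressions), and only the default values are taken from the list above. GITHUB_REPOSITORY is the environment variable GitHub Actions sets to owner/repo of the workflow's repository.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "blank input falls back to a default" rule.
# The real workflow file may implement this differently; only the default
# values are taken from the README.
set -euo pipefail

resolve_inputs() {
  # ${VAR:-default} substitutes the default when VAR is unset OR empty.
  REPOSITORY="${REPOSITORY:-$GITHUB_REPOSITORY}"   # repo with the workflow file
  BRANCH="${BRANCH:-main}"
  USER_NAME="${USER_NAME:-github-actions[bot]}"
  USER_EMAIL="${USER_EMAIL:-41898282+github-actions[bot]@users.noreply.github.com}"
}

# Demo values (hypothetical). GitHub Actions sets GITHUB_REPOSITORY itself;
# it is hard-coded here only so the sketch runs locally.
GITHUB_REPOSITORY="dummy/workflow-repo"
REPOSITORY=""        # left blank -> falls back to the default
BRANCH="dev"         # defined -> kept as is
unset USER_NAME USER_EMAIL

resolve_inputs
echo "$REPOSITORY $BRANCH $USER_NAME $USER_EMAIL"
```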

That's it. Happy pruning.

branch-pruner-advanced.yml: truncates the old commit history of multiple selected target branches.

Set it up:

1. add the branch-pruner-advanced.yml workflow file to a repository
  • get the file

  • it doesn't have to be a repository where you want to prune branches; e. g., you can simply fork the myactionway/branch-pruner-workflows repository

    • CONSIDER: with a forked repository, you need to confirm that you want to use the workflows before you can actually run them (repository menu > Actions tab > click the confirmation button)
  • the path has to be .github/workflows/branch-pruner-advanced.yml

2. create new encrypted repository secrets
  • see how to do this in general

  • give the secrets names e. g. BRANCH_PRUNER_TOKEN_1 and BRANCH_PRUNER_TOKEN_2

  • the values of the secrets must be the values of the Personal Access Tokens (PAT) for the repositories where you want to prune branches

    • procedure for creating a PAT (fine-grained) or a PAT (classic)

    • select only the minimum scopes and permissions required

      • PAT (fine-grained): repository permissions

        • contents => access: read and write

        • metadata => access: read-only

      • PAT (classic): e. g. repo and workflow

    • CONSIDER: PAT expiration requires you to regenerate the PAT and set it as the secret's value again

  • add the secrets to the same repository where you added this workflow file

3. adapt your branch-pruner-advanced.yml file

 3.1 define your defaults
  • adapt this section:

     ##############################################################
     # DEFINE YOUR DEFAULTS (INPUTS & TRIGGERS) IN THE FOLLOWING
     ##############################################################

     # INPUTS as environment variables (env)
     env:
       TOKEN_NAME: # target token name e.g. 'BRANCH_PRUNER_TOKEN_1'
       REPOSITORY: # target repository e.g. 'dummy/mytargetrepo_1'
       USER_NAME: # user who should commit e.g. 'dummy'
       USER_EMAIL: # e.g. 'dummy@gmail.com'

     # TRIGGERS
     on:
     #  push:
     #  schedule:
     #    - cron: '00 23 28 * *'
       workflow_dispatch:

    CONSIDER:

 3.2 define your settings for the different target branches
  • adapt this section:

     ##############################################################
     # FIRST TARGET BRANCH | DEFINE YOUR ENV IN THE FOLLOWING
     ##############################################################
     - NAME: 'Pruning Branch 1'
       NEW_FIRST_COMMIT: 'HEAD~40'
       BRANCH: 'main'
     #  TOKEN_NAME:
     #  REPOSITORY:
     #  USER_NAME:
     #  USER_EMAIL:
     ##############################################################
     # SECOND TARGET BRANCH | DEFINE YOUR ENV IN THE FOLLOWING
     ##############################################################
     - NAME: 'Pruning Branch 2'
       NEW_FIRST_COMMIT: 'HEAD~20'
       BRANCH: 'dev'
     #  TOKEN_NAME: # e.g. 'BRANCH_PRUNER_TOKEN_2'
     #  REPOSITORY: # e.g. 'dummy/mytargetrepo_2'
     #  USER_NAME:
     #  USER_EMAIL:
     ##############################################################
     # THIRD TARGET BRANCH | FEEL FREE TO ADD MORE TARGET BRANCHES
     # ...

    CONSIDER:

    • you just have to define NAME, NEW_FIRST_COMMIT and BRANCH for each target branch; if you do not define any of the other inputs, your predefined defaults will be used instead

    • at most 256 target branches per workflow run are possible [GitHub restriction]

That's it. Happy pruning.


Warning: If you use your own workflow file, it is highly recommended to set a time limit for the job execution (GitHub's default: 6 hours). The proposed workflow files use timeout-minutes: 8.


| Known issues & possible solutions

The error "fatal: refusing to merge unrelated histories" occurs when you pull the pruned branch back to your local machine:
  • possible solution [source]:

    1. git fetch --all

    2. git reset --hard origin/<PRUNED_BRANCH> (replace <PRUNED_BRANCH>; CAUTION: this discards uncommitted changes and local-only commits on that branch)
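The failure and the fix can be reproduced locally. In this minimal sketch (assuming bash and Git), a bare repository remote.git stands in for GitHub, and upstream, local and pruned are hypothetical names; the pruning itself is simulated by force-pushing a single new root commit. git pull fetches and then integrates the remote branch, typically by merging, so the sketch calls git merge directly to show the refusal.

```shell
#!/usr/bin/env bash
# Sketch: an existing clone cannot merge a force-pushed, pruned branch, and
# "git fetch --all" + "git reset --hard" realigns it. Assumptions: throwaway
# local repositories; a bare repo stands in for GitHub; "main" is the pruned branch.
# CAUTION: reset --hard discards uncommitted changes and local-only commits.
set -euo pipefail

work="$(mktemp -d)"
cd "$work"
git init --quiet --bare remote.git
git -C remote.git symbolic-ref HEAD refs/heads/main

# "upstream" plays the role of the repository that gets pruned.
git init --quiet upstream
git -C upstream symbolic-ref HEAD refs/heads/main
git -C upstream config user.name "Demo"
git -C upstream config user.email "demo@example.com"
for i in 1 2 3; do
  echo "version $i" > upstream/file.txt
  git -C upstream add file.txt
  git -C upstream commit --quiet -m "commit $i"
done
git -C upstream push --quiet "$work/remote.git" main

# "local" is an existing clone made before the pruning.
git clone --quiet "$work/remote.git" "$work/local"

# Simulate the pruning: replace main with a single new root commit and force-push.
git -C upstream checkout --quiet --orphan pruned
git -C upstream commit --quiet -m "Truncated history"
git -C upstream push --quiet --force "$work/remote.git" pruned:main

# In the old clone, the fetched branch shares no commit with the local one.
git -C local fetch --quiet --all
if ! git -C local merge --quiet origin/main 2>/dev/null; then
  echo "merge refused: unrelated histories"
fi
# The possible solution listed above: move the local branch to the pruned one.
git -C local reset --quiet --hard origin/main
git -C local log --format=%s          # Truncated history
```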

"Error: fatal: could not read Username for 'https://github.com': terminal prompts disabled":

"remote: Permission to ... denied to ... fatal: unable to access 'https://github.com/...': The requested URL returned error: 403":

  • the personal access token you use does not have the minimum scopes/permissions required to prune a branch in your target repository

You get a failed job because it exceeded the maximum execution time:

  • increase timeout-minutes in your workflow file (default in the proposed workflow files: 8 minutes)

  • if that doesn't help, it could be a general issue with GitHub Actions

The workflow logs do not provide enough detail to diagnose why a workflow, job, or step is not working as expected:

You are experiencing strange behavior from GitHub actions:

Your scheduled workflow trigger doesn't fire:
  • in my experience, a workflow file with this trigger must be placed in the default branch (GitHub runs scheduled workflows on the latest commit of the default branch)

  • in this chat Brightran said: "... The workaround is to push something to trigger them. ..." and Hless said: "... It appears to me that it takes while before schedules actions run at all in a new repo". In my experience, they are right.


| Application example

For my @MyThemeWay Website-Boilerplates, I use the Lighthouse-Badger 🦡 🗼 🎖️ to automatically update my Lighthouse badges and reports once a week. As a result, my repository size keeps growing.

To counter this, I run the Branch-Pruner once a month. That way, I keep the repository size under control and can still see the recent history of my badges and reports without the really old stuff.


| Appendix

Note on protected brand names and logos

  • The use of protected brand names, trade names, utility models and brand logos on this website does not constitute an infringement of copyright; rather, it serves as an illustrative note. Even if this is not marked as such at the respective points, the corresponding legal provisions always apply.

  • The brand names and logos used are the property of their respective owners and are subject to their copyright provisions.

  • This offer is in no way related to the legal entities of the protected brand names and logos used.

Note on liability for links

  • This README contains links to external third-party websites. The README operator has no influence on the content of these sites. Therefore, he cannot assume any liability. Instead, the respective provider is always responsible for the content.

  • The linked pages were checked for possible legal violations at the time of linking and illegal content wasn't discernible. A permanent control of the linked pages is unreasonable without concrete evidence of an infringement. However, if the README operator becomes aware of such a violation, he will act immediately.

Readme uses: