
Commit 6955029

Update removing-sensitive-data-from-a-repository.md (#36095)
1 parent 104eece commit 6955029

1 file changed, +11 -11 lines changed


content/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository.md

Lines changed: 11 additions & 11 deletions
@@ -20,7 +20,7 @@ shortTitle: Remove sensitive data

 ## About removing sensitive data from a repository

-When altering your repository's history using tools like `git filter-repo`, it's crucial to understand the implications. Rewriting history requires careful coordination with collaborators to successfully execute, and has a number of side effects that must be managed.
+When altering your repository's history using tools like `git-filter-repo`, it's crucial to understand the implications. Rewriting history requires careful coordination with collaborators to successfully execute, and has a number of side effects that must be managed.

 It is important to note that if the sensitive data you need to remove is a secret (e.g. password/token/credential), as is often the case, then as a first step you need to revoke and/or rotate that secret. Once the secret is revoked or rotated, it can no longer be used for access, and that may be sufficient to solve your problem. Going through the extra steps to rewrite the history and remove the secret may not be warranted.

@@ -34,7 +34,7 @@ There are numerous side effects to rewriting history; these include:
 * **Branch protection challenges**: If you have any branch protections that prevent force pushes, those protections will have to be turned off (at least temporarily) for the sensitive data to be removed.
 * **Broken diff view for closed pull requests**: Removing the sensitive data will require removing the internal references used for displaying the diff view in pull requests, so you will no longer be able to see these diffs. This is true not only for the PR that introduced the sensitive data, but any PR that builds on a version of history after the sensitive data PR was merged (even if those later PRs didn't add or modify any file with sensitive data).
 * **Poor interaction with open pull requests**: Changed commit SHAs will result in a different PR diff, and comments on the old PR diff may become invalidated and lost, which may cause confusion for authors and reviewers. We recommend merging or closing all open pull requests before removing files from your repository.
-* **Lost signatures on commits and tags**: Signatures for commits or tags depend on commit hashes; since commit hashes are modified by history rewrites, signatures would no longer be valid and many history rewriting tools (including `git filter-repo`) will simply remove the signatures. In fact, `git filter-repo` will remove commit signatures and tag signatures for commits that pre-date the sensitive data removal as well. (Technically one can workaround this with the `--refs` option to `git filter-repo` if needed, but then you will need to be careful to ensure you specify all refs that have sensitive data in their history and that include the commits that introduced the sensitive data in your range).
+* **Lost signatures on commits and tags**: Signatures for commits or tags depend on commit hashes; since commit hashes are modified by history rewrites, signatures would no longer be valid and many history rewriting tools (including `git-filter-repo`) will simply remove the signatures. In fact, `git-filter-repo` will remove commit signatures and tag signatures for commits that pre-date the sensitive data removal as well. (Technically one can workaround this with the `--refs` option to `git-filter-repo` if needed, but then you will need to be careful to ensure you specify all refs that have sensitive data in their history and that include the commits that introduced the sensitive data in your range).
 * **Leading others directly to the sensitive data**: Git was designed with cryptographic checks built into commit identifiers so that nefarious individuals could not break into a server and modify history without being noticed. That's helpful from a security perspective, but from a sensitive data perspective it means that expunging sensitive data is a very involved process of coordination; it further means that when you do modify history, clueful users with an existing clone will notice the history divergence and can use it to quickly and easily find the sensitive data still in their clone that you removed from the central repository.

 ## About sensitive data exposure
@@ -52,7 +52,7 @@ If you only rewrite your history and force push it, the commits with sensitive d
 * Directly via their SHA-1 hashes in cached views on {% data variables.product.github %}
 * Through any pull requests that reference them

-You cannot remove sensitive data from other users' clones of your repository; you will have to send them the instructions from [Make sure other copies are cleaned up: clones of colleagues](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_make_sure_other_copies_are_cleaned_up_clones_of_colleagues) in the `git filter-repo` manual to have them do so themselves. However, you can permanently remove cached views and references to the sensitive data in pull requests on {% data variables.product.github %} by contacting {% data variables.contact.contact_support %}.
+You cannot remove sensitive data from other users' clones of your repository; you will have to send them the instructions from [Make sure other copies are cleaned up: clones of colleagues](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_make_sure_other_copies_are_cleaned_up_clones_of_colleagues) in the `git-filter-repo` manual to have them do so themselves. However, you can permanently remove cached views and references to the sensitive data in pull requests on {% data variables.product.github %} by contacting {% data variables.contact.contact_support %}.

 {% ifversion fpt or ghec %}

@@ -66,7 +66,7 @@ Consider these limitations and challenges in your decision to rewrite your repos

 ## Purging a file from your local repository's history using git-filter-repo

-1. Install the latest release of [the `git filter-repo` tool](https://github.com/newren/git-filter-repo). You need a version with the `--sensitive-data-removal` flag, meaning at least version 2.47. You can install `git filter-repo` manually or by using a package manager. For example, to install the tool with HomeBrew, use the `brew install` command.
+1. Install the latest release of [the `git-filter-repo` tool](https://github.com/newren/git-filter-repo). You need a version with the `--sensitive-data-removal` flag, meaning at least version 2.47. You can install `git-filter-repo` manually or by using a package manager. For example, to install the tool with HomeBrew, use the `brew install` command.

 ```shell
 brew install git-filter-repo
@@ -86,20 +86,20 @@ Consider these limitations and challenges in your decision to rewrite your repos
 cd YOUR-REPOSITORY
 ```

-1. Run a `git filter-repo` command to clean up the sensitive data.
+1. Run a `git-filter-repo` command to clean up the sensitive data.

 If you want to delete a specific file from all branches/tags/refs, run the following command replacing `PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA` with the **git path to the file you want to remove, not just its filename** (e.g. `src/module/phone-numbers.txt`):

 ```shell
-git filter-repo --sensitive-data-removal --invert-paths --path PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA
+git-filter-repo --sensitive-data-removal --invert-paths --path PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA
 ```

 > [!IMPORTANT] If the file with sensitive data used to exist at any other paths (because it was moved or renamed), you must either add an extra `--path` argument for that file, or run this command a second time naming the alternative path.

 If you want to replace all text listed in `../passwords.txt` from any non-binary files found anywhere in your repository's history, run the following command:

 ```shell
-git filter-repo --sensitive-data-removal --replace-text ../passwords.txt
+git-filter-repo --sensitive-data-removal --replace-text ../passwords.txt
 ```

 1. Double-check that you've removed everything you wanted to from your repository's history.
@@ -133,13 +133,13 @@ Consider these limitations and challenges in your decision to rewrite your repos

 ## Fully removing the data from {% data variables.product.github %}

-After using `git filter-repo` to remove the sensitive data and pushing your changes to {% data variables.product.github %}, you must take a few more steps to fully remove the data from {% data variables.product.github %}.
+After using `git-filter-repo` to remove the sensitive data and pushing your changes to {% data variables.product.github %}, you must take a few more steps to fully remove the data from {% data variables.product.github %}.

 1. Contact {% data variables.contact.contact_support %}, and provide the following information:

 * The owner and repository name in question (e.g. YOUR-USERNAME/YOUR-REPOSITORY).
 * The number of affected pull requests, found in the previous step. This is used by Support to verify you understand how much will be affected.
-* The First Changed Commit(s) reported by `git filter-repo` (Look for `NOTE: First Changed Commit(s)` in its output.)
+* The First Changed Commit(s) reported by `git-filter-repo` (Look for `NOTE: First Changed Commit(s)` in its output.)
 * If `NOTE: There were LFS Objects Orphaned by this rewrite` appears in the git-filter-repo output (right after the First Changed Commit), then mention you had LFS Objects Orphaned and upload the named file to the ticket as well.

 If you have successfully cleaned up all references other than PRs, and no forks have references to the sensitive data, Support will then:
@@ -152,7 +152,7 @@ After using `git filter-repo` to remove the sensitive data and pushing your chan
 {% ifversion ghes %}For more information about how site administrators can remove unreachable Git objects, see [AUTOTITLE](/admin/configuration/configuring-your-enterprise/command-line-utilities#ghe-repo-gc). For more information about how site administrators can identify reachable commits, see [Identifying reachable commits](#identifying-reachable-commits).{% endif %}{% ifversion fpt or ghec %}
 >[!IMPORTANT] {% data variables.contact.github_support %} won't remove non-sensitive data, and will only assist in the removal of sensitive data in cases where we determine that the risk can't be mitigated by rotating affected credentials.{% endif %}

-1. Collaborators must [rebase](https://git-scm.com/book/en/v2/Git-Branching-Rebasing), _not_ merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging. They may need to take additional steps as well; see [Make sure other copies are cleaned up: clones of colleagues](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_make_sure_other_copies_are_cleaned_up_clones_of_colleagues) in the `git filter-repo` manual.
+1. Collaborators must [rebase](https://git-scm.com/book/en/v2/Git-Branching-Rebasing), _not_ merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging. They may need to take additional steps as well; see [Make sure other copies are cleaned up: clones of colleagues](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_make_sure_other_copies_are_cleaned_up_clones_of_colleagues) in the `git-filter-repo` manual.

 {% ifversion ghes %}

@@ -209,6 +209,6 @@ There are a few things you can do to avoid committing or pushing things that sho

 ## Further reading

-* [`git filter-repo` man page](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html), especially the "Sensitive Data Removal" subsection of the "DISCUSSION" section.
+* [`git-filter-repo` man page](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html), especially the "Sensitive Data Removal" subsection of the "DISCUSSION" section.
 * [Pro Git: Git Tools - Rewriting History](https://git-scm.com/book/en/v2/Git-Tools-Rewriting-History)
 * [AUTOTITLE](/code-security/secret-scanning/introduction/about-secret-scanning)
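
Every change in this commit replaces the spaced spelling `git filter-repo` with `git-filter-repo`. Both spellings run the same tool, since Git dispatches `git <name>` to a `git-<name>` executable found on the `PATH`, so the edit affects naming consistency in the article rather than command behavior. A minimal sketch for checking that no spaced spelling remains in the edited file, assuming a POSIX shell and `grep`; `count_spaced_spelling` is a hypothetical helper, not part of Git or git-filter-repo:

```shell
#!/bin/sh
# Hypothetical helper (not part of Git or git-filter-repo): print the number
# of lines in a file that still contain the spaced spelling "git filter-repo".
# -c counts matching lines (not individual matches); -F matches a fixed string.
# grep exits with status 1 when nothing matches but still prints "0",
# so "|| true" keeps a clean file from being reported as a failure.
count_spaced_spelling() {
  grep -cF -- 'git filter-repo' "$1" || true
}

# The file touched by this commit, relative to the root of a docs checkout.
# It is only checked if it exists at that path.
file="content/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository.md"
if [ -f "$file" ]; then
  echo "lines still using \"git filter-repo\": $(count_spaced_spelling "$file")"
fi
```

A count of 0 would indicate that the rename covers the whole file, including lines outside the hunks shown above; this sketch does not establish what the count actually is for this commit.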
