content/authentication/keeping-your-account-and-data-secure/removing-sensitive-data-from-a-repository.md
11 additions & 11 deletions
@@ -20,7 +20,7 @@ shortTitle: Remove sensitive data
## About removing sensitive data from a repository
-When altering your repository's history using tools like `git filter-repo`, it's crucial to understand the implications. Rewriting history requires careful coordination with collaborators to successfully execute, and has a number of side effects that must be managed.
+When altering your repository's history using tools like `git-filter-repo`, it's crucial to understand the implications. Rewriting history requires careful coordination with collaborators to successfully execute, and has a number of side effects that must be managed.
It is important to note that if the sensitive data you need to remove is a secret (e.g. password/token/credential), as is often the case, then as a first step you need to revoke and/or rotate that secret. Once the secret is revoked or rotated, it can no longer be used for access, and that may be sufficient to solve your problem. Going through the extra steps to rewrite the history and remove the secret may not be warranted.
@@ -34,7 +34,7 @@ There are numerous side effects to rewriting history; these include:
* **Branch protection challenges**: If you have any branch protections that prevent force pushes, those protections will have to be turned off (at least temporarily) for the sensitive data to be removed.
* **Broken diff view for closed pull requests**: Removing the sensitive data will require removing the internal references used for displaying the diff view in pull requests, so you will no longer be able to see these diffs. This is true not only for the PR that introduced the sensitive data, but any PR that builds on a version of history after the sensitive data PR was merged (even if those later PRs didn't add or modify any file with sensitive data).
* **Poor interaction with open pull requests**: Changed commit SHAs will result in a different PR diff, and comments on the old PR diff may become invalidated and lost, which may cause confusion for authors and reviewers. We recommend merging or closing all open pull requests before removing files from your repository.
-* **Lost signatures on commits and tags**: Signatures for commits or tags depend on commit hashes; since commit hashes are modified by history rewrites, signatures would no longer be valid and many history rewriting tools (including `git filter-repo`) will simply remove the signatures. In fact, `git filter-repo` will remove commit signatures and tag signatures for commits that pre-date the sensitive data removal as well. (Technically one can workaround this with the `--refs` option to `git filter-repo` if needed, but then you will need to be careful to ensure you specify all refs that have sensitive data in their history and that include the commits that introduced the sensitive data in your range).
+* **Lost signatures on commits and tags**: Signatures for commits or tags depend on commit hashes; since commit hashes are modified by history rewrites, signatures would no longer be valid and many history rewriting tools (including `git-filter-repo`) will simply remove the signatures. In fact, `git-filter-repo` will remove commit signatures and tag signatures for commits that pre-date the sensitive data removal as well. (Technically one can workaround this with the `--refs` option to `git-filter-repo` if needed, but then you will need to be careful to ensure you specify all refs that have sensitive data in their history and that include the commits that introduced the sensitive data in your range).
* **Leading others directly to the sensitive data**: Git was designed with cryptographic checks built into commit identifiers so that nefarious individuals could not break into a server and modify history without being noticed. That's helpful from a security perspective, but from a sensitive data perspective it means that expunging sensitive data is a very involved process of coordination; it further means that when you do modify history, clueful users with an existing clone will notice the history divergence and can use it to quickly and easily find the sensitive data still in their clone that you removed from the central repository.
## About sensitive data exposure
@@ -52,7 +52,7 @@ If you only rewrite your history and force push it, the commits with sensitive d
* Directly via their SHA-1 hashes in cached views on {% data variables.product.github %}
* Through any pull requests that reference them
-You cannot remove sensitive data from other users' clones of your repository; you will have to send them the instructions from [Make sure other copies are cleaned up: clones of colleagues](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_make_sure_other_copies_are_cleaned_up_clones_of_colleagues) in the `git filter-repo` manual to have them do so themselves. However, you can permanently remove cached views and references to the sensitive data in pull requests on {% data variables.product.github %} by contacting {% data variables.contact.contact_support %}.
+You cannot remove sensitive data from other users' clones of your repository; you will have to send them the instructions from [Make sure other copies are cleaned up: clones of colleagues](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_make_sure_other_copies_are_cleaned_up_clones_of_colleagues) in the `git-filter-repo` manual to have them do so themselves. However, you can permanently remove cached views and references to the sensitive data in pull requests on {% data variables.product.github %} by contacting {% data variables.contact.contact_support %}.
{% ifversion fpt or ghec %}
@@ -66,7 +66,7 @@ Consider these limitations and challenges in your decision to rewrite your repos
## Purging a file from your local repository's history using git-filter-repo
-1. Install the latest release of [the `git filter-repo` tool](https://github.com/newren/git-filter-repo). You need a version with the `--sensitive-data-removal` flag, meaning at least version 2.47. You can install `git filter-repo` manually or by using a package manager. For example, to install the tool with HomeBrew, use the `brew install` command.
+1. Install the latest release of [the `git-filter-repo` tool](https://github.com/newren/git-filter-repo). You need a version with the `--sensitive-data-removal` flag, meaning at least version 2.47. You can install `git-filter-repo` manually or by using a package manager. For example, to install the tool with HomeBrew, use the `brew install` command.
```shell
brew install git-filter-repo
@@ -86,20 +86,20 @@ Consider these limitations and challenges in your decision to rewrite your repos
cd YOUR-REPOSITORY
```
-1. Run a `git filter-repo` command to clean up the sensitive data.
+1. Run a `git-filter-repo` command to clean up the sensitive data.
If you want to delete a specific file from all branches/tags/refs, run the following command replacing `PATH-TO-YOUR-FILE-WITH-SENSITIVE-DATA` with the **git path to the file you want to remove, not just its filename** (e.g. `src/module/phone-numbers.txt`):
> [!IMPORTANT] If the file with sensitive data used to exist at any other paths (because it was moved or renamed), you must either add an extra `--path` argument for that file, or run this command a second time naming the alternative path.
If you want to replace all text listed in `../passwords.txt` from any non-binary files found anywhere in your repository's history, run the following command:
1. Double-check that you've removed everything you wanted to from your repository's history.
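The commands for the steps above are not shown in this diff. As a minimal sketch, assuming `git-filter-repo` is installed so that `git filter-repo` works as a Git subcommand, the file-removal invocation can be assembled from one `--path` argument per historical location of the file (combined with `--invert-paths`), and the result can be spot-checked with `git log`. The helper names `build_removal_command` and `path_absent_from_history` and the example paths are hypothetical. The first helper only prints the command so it can be reviewed before anyone runs it.

```shell
# Hypothetical helper: print (do not run) a git-filter-repo invocation that
# drops every listed path from all refs. Assumes paths contain no whitespace.
build_removal_command() {
  cmd="git filter-repo --sensitive-data-removal --invert-paths"
  for file_path in "$@"; do
    cmd="$cmd --path $file_path"
  done
  printf '%s\n' "$cmd"
}

# Hypothetical helper: succeed only if no commit reachable from any ref
# touches the given path. This checks paths only, not replaced text.
path_absent_from_history() {
  [ -z "$(git log --all --format=%H -- "$1")" ]
}

# Example: the file lived at two paths over its lifetime (illustrative paths).
build_removal_command src/module/phone-numbers.txt old/phone-numbers.txt
```

After a real rewrite, running `path_absent_from_history src/module/phone-numbers.txt` inside the repository is one quick way to confirm the path no longer appears in any reachable commit. It does not replace a manual review of the history.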
@@ -133,13 +133,13 @@ Consider these limitations and challenges in your decision to rewrite your repos
## Fully removing the data from {% data variables.product.github %}
-After using `git filter-repo` to remove the sensitive data and pushing your changes to {% data variables.product.github %}, you must take a few more steps to fully remove the data from {% data variables.product.github %}.
+After using `git-filter-repo` to remove the sensitive data and pushing your changes to {% data variables.product.github %}, you must take a few more steps to fully remove the data from {% data variables.product.github %}.
1. Contact {% data variables.contact.contact_support %}, and provide the following information:
* The owner and repository name in question (e.g. YOUR-USERNAME/YOUR-REPOSITORY).
* The number of affected pull requests, found in the previous step. This is used by Support to verify you understand how much will be affected.
-* The First Changed Commit(s) reported by `git filter-repo` (Look for `NOTE: First Changed Commit(s)` in its output.)
+* The First Changed Commit(s) reported by `git-filter-repo` (Look for `NOTE: First Changed Commit(s)` in its output.)
* If `NOTE: There were LFS Objects Orphaned by this rewrite` appears in the git-filter-repo output (right after the First Changed Commit), then mention you had LFS Objects Orphaned and upload the named file to the ticket as well.
If you have successfully cleaned up all references other than PRs, and no forks have references to the sensitive data, Support will then:
@@ -152,7 +152,7 @@ After using `git filter-repo` to remove the sensitive data and pushing your chan
{% ifversion ghes %}For more information about how site administrators can remove unreachable Git objects, see [AUTOTITLE](/admin/configuration/configuring-your-enterprise/command-line-utilities#ghe-repo-gc). For more information about how site administrators can identify reachable commits, see [Identifying reachable commits](#identifying-reachable-commits).{% endif %}{% ifversion fpt or ghec %}
>[!IMPORTANT] {% data variables.contact.github_support %} won't remove non-sensitive data, and will only assist in the removal of sensitive data in cases where we determine that the risk can't be mitigated by rotating affected credentials.{% endif %}
-1. Collaborators must [rebase](https://git-scm.com/book/en/v2/Git-Branching-Rebasing), _not_ merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging. They may need to take additional steps as well; see [Make sure other copies are cleaned up: clones of colleagues](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_make_sure_other_copies_are_cleaned_up_clones_of_colleagues) in the `git filter-repo` manual.
+1. Collaborators must [rebase](https://git-scm.com/book/en/v2/Git-Branching-Rebasing), _not_ merge, any branches they created off of your old (tainted) repository history. One merge commit could reintroduce some or all of the tainted history that you just went to the trouble of purging. They may need to take additional steps as well; see [Make sure other copies are cleaned up: clones of colleagues](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html#_make_sure_other_copies_are_cleaned_up_clones_of_colleagues) in the `git-filter-repo` manual.
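The rebase step can be sketched as follows, assuming the collaborator knows the last old (tainted) commit their branch was based on. The helper name `rebase_onto_clean_history` and the ref names in the usage comment are hypothetical. `git rebase --onto` replays only the commits unique to the branch onto the rewritten history, so the old base is left behind, provided those replayed commits do not themselves contain the sensitive data.

```shell
# Hypothetical helper: replay only the branch's own commits onto the
# rewritten history instead of merging the old and new histories.
#   $1 = rewritten upstream ref (for example origin/main after fetching)
#   $2 = old, tainted commit the branch was originally based on
#   $3 = branch to move
rebase_onto_clean_history() {
  git rebase --onto "$1" "$2" "$3"
}

# Usage (placeholder names, not run here):
#   git fetch origin
#   rebase_onto_clean_history origin/main OLD-BASE-COMMIT my-feature
```

This is only one way to do it; the `git-filter-repo` manual section linked above covers further cleanup that a clone may need.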
{% ifversion ghes %}
@@ -209,6 +209,6 @@ There are a few things you can do to avoid committing or pushing things that sho
## Further reading
-* [`git filter-repo` man page](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html), especially the "Sensitive Data Removal" subsection of the "DISCUSSION" section.
+* [`git-filter-repo` man page](https://htmlpreview.github.io/?https://github.com/newren/git-filter-repo/blob/docs/html/git-filter-repo.html), especially the "Sensitive Data Removal" subsection of the "DISCUSSION" section.