-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[DO NOT MERGE] Rebase to v2.47.1 #710
Conversation
mjcheetham
commented
Dec 3, 2024
In ac8acb4 (sparse-index: complete partial expansion, 2022-05-23), 'expand_index()' was updated to expand the index to a given pathspec. However, the 'path_matches_pattern_list()' method used to facilitate this has the side effect of initializing or updating the index hash variables ('name_hash', 'dir_hash', and 'name_hash_initialized'). This operation is performed on 'istate', though, not 'full'; as a result, the initialized hashes are later overwritten when copied from 'full'. To ensure the correct hashes are in 'istate' after the index expansion, change the arg used in 'path_matches_pattern_list()' from 'istate' to 'full'. Note that this does not fully solve the problem. If 'istate' does not have an initialized 'name_hash' when its contents are copied to 'full', initialized hashes will be copied back into 'istate' but 'name_hash_initialized' will be 0. Therefore, we also need to copy 'full->name_hash_initialized' back to 'istate' after the index expansion is complete. Signed-off-by: Victoria Dye <vdye@github.com>
Add test case to demonstrate that `git index-pack -o <idx-path> pack-path` fails if <idx-path> does not end in ".idx" when `--rev-index` is enabled. In e37d0b8 (builtin/index-pack.c: write reverse indexes, 2021-01-25) we learned to create `.rev` reverse indexes in addition to `.idx` index files. The `.rev` file pathname is constructed by replacing the suffix on the `.idx` file. The code assumes a hard-coded "idx" suffix. In a8dd7e0 (config: enable `pack.writeReverseIndex` by default, 2023-04-12) reverse indexes were enabled by default. If the `-o <idx-path>` argument is used, the index file may have a different suffix. This causes an error when it tries to create the reverse index pathname. The test here demonstrates the failure. (The test forces `--rev-index` to avoid interaction with `GIT_TEST_NO_WRITE_REV_INDEX` during CI runs.) Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
These seem to be custom tests to microsoft/git as they break without these changes, but these changes are not needed upstream. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Add a test verifying that sparse-checkout (with and without sparse index enabled) treat untracked files & directories correctly when changing sparse patterns. Specifically, it ensures that 'git sparse-checkout set' * deletes empty directories outside the sparse cone * does _not_ delete untracked files outside the sparse cone Signed-off-by: Victoria Dye <vdye@github.com>
Calculate the number of symrefs, loose vs packed, and the maximal/accumulated length of local vs remote branches. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com> Signed-off-by: Johannes Schindelin <Johannes.Schindelin@gmx.de>
With this commit, we gather statistics about the sizes of commits, trees, and blobs in the repository, and then present them in the form of "hexbins", i.e. log(16) histograms that show how many objects fall into the 0..15 bytes range, the 16..255 range, the 256..4095 range, etc. For commits, we also show the total count grouped by the number of parents, and for trees we additionally show the total count grouped by number of entries in the form of "qbins", i.e. log(4) histograms. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com> Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Teach index-pack to silently omit the reverse index if the index file does not have the standard ".idx" suffix. In e37d0b8 (builtin/index-pack.c: write reverse indexes, 2021-01-25) we learned to create `.rev` reverse indexes in addition to `.idx` index files. The `.rev` file pathname is constructed by replacing the suffix on the `.idx` file. The code assumes a hard-coded "idx" suffix. In a8dd7e0 (config: enable `pack.writeReverseIndex` by default, 2023-04-12) reverse indexes were enabled by default. If the `-o <idx-path>` argument is used, the index file may have a different suffix. This causes an error when it tries to create the reverse index pathname. Since we do not know why the user requested a non-standard suffix for the index, we cannot guess what the proper corresponding suffix should be for the reverse index. So we disable it. The t5300 test has been updated to verify that we no longer error out and that the .rev file is not created. TODO We could warn the user that we skipped it (perhaps only if they TODO explicitly requested `--rev-index` on the command line). TODO TODO Ideally, we should add an `--rev-index-path=<path>` argument TODO or change `--rev-index` to take a pathname. TODO TODO I'll leave these questions for a future series. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Prefetch the value of GIT_TRACE2_DST_DEBUG during startup and before we try to open any Trace2 destination pathnames. Normally, Trace2 always silently fails if a destination target cannot be opened so that it doesn't affect the execution of a Git command. The command should run normally, but just not generate any trace data. This can make it difficult to debug a telemetry setup, since the user doesn't know why telemetry isn't being generated. If the environment variable GIT_TRACE2_DST_DEBUG is true, the Trace2 startup will print a warning message with the `errno` to make debugging easier. However, on Windows, looking up the env variable resets `errno` so the warning message always ends with `...tracing: No error` which is not very helpful. Prefetch the env variable at startup. This avoids the need to update each call-site to capture `errno` in the usual `saved-errno` variable. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Create `struct large_item` and `struct large_item_vec` to capture the n largest commits, trees, and blobs under various scaling dimensions, such as size in bytes, number of commit parents, or number of entries in a tree. Each of these have a command line option to set them independently. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Include the pathname of each blob or tree in the large_item_vec to help identify the file or directory associated with the OID and size information. This pathname is computed during the path walk, so it reflects the first observed pathname seen for that OID during the traversal over all of the refs. Since the file or directory could have moved (without being modified), there may be multiple "correct" pathnames for a particular OID. Since we do not control the ref traversal order, we should consider it to be a "suggested pathname" for the OID. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Computing `git name-rev` on each commit, tree, and blob in each of the various large_item_vec can be very expensive if there are too many refs, especially if the user doesn't need the result. Lets make it optional. The `--no-name-rev` option can save 50 calls to `git name-rev` since we have 5 large_item_vec's and each defaults to 10 items. Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
Signed-off-by: Jeff Hostetler <jeffhostetler@github.com>
This backports the `ds/advice-sparse-index-expansion` patches into `microsoft/git` which _just_ missed the v2.46.0 window. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This topic branch brings in a new, experimental built-in command to assess the dimensions of a local repository. It is experimental and subject to change! It might grow new options, change its output, or even be moved into `git diagnose --analyze` or something like that. The hope is that this command, which was inspired by `git sizer` (https://github.com/github/git-sizer), will be helpful not only in diagnosing issues with large repositories, but also in modeling what shapes and sizes of repositories can be handled by Git (and as a corollary: where Git needs to improve to be able to accommodate the natural growth of repositories). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Cherry-pick rev-index fixes from v2.41.0.vfs.0.5 into v2.42.0.*
Prefetch the value of GIT_TRACE2_DST_DEBUG during startup and before we try to open any Trace2 destination pathnames. Normally, Trace2 always silently fails if a destination target cannot be opened so that it doesn't affect the execution of a Git command. The command should run normally, but just not generate any trace data. This can make it difficult to debug a telemetry setup, since the user doesn't know why telemetry isn't being generated. If the environment variable GIT_TRACE2_DST_DEBUG is true, the Trace2 startup will print a warning message with the `errno` to make debugging easier. However, on Windows, looking up the env variable resets `errno` so the warning message always ends with `...tracing: No error` which is not very helpful. Prefetch the env variable at startup. This avoids the need to update each call-site to capture `errno` in the usual `saved-errno` variable.
While using the reset --stdin feature on windows path added may have a \r at the end of the path that wasn't getting removed so didn't match the path in the index and wasn't reset. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
…sitories (#667) This command is inspired by [`git sizer`](https://github.com/github/git-sizer), having the advantage of being much closer to the internals of Git. The intention is to provide a built-in command that can be used to analyze large repositories for performance and scaling problems, for growth over time, and to correlate with other measurements (in particular with Trace2 data collected e.g. via https://github.com/git-ecosystem/trace2receiver/).
It has been a long-standing practice in Git for Windows to append `.windows.<n>`, and in microsoft/git to append `.vfs.0.0`. Let's keep doing that. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
Since we really want to be based on a `.vfs.*` tag, let's make sure that there was a new-enough one, i.e. one that agrees with the first three version numbers of the recorded default version. This prevents e.g. v2.22.0.vfs.0.<some-huge-number>.<commit> from being used when the current release train was not yet tagged. It is important to get the first three numbers of the version right because e.g. Scalar makes decisions depending on those (such as assuming that the `git maintenance` built-in is not available, even though it actually _is_ available). Signed-off-by: Johannes Schindelin <johasc@microsoft.com>
This header file will accumulate GVFS-specific definitions. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
This does not do anything yet. The next patches will add various values for that config setting that correspond to the various features offered/required by GVFS. Signed-off-by: Kevin Willford <kewillf@microsoft.com> gvfs: refactor loading the core.gvfs config value This code change makes sure that the config value for core_gvfs is always loaded before checking it. Signed-off-by: Kevin Willford <kewillf@microsoft.com>
This takes a substantial amount of time, and if the user is reasonably sure that the files' integrity is not compromised, that time can be saved. Git no longer verifies the SHA-1 by default, anyway. Signed-off-by: Kevin Willford <kewillf@microsoft.com> Update for 2023-02-27: This feature was upstreamed as the index.skipHash config option. This resulted in some changes to the struct and some of the setup code. In particular, the config reading was moved to prepare_repo_settings(), so the core.gvfs bit check was moved there, too. Signed-off-by: Derrick Stolee <derrickstolee@github.com>
Signed-off-by: Kevin Willford <kewillf@microsoft.com>
Prevent the sparse checkout to delete files that were marked with skip-worktree bit and are not in the sparse-checkout file. This is because everything with the skip-worktree bit turned on is being virtualized and will be removed with the change of HEAD. There was only one failing test when running with these changes that was checking to make sure the worktree narrows on checkout which was expected since we would no longer be narrowing the worktree. Update 2022-04-05: temporarily set 'sparse.expectfilesoutsideofpatterns' in test (until we start disabling the "remove present-despite-SKIP_WORKTREE" behavior with 'core.virtualfilesystem' in a later commit). Signed-off-by: Kevin Willford <kewillf@microsoft.com>
I will intend to send this upstream after the 2.47.0 release cycle, but this should get to our microsoft/git users for maximum impact. Customers have been struggling with explaining why the sparse index expansion advice message is showing up. The advice to run 'git clean' has not always helped folks, and sometimes it is very unclear why we are running into trouble. These changes introduce a way to log a reason for the expansion into the trace2 logs so it can be found by requesting that a user enable tracing. While testing this, I created the most standard case that happens, which is to have an existing directory match a sparse directory in the index. In this case, it showed that two log messages were required. See the last commit for this new log message. Together, these two places show this kind of message in the `GIT_TRACE2_PERF` output (trimmed for clarity): ``` region_enter | index | label:clear_skip_worktree_from_present_files_sparse data | sparse-index | ..skip-worktree sparsedir:<my-sparse-path>/ data | index | ..sparse_path_count:362 data | index | ..sparse_lstat_count:732 region_leave | index | label:clear_skip_worktree_from_present_files_sparse data | sparse-index | expansion-reason:failed to clear skip-worktree while sparse ``` I added some tests to demonstrate that these logs are recorded, but it also seems difficult to hit some of these cases.
Signed-off-by: Derrick Stolee <stolee@gmail.com>
Signed-off-by: Derrick Stolee <stolee@gmail.com>
An internal customer reported a segfault when running `git sparse-checkout set` with the `index.sparse` config enabled. I was unable to reproduce it locally, but with their help we debugged into the failing process and discovered the following stacktrace: ``` #0 0x00007ff6318fb7b0 in rehash (map=0x3dfb00d0440, newsize=1048576) at hashmap.c:125 #1 0x00007ff6318fbc66 in hashmap_add (map=0x3dfb00d0440, entry=0x3dfb5c58bc8) at hashmap.c:247 #2 0x00007ff631937a70 in hash_index_entry (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:122 #3 0x00007ff631938a2f in add_name_hash (istate=0x3dfb00d0400, ce=0x3dfb5c58bc8) at name-hash.c:638 #4 0x00007ff631a064de in set_index_entry (istate=0x3dfb00d0400, nr=8291, ce=0x3dfb5c58bc8) at sparse-index.c:255 #5 0x00007ff631a06692 in add_path_to_index (oid=0x5ff130, base=0x5ff580, path=0x3dfb4b725da "<redacted>", mode=33188, context=0x5ff570) at sparse-index.c:307 #6 0x00007ff631a3b48c in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41f60, base=0x5ff580, depth=2, pathspec=0x5ff5a0, fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:46 #7 0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41e80, base=0x5ff580, depth=1, pathspec=0x5ff5a0, fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80 #8 0x00007ff631a3b60b in read_tree_at (r=0x7ff631c026a0 <the_repo>, tree=0x3dfb5b41ac8, base=0x5ff580, depth=0, pathspec=0x5ff5a0, fn=0x7ff631a064e5 <add_path_to_index>, context=0x5ff570) at tree.c:80 #9 0x00007ff631a06a95 in expand_index (istate=0x3dfb00d0100, pl=0x0) at sparse-index.c:422 #10 0x00007ff631a06cbd in ensure_full_index (istate=0x3dfb00d0100) at sparse-index.c:456 #11 0x00007ff631990d08 in index_name_stage_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21, stage=0, search_mode=EXPAND_SPARSE) at read-cache.c:556 #12 0x00007ff631990d6c in index_name_pos (istate=0x3dfb00d0100, name=0x3dfb0020080 "algorithm/levenshtein", namelen=21) at read-cache.c:566 #13 0x00007ff63180dbb5 in sanitize_paths (argc=185, argv=0x3dfb0030018, prefix=0x0, skip_checks=0) at builtin/sparse-checkout.c:756 #14 0x00007ff63180de50 in sparse_checkout_set (argc=185, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:860 #15 0x00007ff63180e6c5 in cmd_sparse_checkout (argc=186, argv=0x3dfb0030018, prefix=0x0) at builtin/sparse-checkout.c:1063 #16 0x00007ff6317234cb in run_builtin (p=0x7ff631ad9b38 <commands+2808>, argc=187, argv=0x3dfb0030018) at git.c:548 #17 0x00007ff6317239c0 in handle_builtin (argc=187, argv=0x3dfb0030018) at git.c:808 #18 0x00007ff631723c7d in run_argv (argcp=0x5ffdd0, argv=0x5ffd78) at git.c:877 #19 0x00007ff6317241d1 in cmd_main (argc=187, argv=0x3dfb0030018) at git.c:1017 #20 0x00007ff631838b60 in main (argc=190, argv=0x3dfb0030000) at common-main.c:64 ``` The very bottom of the stack being the `rehash()` method from `hashmap.c` as called within the `name-hash` API made me look at where these hashmaps were being used in the sparse index logic. These were being copied across indexes, which seems dangerous. Indeed, clearing these hashmaps and setting them as not initialized fixes the segfault. The second commit is a response to a test failure that happens in `t1092-sparse-checkout-compatibility.sh` where `git stash pop` starts to fail because the underlying `git checkout-index` process fails due to colliding files. Passing the `-f` flag appears to work, but it's unclear why this name-hash change causes that change in behavior.
These two tests in t5616-partial-clone.sh are actually already broken and there are comments supporting that. Those comments were focused on the GIT_TEST_FULL_NAME_HASH variable, but they also apply to this one. We will want to avoid issues here. Signed-off-by: Derrick Stolee <stolee@gmail.com>
In preparation for allowing both the --shallow and --path-walk options in the 'git pack-objects' builtin, create a new 'edge_aggressive' option in the path-walk API. This option will help walk the boundary more thoroughly and help avoid sending extra objects during fetches and pushes. The only use of the 'edge_hint_aggressive' option in the revision API is within mark_edges_uninteresting(), which is usually called before between prepare_revision_walk() and before visiting commits with get_revision(). In prepare_revision_walk(), the UNINTERESTING commits are walked until a boundary is found. We didn't use this in the past because we would mark objects UNINTERESTING after doing the initial commit walk to the boundary. While we should be marking these objects as UNINTERESTING, we shouldn't _emit_ them all via the path-walk algorithm or else our delta calculations will get really slow. Based on these observations, the way we were handling the UNINTERESTING flag in walk_objects_by_path() was overly complicated and buggy. A lot of it can be removed and simplified to work with this new approach. It also means that we will see the UNINTERESTING boundaries of paths when doing a default path-walk call, changing some existing test cases. Signed-off-by: Derrick Stolee <stolee@gmail.com>
There does not appear to be anything particularly incompatible about the --shallow and --path-walk options of 'git pack-objects'. If shallow commits are to be handled differently, then it is by the revision walk that defines the commit set and which are interesting or uninteresting. However, before the previous change, a trivial removal of the warning would cause a failure in t5500-fetch-pack.sh when GIT_TEST_PACK_PATH_WALK is enabled. The shallow fetch would provide more objects than we desired, due to some incorrect behavior of the path-walk API, especially around walking uninteresting objects. To also cover the symmetrical case of pushing from a shallow clone, add a new test to t5538-push-shallow.sh that confirms the correct behavior of pushing only the new object. This works to validate both the --path-walk and --no-path-walk case when toggling the GIT_TEST_PACK_PATH_WALK environment variable. This test would have failed in the --path-walk case if we created it before the previous change. Signed-off-by: Derrick Stolee <stolee@gmail.com>
It can be notoriously difficult to detect if delta bases are being computed properly during 'git push'. Construct an example where it will make a kilobyte worth of difference when a delta base is not found. We can then use the progress indicators to distinguish between bytes and KiB depending on whether the delta base is found and used. Signed-off-by: Derrick Stolee <stolee@gmail.com>
Instead of storing the Personal Access Token in an environment secret, store it in Azure KeyVault instead. This allows for much better auditing when (and where) the secret is used. Ideally, we would even switch away from using a Personal Access Token in the first place. But there is no alternative, such as a Managed Identity on GitHub, where one could define in a fine-grained way which usage scenario can be performed using that identity, and recent reorgs at GitHub suggest that adding such an alternative may not be on the list of priorities at all. So let's just stay with a Personal Access Token, but do safeguard it better by putting it into a KeyVault that can only be accessed by a narrowly-scoped GitHub Actions environment. Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
This pull request aims to correct a pretty big issue when dealing with UNINTERESTING objects in the path-walk API. They somehow were only exposed when trying to perform a push from a shallow clone. This will require rewriting the upstream version so this is avoided from the start, but we can do a forward fix for now. The key issue is that the path-walk API was not walking UNINTERESTING trees at the right time, and the way it was being done was more complicated than it needed to be. This changes some of the way the path-walk API works in the presence of UNINTERSTING commits, but these are good changes to make. I had briefly attempted to remove the use of the `edge_aggressive` option in `struct path_walk_info` in favor of using the `--objects-edge-aggressive` option in the revision struct. When I started down that road, though, I somehow got myself into a bind of things not working correctly. I backed out to this version that is working with our test cases. I tested this using the thin and big pack tests in `p5313` which had the same performance as before this change. The new change is that in a shallow clone we can get the same `git push` improvements. I was hung up on testing this for a long time as I wasn't getting the same results in my shallow clone as in my regular clones. It turns out that I had forgotten to use `--no-reuse-delta` in my test command, so it was picking the deltas that were given by the initial clone instead of picking new ones per the algorithm. 🤦🏻
If the main window is already visible when gitk waits for it to become visible, gitk hangs forever. This commit adds a check whether the window is already visible. See https://wiki.tcl-lang.org/page/tkwait+visibility Signed-off-by: Tobias Pietzsch <pietzsch@mycroft.speedport.ip>
Instead of storing the Personal Access Token in an environment secret, store it in Azure KeyVault instead. This allows for much better auditing when (and where) the secret is used. Ideally, we would even switch away from using a Personal Access Token in the first place. But there is no alternative, such as a Managed Identity on GitHub, where one could define in a fine-grained way which usage scenario can be performed using that identity, and recent reorgs at GitHub suggest that adding such an alternative may not be on the list of priorities at all. So let's just stay with a Personal Access Token, but do safeguard it better by putting it into a KeyVault that can only be accessed by a narrowly-scoped GitHub Actions environment.
If the main window is already visible when gitk waits for it to become visible, gitk hangs forever. This commit adds a check whether the window is already visible. See https://wiki.tcl-lang.org/page/tkwait+visibility This ports git#944 to `microsoft/git`. for Git. Fixed issue #704.
These are actually (OID-)identical to the commits that were merged upstream.
This is needed because 9c93ba4 introduced
A subsequent rebase should squash this into the intended commit (I had forgotten to mark f6d195c as a Other than that, looks very good! |
The Git Fundamentals team is no more, and hasn't been for almost a year. The current owner is the GitClient team (continuity is ensured, though, by virtue of the same people taking care of `microsoft/git` as before). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The Git Fundamentals team is no more, and hasn't been for almost a year. The current owner is the GitClient team (continuity is ensured, though, by virtue of the same people taking care of `microsoft/git` as before). Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de>
The Debian package contains a mandatory `Maintainer` entry; Previously, the Git Fundamentals team was mentioned there. However, the Git Fundamentals team is no more, and hasn't been for almost a year. The current owner is the GitClient team (continuity is ensured, though, by virtue of the same people taking care of `microsoft/git` as before). Let's adjust things accordingly.