-
Notifications
You must be signed in to change notification settings - Fork 138
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fast-import: avoid making replace refs point to themselves #1824
fast-import: avoid making replace refs point to themselves #1824
Conversation
/submit |
Submitted as pull.1824.git.1731610074707.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Junio C Hamano wrote (reply to this): "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> Most git commands that you try to run in such a repository with a
> self-pointing replace object will result in an error:
>
> $ git log
> fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264
After reading up to this point, with "Most git commands", I was
afraid that you were going to say "... but I found this command that
fails to stop gracefully, and instead infinitely spin".
If that is not a case, then I am happy ;-) but in that case probably
"Most" -> "All" is warranted.
> Avoid such problems by deleting replace refs that will simply end up
> pointing to themselves at the end of our writing. Warn the users when
> we do so, unless they specify --quiet.
This may need a bit of rephrasing.
Even when they specify "--quiet", you'd refrain from creating a
self-referencing replace entry, which makes sense, but it was not
clear with only the above description. I had to read the patch text
to find it out.
Is it reasonable to expect that a self referencing replace entry is
the most common thing to happen, or loops with more than one
elements are equally plausible to happen but the self-referencing
one is singled out by this patch because it is trivial to notice,
unlike other forms of equally problematic loops?
> diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> index 76d5c20f141..51c8228cb7b 100644
> --- a/builtin/fast-import.c
> +++ b/builtin/fast-import.c
> @@ -179,6 +179,7 @@ static unsigned long branch_load_count;
> static int failure;
> static FILE *pack_edges;
> static unsigned int show_stats = 1;
> +static unsigned int quiet;
> static int global_argc;
> static const char **global_argv;
> static const char *global_prefix;
> @@ -1602,7 +1603,19 @@ static int update_branch(struct branch *b)
> struct ref_transaction *transaction;
> struct object_id old_oid;
> struct strbuf err = STRBUF_INIT;
> -
> + static const char *replace_prefix = "refs/replace/";
> +
> + if (starts_with(b->name, replace_prefix) &&
Curious how the "diff" machinery decided to represent the hunk like
this, instead of say
> struct strbuf err = STRBUF_INIT;
> + static const char *replace_prefix = "refs/replace/";
>
> + if (starts_with(b->name, replace_prefix) &&
Of course that has nothing to do with this patch or its review.
> + !strcmp(b->name + strlen(replace_prefix),
> + oid_to_hex(&b->oid))) {
> + if (!quiet)
> + warning("Dropping %s since it would point to "
> + "itself (i.e. to %s)",
> + b->name, oid_to_hex(&b->oid));
> + refs_delete_ref(get_main_ref_store(the_repository),
> + NULL, b->name, NULL, 0);
> + return 0;
I am not sure if a warning is even warranted. If you decide to
replace an object A with the same object A, the result ought to be a
no-op. I wonder if it is makes more sense to
(1) do this unconditionally and silently, and
(2) when we prepare the replace_map, ignore self-referencing ones.
instead. If (2) makes sense entirely depends on the answer of an
earlier question (i.e. "is there a reason why self-reference is more
common than general loop?").
Thanks. |
On the Git mailing list, Junio C Hamano wrote (reply to this): Junio C Hamano <gitster@pobox.com> writes:
> I am not sure if a warning is even warranted. If you decide to
> replace an object A with the same object A, the result ought to be a
> no-op. I wonder if it is makes more sense to
>
> (1) do this unconditionally and silently, and
> (2) when we prepare the replace_map, ignore self-referencing ones.
>
> instead. If (2) makes sense entirely depends on the answer of an
> earlier question (i.e. "is there a reason why self-reference is more
> common than general loop?").
Forgot to add. (1) could be done even at a lower layer, i.e. the
ref API could be told to reject such a replace ref creation/update
that maps an object name to itself. |
On the Git mailing list, Elijah Newren wrote (reply to this): On Thu, Nov 14, 2024 at 4:31 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
>
> > Most git commands that you try to run in such a repository with a
> > self-pointing replace object will result in an error:
> >
> > $ git log
> > fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264
>
> After reading up to this point, with "Most git commands", I was
> afraid that you were going to say "... but I found this command that
> fails to stop gracefully, and instead infinitely spin".
>
> If that is not a case, then I am happy ;-) but in that case probably
> "Most" -> "All" is warranted.
Some git commands are not bothered by the existence of the problematic
replace ref; e.g.:
$ git show-ref
64026eaa60e0208a00946cdd3cb41b059c6675bd refs/heads/main
fb92ebc654641b310e7d0360d0a5a49316fd7264
refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264
fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/tags/msg
14f901a95ebae912feb4805f40ef68f15b0192c2 refs/tags/test
So, it's not "all" git commands. Maybe I should rephrase as:
"""
Some git commands will ignore such replace refs, but many will fail
with an error:
$ git log
fatal: replace depth too high for object
fb92ebc654641b310e7d0360d0a5a49316fd7264
"""
?
> > Avoid such problems by deleting replace refs that will simply end up
> > pointing to themselves at the end of our writing. Warn the users when
> > we do so, unless they specify --quiet.
>
> This may need a bit of rephrasing.
>
> Even when they specify "--quiet", you'd refrain from creating a
> self-referencing replace entry, which makes sense, but it was not
> clear with only the above description. I had to read the patch text
> to find it out.
Would reordering the two phrases of the last sentence do it?
"""
Avoid such problems by deleting replace refs that will simply end up
pointing to themselves at the end of our writing. Unless users specify --quiet,
warn the users when we do so.
"""
> Is it reasonable to expect that a self referencing replace entry is
> the most common thing to happen, or loops with more than one
> elements are equally plausible to happen but the self-referencing
> one is singled out by this patch because it is trivial to notice,
> unlike other forms of equally problematic loops?
Loops with more than one element are possible, but I couldn't see an
angle by which they are plausible. In particular, I know how a user
can (and did) accidentally trigger a self-referencing replace ref (if
you're curious, see
https://github.com/newren/git-filter-repo/issues/401; it's much less
likely to trigger now due to an unrelated change to the default for
`--replace-refs`, but could still be triggered in combination with
that option). There are many variations on their theme of "do a
change, then undo it" (in combination with the old `--replace-refs`
default), but I don't see how to either vary their testcase or come up
with some other way to get git-filter-repo to trigger a replace ref
loop with more than one element.
> > diff --git a/builtin/fast-import.c b/builtin/fast-import.c
> > index 76d5c20f141..51c8228cb7b 100644
> > --- a/builtin/fast-import.c
> > +++ b/builtin/fast-import.c
> > @@ -179,6 +179,7 @@ static unsigned long branch_load_count;
> > static int failure;
> > static FILE *pack_edges;
> > static unsigned int show_stats = 1;
> > +static unsigned int quiet;
> > static int global_argc;
> > static const char **global_argv;
> > static const char *global_prefix;
> > @@ -1602,7 +1603,19 @@ static int update_branch(struct branch *b)
> > struct ref_transaction *transaction;
> > struct object_id old_oid;
> > struct strbuf err = STRBUF_INIT;
> > -
> > + static const char *replace_prefix = "refs/replace/";
> > +
> > + if (starts_with(b->name, replace_prefix) &&
>
> Curious how the "diff" machinery decided to represent the hunk like
> this, instead of say
>
> > struct strbuf err = STRBUF_INIT;
> > + static const char *replace_prefix = "refs/replace/";
> >
> > + if (starts_with(b->name, replace_prefix) &&
>
> Of course that has nothing to do with this patch or its review.
Oh, that is kinda weird. It's not a whitespace-change issue either;
both the removed and added blank-looking lines really are blank lines.
No idea why the diff machinery does that.
> > + !strcmp(b->name + strlen(replace_prefix),
> > + oid_to_hex(&b->oid))) {
> > + if (!quiet)
> > + warning("Dropping %s since it would point to "
> > + "itself (i.e. to %s)",
> > + b->name, oid_to_hex(&b->oid));
> > + refs_delete_ref(get_main_ref_store(the_repository),
> > + NULL, b->name, NULL, 0);
> > + return 0;
>
> I am not sure if a warning is even warranted. If you decide to
> replace an object A with the same object A, the result ought to be a
> no-op. I wonder if it is makes more sense to
>
> (1) do this unconditionally and silently, and
> (2) when we prepare the replace_map, ignore self-referencing ones.
>
> instead. If (2) makes sense entirely depends on the answer of an
> earlier question (i.e. "is there a reason why self-reference is more
> common than general loop?").
>
> Thanks.
Since you added more in a subsequent response, I'll go respond to that. |
User |
On the Git mailing list, Elijah Newren wrote (reply to this): On Thu, Nov 14, 2024 at 5:08 PM Junio C Hamano <gitster@pobox.com> wrote:
>
> Junio C Hamano <gitster@pobox.com> writes:
>
> > I am not sure if a warning is even warranted. If you decide to
> > replace an object A with the same object A, the result ought to be a
> > no-op.
Do you mean that the result ought to be that there is no replacement?
If so, I agree, but that's not always equivalent to a no-op...
> > I wonder if it is makes more sense to
> >
> > (1) do this unconditionally and silently, and
> > (2) when we prepare the replace_map, ignore self-referencing ones.
> >
> > instead. If (2) makes sense entirely depends on the answer of an
> > earlier question (i.e. "is there a reason why self-reference is more
> > common than general loop?").
>
> Forgot to add. (1) could be done even at a lower layer, i.e. the
> ref API could be told to reject such a replace ref creation/update
> that maps an object name to itself.
Perhaps rejecting the creation or update of a replace ref would make
sense at a lower level, but a no-op isn't what I want here -- it'd
just result in a different bug. In particular, note that my patch
modifies fast-import to issue a delete where it would otherwise issue
an update instead of merely rejecting the update. If the repository
had a value for the replace ref before fast-import was run, and the
replace ref was explicitly named in the fast-import stream, I don't
want the replace ref to be left with a pre-fast-import value. |
On the Git mailing list, Junio C Hamano wrote (reply to this): Elijah Newren <newren@gmail.com> writes:
> ... If the repository
> had a value for the replace ref before fast-import was run, and the
> replace ref was explicitly named in the fast-import stream, I don't
> want the replace ref to be left with a pre-fast-import value.
Ah, right. We want to honor the user's latest wish, i.e., if they
create a replace ref that maps A to A, when they ask for object A,
instead of returning it, we need to return what the replace ref
refers to, which happens to be object A. So you are right. "You are
trying to map A to A, so we'll ignore that request" is a wrong thing
to do. "In order to give you A when you ask for A, we will remove
the existing mapping for A" is absolutely the right thing to do,
which is what your patch does.
Thanks. |
If someone replaces a commit with a modified version, then builds on that commit, and then later decides to rewrite history in a format like git fast-export --all | CMD_TO_TWEAK_THE_STREAM | git fast-import and CMD_TO_TWEAK_THE_STREAM undoes the modifications that the replacement did, then at the end you'd get a replace ref that points to itself. For example: $ git show-ref | grep replace fb92ebc654641b310e7d0360d0a5a49316fd7264 refs/replace/fb92ebc654641b310e7d0360d0a5a49316fd7264 Git commands which pay attention to replace refs will die with an error when a self-referencing replace ref is present: $ git log fatal: replace depth too high for object fb92ebc654641b310e7d0360d0a5a49316fd7264 Avoid such problems by deleting replace refs that will simply end up pointing to themselves at the end of our writing. Unless users specify --quiet, warn them when we delete such a replace ref. Two notes about this patch: * We are not ignoring the problematic update of the replace ref (turning it into a no-op), we are replacing the update with a delete. The logic here is that if the repository had a value for the replace ref before fast-import was run, and the replace ref was explicitly named in the fast-import stream, we don't want the replace ref to be left with a pre-fast-import value. * While loops with more than one element (e.g. refs/replace/A points to B, and refs/replace/B points to A) are possible, they seem much less plausible. It is pretty easy to create a sequence of git-filter-repo commands that will trigger a self-referencing replace ref, but I do not know how to trigger a scenario with a cycle length greater than 1. Signed-off-by: Elijah Newren <newren@gmail.com>
7402518
to
cc77a00
Compare
/submit |
Submitted as pull.1824.v2.git.1731968389590.gitgitgadget@gmail.com To fetch this version into
To fetch this version to local tag
|
On the Git mailing list, Junio C Hamano wrote (reply to this): "Elijah Newren via GitGitGadget" <gitgitgadget@gmail.com> writes:
> + grep "Dropping.*since it would point to itself" msgs &&
> + git show-ref >refs &&
> + ! grep refs/replace refs
"test_grep !" may make a failure case easier to diagnose.
Other than that, looking good.
Thanks, will queue. |
This patch series was integrated into seen via git@6fbd691. |
This patch series was integrated into seen via git@c5784eb. |
This patch series was integrated into next via git@751ee6b. |
This patch series was integrated into seen via git@4cace49. |
This patch series was integrated into seen via git@81a7312. |
This patch series was integrated into seen via git@0198811. |
This patch series was integrated into seen via git@fb6fde1. |
This patch series was integrated into seen via git@8eaa065. |
This patch series was integrated into master via git@8eaa065. |
This patch series was integrated into next via git@8eaa065. |
Closed via 8eaa065. |
Changes since v1: Clarified wording in the commit message, and added a little more text at the end of the commit message to address questions that came up in review.
cc: Elijah Newren newren@gmail.com