Mandatory commit maps, enabling squash-merge (meld with "subcommit" ideas) #8
Comments
I was editing that over the course of many days; it's time to push it even though I have further thoughts.
My first instinct is for a … Just checked and the …
Further notes:
Wow, man, I did a first read this evening and I have to say that including info in a custom header seems to me like a really clever idea! A question also came up: does it also work with the option to not do squashing? So these are my first thoughts; I will read it again and have comments :)
Yes! It might even be easier if I preclude squashing (although maybe not, since the problem of determining which Sub commits to keep refs to seems more or less the same as the problem of determining which commits to keep in the commit map).
Wow, just noticed #7. I hadn't thought about signatures at all, but that suggests another transformation of commits during assimilation that it is important to undo when splitting out assimilated commits: stripping signatures. (That is, assimilating commits strips signatures; when splitting them out later, it's important to map back to the signed Sub commits.) So that's a point in favor of adding the custom header or special commit message line to every assimilated commit, or at least to every signed one. Note that if we keep around the commit map and signatures, we still don't need to keep around refs to the actual commits; that's enough information to recreate the Sub commit objects.
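As a rough illustration of why stripping signatures matters for the commit map: a Git object ID is the SHA-1 of `<type> <size>\0` followed by the object's bytes, so a commit with its `gpgsig` header removed necessarily gets a different ID from the signed original. The sketch below strips that header (plus its space-indented continuation lines) from raw commit text of the kind printed by `git cat-file commit`. The helper names are hypothetical, the commit is made up, and it assumes a SHA-1 repository rather than the newer SHA-256 object format.

```python
import hashlib


def git_object_id(kind: str, body: bytes) -> str:
    """SHA-1 object ID Git assigns to an object of `kind` ("blob", "commit", ...)."""
    header = f"{kind} {len(body)}\0".encode()
    return hashlib.sha1(header + body).hexdigest()


def strip_header(raw: str, name: str) -> str:
    """Drop header `name` and its continuation lines (which start with a space)."""
    headers, sep, message = raw.partition("\n\n")
    kept = []
    skipping = False
    for line in headers.split("\n"):
        if skipping and line.startswith(" "):
            continue  # continuation line of the header being removed
        skipping = line.startswith(name + " ")
        if not skipping:
            kept.append(line)
    return "\n".join(kept) + sep + message


# Hypothetical Sub commit, unsigned and signed (signature body abbreviated).
unsigned = (
    "tree 8d640c644213d7e508971236aaeda72ea1b1a509\n"
    "author A U Thor <author@example.com> 1483412929 -0500\n"
    "committer A U Thor <author@example.com> 1483412929 -0500\n"
    "\n"
    "Fix Sub somehow\n"
)
signed = unsigned.replace(
    "\n\nFix",
    "\ngpgsig -----BEGIN PGP SIGNATURE-----\n \n iQEzBAAB...\n"
    " -----END PGP SIGNATURE-----\n\nFix",
)

# Stripping recovers the unsigned object, whose ID differs from the signed one.
assert strip_header(signed, "gpgsig") == unsigned
assert git_object_id("commit", signed.encode()) != git_object_id("commit", unsigned.encode())
```

Going the other way (re-inserting a stored signature) is what lets the signed Sub commit be recreated without keeping a ref to it.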
Cool, thanks for the answer! So, for the 'custom header' option, is there a way then to see if:

My PR does that by simply comparing references and their histories, but for this clever strategy, how would the mechanism work?
I was thinking about the idea of caching a separate map between Main commits and Sub commits (#3), and about how `git push` and `git fetch` can be used to share any ref under `refs/` created by `git update-ref` (which is how sharing Git Notes works, for example), and how we might be able to use that to share the map between Main commits and Sub commits. I might have stumbled on an idea to merge my `subcommit` ideas into this.

I've confirmed that refs can point to trees or blobs, and that `git push --force-with-lease=<refname>:<sha>` works with them (other forms of `--force-with-lease` don't, though), and `--force-with-lease` has been in Git since 1.8.5, so I'm pretty sure this can work.

So my line of thinking went something like this. We wouldn't want `git subhistory split/merge/push/pull/what-have-you` to be crazy slow the first time after cloning a big repo, so we would encourage/require people to push and fetch these maps. But if everyone's using these commit-to-commit maps, then there's no reason the underlying contents of the commits have to correspond as perfectly as `subhistory` is currently designed around.

Marking commits
In particular, squash-merging could totally work! For illustrative purposes, suppose that the 3rd commit on `master` to modify Sub comes before merging:

Say we're squash-merging `sub-upstream/master` into `master`. As with normal `git-subhistory merge`, we split the history of Sub in `HEAD` out as `SPLIT_HEAD`, but then instead of assimilating the `SPLIT_HEAD..sub-upstream/master` commits, we first merge `sub-upstream/master` directly into `SPLIT_HEAD`:

And we use that merged Sub tree in a new squash-merge commit on `master`:

Note that as far as the rest of Git is concerned, the squash-merge commit is a normal, non-merge commit (with only one parent) that happens to make changes only in `path/to/sub/`. But to `subhistory`, it's a commit assimilated from a Sub commit, with an entry in the commit map from the squash-merge commit to the Sub merge commit.

This is important because the squash-merge commit needs to be split out as that Sub merge commit. Suppose, one last time, that another (4th) commit on `master` modifies Sub:

And then we split that out to push upstream:
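The merge-base claim in the next paragraph can be checked on a toy model of the split-out Sub history. This is a hypothetical illustration, not `subhistory` code: commit names are placeholders, it assumes `Fix Sub somehow` and `Fix Sub some more` (`U1`, `U2`) are the upstream commits being merged and that both histories descend from a common root `R`, and `merge_bases` follows Git's notion of best common ancestors (common ancestors not reachable from another common ancestor).

```python
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # commit -> parent commits


def ancestors(graph: Graph, commit: str) -> Set[str]:
    """Every commit reachable from `commit`, including itself."""
    seen: Set[str] = set()
    stack = [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(graph.get(c, []))
    return seen


def merge_bases(graph: Graph, a: str, b: str) -> Set[str]:
    """Best common ancestors of `a` and `b`."""
    common = ancestors(graph, a) & ancestors(graph, b)
    return {
        c for c in common
        if not any(c != d and c in ancestors(graph, d) for d in common)
    }


upstream = {"R": [], "U1": ["R"], "U2": ["U1"], "U3": ["U2"]}  # U3: a later upstream update
split = {"S1": ["R"], "S2": ["S1"], "S3": ["S2"]}  # S3 plays the role of SPLIT_HEAD

# Squash-merge commit split out as the real Sub merge commit M, then the 4th commit S4.
as_merge = {**upstream, **split, "M": ["S3", "U2"], "S4": ["M"]}
# Counterfactual: the upstream changes squashed into a single-parent commit Q instead.
as_squash = {**upstream, **split, "Q": ["S3"], "S4": ["Q"]}

print(merge_bases(as_merge, "S4", "U3"))   # {'U2'}
print(merge_bases(as_squash, "S4", "U3"))  # {'R'}
```

With the real merge commit, the next upstream merge only has to reconcile work done after `U2`; in the squash variant the three-way merge falls back to `R`, which is where the potential conflicts come from.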
It's important that this split-out commit be a fast-forward from the `Merge branch 'sub-upstream/master' into path/to/sub/ subhistory of master` commit, so that it will be a fast-forward from `sub-upstream/master`; if upstream has further updates, `Fix Sub some more` will be the merge base. If instead the squash-merge commit weren't split out as this non-squash merge commit, that is, if the `Fix Sub some{how, more}` commits were squash-merged into Sub's history, then `Fix Sub some more` wouldn't be the merge base and the merge could well conflict.

One complication is that Git poses a distributed systems problem: what if someone else pulls down the squash-merge commit, makes more changes to Sub on top, and then splits it out? As noted above, it's critical that the squash-merge commit be split out as the underlying non-squash Sub merge commit so that the merge base with upstream will be the right one. How do we enforce that the commit map is up-to-date at split time? Ideas:
- … `origin`? What if they didn't use a remote, and just passed a Git URL directly to `git pull` or something? There are too many ways to pull in commits without using a refspec in config for this to work.
- … `path/to/sub/.gitsubhistory/assimilated-from` or something, to tell us to download and use the subproject commit. Problem: subsequent normal commits will have the same file with the same contents unless the user manually changes this file.
- … `Merge branch blah blah...` commit message format is really verbose, and it's likely there are people who prefer to customize those to be more readable; GitHub overrides it, for example. It would be less bad if the requirement were merely "last line must be of the form `Assimilated from da39a3ee5e6b4b0d3255bfef95601890afd80709.`" or something, but still, if the user manually edits the commit message and, like, misspells "Assimilate" or something, that shouldn't break `subhistory`; that would be stupid.
- …

  ```diff
   tree 8d640c644213d7e508971236aaeda72ea1b1a509
   parent f45ffa8f782e7263702846facac99498788e6ce8
   author Han Seoul-Oh <laughinghan@gmail.com> 1483412929 -0500
   committer Han Seoul-Oh <laughinghan@gmail.com> 1483412929 -0500
  +subhistory Sub da39a3ee5e6b4b0d3255bfef95601890afd80709
   
   Subject line
   
   Commit message body
  ```

  … by `git cherry-pick` always, and by `git rebase -i` if any earlier commit is changed (probably because it runs `git commit-tree`, which generates the commit from scratch). `git commit --amend` does preserve the header though, as does `git rebase -i` if nothing earlier changes (i.e., if the parent is the same hash). Also, this would obviously be more annoying to generate and parse than the commit message.

So, the squash-merge commit object itself must somehow be marked with the Sub merge commit to tell us to download and use it; the commit map alone is insufficient.
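Reading such a marking back is straightforward if it lives in a header, as in the example commit object with the `subhistory Sub <sha>` line. The sketch below assumes raw commit text as printed by `git cat-file commit <sha>` and that hypothetical `subhistory <name> <sha>` header format; the function name is made up. It looks only at the header block, so a line in the commit message that merely happens to read `subhistory ...` is ignored.

```python
from typing import Dict


def subhistory_markings(raw: str) -> Dict[str, str]:
    """Map each Sub name to the Sub commit SHA named in its `subhistory` header."""
    headers, _, _message = raw.partition("\n\n")
    markings: Dict[str, str] = {}
    for line in headers.split("\n"):
        if line.startswith(" "):
            continue  # continuation line of a multi-line header such as gpgsig
        key, _, value = line.partition(" ")
        if key == "subhistory":
            name, _, sha = value.partition(" ")
            markings[name] = sha
    return markings


raw = (
    "tree 8d640c644213d7e508971236aaeda72ea1b1a509\n"
    "parent f45ffa8f782e7263702846facac99498788e6ce8\n"
    "author Han Seoul-Oh <laughinghan@gmail.com> 1483412929 -0500\n"
    "committer Han Seoul-Oh <laughinghan@gmail.com> 1483412929 -0500\n"
    "subhistory Sub da39a3ee5e6b4b0d3255bfef95601890afd80709\n"
    "\n"
    "Subject line\n"
    "\n"
    "subhistory Other 0000000000000000000000000000000000000000\n"  # message text, not a header
)
print(subhistory_markings(raw))  # {'Sub': 'da39a3ee5e6b4b0d3255bfef95601890afd80709'}
```

Unlike a commit-message convention, a typo in the message body can't break this lookup; the trade-off, as noted above, is that some Git commands drop unknown headers.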
This should work for empty commits (#6), too. Open question: should we do this for every assimilated commit, then, not just squash-merge commits and assimilated empty commits? (If we do it for all assimilated commits, we wouldn't even need that direction of the cache map, right? It also takes care of transformed commit messages.)
Invariants
Another natural question is whether we should symmetrically be marking split-out commits too, but I think the answer to that is a definitive no. They're fundamentally asymmetrical: a given commit of Main has some fixed number of subprojects in it, whereas a given commit of Sub could be assimilated into any number of superprojects in the future. It would be weird for a superproject assimilating a Sub commit to have information on the hash of a commit in some other unrelated superproject (in the marking of the split-out commit).
And how would it be useful? To let the split-out commit have a transformed commit message (with the subcomponent prefix removed, for example)? So, what, next time we split out the Main commit C, we check to see if there's already a split-out Sub commit C' with a marking pointing back to C? Remember, it's a distributed systems problem: what if someone else downloads C but hasn't downloaded C'? When they split out C, will they get a different hash from C'?
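Behind this question is the fact that a Git commit ID is just the SHA-1 of the commit object's bytes (`commit <size>\0` followed by the headers and message), so two people who independently split out C get the same C' exactly when the split is a pure function of C's object. The sketch below is illustrative only: `split_message` is a hypothetical transformation that drops a made-up `sub: ` subject prefix, the tree and parent hashes are placeholders, and a SHA-1 repository without extra headers is assumed.

```python
import hashlib


def commit_id(tree: str, parents: list, author: str, committer: str, message: str) -> str:
    """SHA-1 commit ID Git would assign to a commit with exactly these fields."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [f"author {author}", f"committer {committer}"]
    body = ("\n".join(lines) + "\n\n" + message).encode()
    return hashlib.sha1(f"commit {len(body)}\0".encode() + body).hexdigest()


def split_message(message: str) -> str:
    """Hypothetical deterministic transform: drop a 'sub: ' subject prefix."""
    return message[len("sub: "):] if message.startswith("sub: ") else message


tree, parent = "8d64" * 10, "f45f" * 10  # placeholder 40-hex-digit hashes
ident = "A U Thor <author@example.com> 1483412929 -0500"
main_msg = "sub: Fix Sub some more\n"

# Two people splitting out the same Main commit C compute the same C'...
c1 = commit_id(tree, [parent], ident, ident, split_message(main_msg))
c2 = commit_id(tree, [parent], ident, ident, split_message(main_msg))
assert c1 == c2
# ...but a message that isn't a pure function of C (say, hand-edited) gives another ID.
c3 = commit_id(tree, [parent], ident, ident, "Fix Sub some more (edited)\n")
assert c3 != c1
```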
This is a fundamental thing that `subhistory` needs to satisfy, which leads to a fundamental invariant:

Note that the current guarantee is stronger than this: there's a unique Sub commit that we're able to actually create from the commit object alone. This proposal weakens that guarantee: we may have to download a ref to the Sub commit, because a squash-merge commit just doesn't have enough information. But we know from the commit object alone (due to the marking) that we need to download that ref.
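The weakened invariant can be phrased as a decision made from the Main commit object alone. In this hypothetical sketch, `markings` stands for whatever a parser for the proposed marking returns for that commit, and `local_objects` stands for the set of objects already present locally. An unmarked commit is split out by the existing deterministic procedure, while a marked one names exactly the Sub commit to use, which may first need to be fetched.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class SplitAction(Enum):
    DERIVE = "derive"        # no marking: recreate the Sub commit from the Main commit
    USE_LOCAL = "use-local"  # marked, and the named Sub commit is already here
    FETCH = "fetch"          # marked, but a ref to the Sub commit must be downloaded first


def plan_split(markings: Dict[str, str], sub: str,
               local_objects: Set[str]) -> Tuple[SplitAction, Optional[str]]:
    """Decide how to split out `sub` using only what the Main commit object says."""
    sha = markings.get(sub)
    if sha is None:
        return SplitAction.DERIVE, None
    if sha in local_objects:
        return SplitAction.USE_LOCAL, sha
    return SplitAction.FETCH, sha


marked = {"Sub": "da39a3ee5e6b4b0d3255bfef95601890afd80709"}
assert plan_split({}, "Sub", set()) == (SplitAction.DERIVE, None)
assert plan_split(marked, "Sub", set()) == (SplitAction.FETCH, marked["Sub"])
assert plan_split(marked, "Sub", {marked["Sub"]}) == (SplitAction.USE_LOCAL, marked["Sub"])
```

The point of the design is that no branch depends on whether the local commit map happens to be up to date.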
Problems:
- … `pre-push` hook that does `disown` (to daemonize) and then pushes refs to subcommits?