automatically preserve mtime during merge #36

DominikFickenwirth · 2022-03-15T18:19:40Z

My current method

During merges, I want to preserve the mtime for all files that only changed in one branch. In this issue I will focus on the mtime field, but this issue might affect other fields as well.

Right now, I do git checkout --ours -- .git_store_meta during the merge. After manually completing the merge, git-store-meta.pl --update will do the rest. This preserves the datetime of files which only changed in "our" branch, but not those of "their" branch. I elaborate below.

How do you guys handle merges of file dates at the moment? Just do it manually?

Merging

Let's say, I'm merging their-branch into our-branch. When the merge succeeds automatically, everything works as expected. In this case, .git_store_meta got successfully merged like any other file, which is what we want. The pre-commit hook does not get called in this case, so the merged version of .git_store_meta will be committed.

Problems only arise, when the merge stops before creating the merge commit. After completing the merge manually with git merge --continue or git commit, the pre-commit hook is called. Generally, this is a good thing: Files which were changed in both branches should get the current datetime, since the merge actually changed the contents of the file.

There are a few cases however, where we want to preserve the datetime of their-branch:

When a file was created / changed in their-branch only, but not in our-branch
When we resolve a conflicting file with git checkout --theirs -- filename.

Note, that the datetime of our-branch already get preserved:

When a file was created / changed in our-branch only, but not in their-branch.
When we resolve a conflicting file by git checkout --ours -- filename. For this to work, use "my version" above.

Rebasing

Let's say, I'm rebasing branch-A on branch-B. If I understand rebasing correctly, this is kind of equivalent to the following:

create a new branch branch-A2 based on branch-B
merge every commit of branch-A into branch-A2 one by one.
So, having a solution to the merging problem should solve rebasing as well, even though we'd have to call git-store-meta.pl ourselves (as rebasing does not call our hooks).
reset branch-A to branch-A2 and delete the latter

Thus, rebasing internally uses the same mechanism as merging. Therefore, solving this issue for merges should solve it for rebases as well.

Solution suggestion

I see two options, how we could make this work:

A separate command that we need to run manually during a merge or rebase
Adding some kind of check inside the --store and --update procedures, whether we are in the process of merging, rebasing, or just committing.

The text was updated successfully, but these errors were encountered:

johnnyutahh · 2022-03-15T18:23:50Z

I don't use git-store-meta yet, but I plan to.

Adding some kind of check inside the --store and --update procedures, whether we are in the process of merging, rebasing, or just committing.

Is there any reason not to employ the above solution? eg: is it harder to implement, harder for the user to execute and/or understand, or does said (above) solution present some other downside or undesirable outcome?

DominikFickenwirth · 2022-03-15T19:08:57Z

Originally, I thought that having a separate command would make it easier to use on rebases, but on second thought, I don't think that's the case.
Also, there might be special cases where we don't want this mechanism to kick in (but I can't name any right now...)

johnnyutahh · 2022-03-15T19:11:58Z

Originally, I thought that having a separate command would make it easier to use on rebases, but on second thought, I don't think that's the case.

Making the user remember an additional step (manual command) in a merge-like (rebase or otherwise) procedure only for git-store-meta and requiring them to remember to do this: that does indeed seem to be asking a lot of the user.

Also, there might be special cases where we don't want this mechanism to kick in (but I can't name any right now...)

For any special cases that arise in the the future: consider making an additional --switch parameter to handle said special cases? This approach seems reasonable to me.

DominikFickenwirth · 2022-03-15T21:10:07Z

I agree with both of your statements. 2. Seems to be the reasonable approach here.

Making the user remember an additional step (manual command) in a merge-like (rebase or otherwise) procedure only for git-store-meta and requiring them to remember to do this: that does indeed seem to be asking a lot of the user.

As I understand it, git-store-meta cannot be automated during a rebase, as a rebase commit cannot be hooked. So the user needs to remember an additional step anyways.
Do you see another way of completely automating this during a rebase?

danny0838 · 2022-03-16T12:05:01Z

As all the issue is about when the hook should or should not run, it should make more sense to do the checking in the hooks directly.

The proposed change is available in the devel branch. In the first commit we refactored the hooks to make them easier to be manually edited or merged with other possible hook systems. The second commit adds a check to skip the hook if a merge (normal or squashing) or revert is in progress.

Rebasing (and similarly, cherry-picking and aming) is a different story. It's about reapplying patch(es) and does not involve committing unless the user explicitly runs git commit rather than git rebase --continue when resolving a conflict, or for an advanced operation like splitting or inserting commits, in which case running the hook are probably desired. As a result, we don't tend to skip for rebasing.

Also note that the user can always skip the pre-commit hook by adding --no-verify. So we basically have two options:

Do nothing and let the pre-commit hook run as now. If the user wants to skip GSM, add --no-verify.
Skip the pre-commit hook during a merge. If the user wants to update GSM, run manually.

Although it seems mostly reasonable to skip the pre-commit hook for a merge (or revert), it still adds a memory burden to the user for when the hook will run, and we may need more evaluation about whether to implement it.

Technically it's also possible to implement a more dedicated mechanism for a manual skip, such as checking for a specific environmental variable like GIT_STORE_META_SKIP_UPDPATE or a Git config like gsm.skipupdate or even branch.<branch_name>.gsmskipupdate. As we currently can't think of sufficient practical use cases, we probably won't dig into it so deeply. The user can simply customize the hooks on their own if they really need such feature anyway.

DominikFickenwirth · 2022-03-17T05:36:28Z

First of all, thank you for your fast response! Your changes to the hooks in the devel branch work as expected in my local environment. It may be useful to echo the reason why the hook won't be executed, e.g. "Not applying metadata after squash merge".

Rebasing (and similarly, cherry-picking and aming) is a different story. It's about reapplying patch(es) and does not involve committing unless the user explicitly runs git commit rather than git rebase --continue when resolving a conflict, or for an advanced operation like splitting or inserting commits, in which case running the hook are probably desired. As a result, we don't tend to skip for rebasing.

I agree. I didn't think of running git commit during a rebase. I think this is enough for the rebase case.

However, the issue is not really about when the hook should run or not. My original post wasn't clear at all - I apologize for that. I actually want the pre-commit hook to run during a merge. My goal is to make merging metadata as automated as possible, such that the user doesn't have to worry about metadata during every single merge.

When committing a merge, I want git-store-meta.pl --update to result in the following behavior:

Metadata for files that changed in our-branch and their-branch should be extracted from the worktree
Metadata for files that changed only in our-branch, but not in their-branch, should be extracted from our-branch
Metadata for files that changed only in their-branch, but not in our-branch, should be extracted from their-branch
The metadata extracted in 2. and 3. should be applied to the files in the worktree
The metadata extracted in 1., 2. and 3. should be stored in .git_store_meta.

The user could then do something like:

git switch our-branch
git merge their-branch
[resolve conflicts]
git checkout --ours -- .git_store_meta  (my hack for resolving the file)
git merge --continue  (or git commit)

and the pre-commit hook would do the rest. There is probably a better way to resolve conflicts in .git_store_meta itself. I'm still fairly new to git and lack an overview of what's possible and what's not.

Also note that the user can always skip the pre-commit hook by adding --no-verify. So we basically have two options:

Do nothing and let the pre-commit hook run as now. If the user wants to skip GSM, add --no-verify.

Skip the pre-commit hook during a merge. If the user wants to update GSM, run manually.

Since I do want the hook to run during merges, I would prefer option 1.

Technically it's also possible to implement a more dedicated mechanism for a manual skip, such as checking for a specific environmental variable like GIT_STORE_META_SKIP_UPDPATE or a Git config like gsm.skipupdate or even branch.<branch_name>.gsmskipupdate. As we currently can't think of sufficient practical use cases, we probably won't dig into it so deeply. The user can simply customize the hooks on their own if they really need such feature anyway.

I agree. That seems like overkill.

DominikFickenwirth · 2022-03-17T06:09:43Z

I've just stumbled across merge drivers. Those seem to be an option for resolving the .git_store_meta file automatically. We could implement a custom merge driver through git-store-meta.pl --merge baseGSM ourGSM theirGSM prmL prmP using a behavior like 1. to 5. above.

For example, if for some file the mtime (or atime) was changed both in our-branch and their-branch, we use the current system datetime. If any other fields differ, mark it as a conflict.
This would fully automate merges, where the only conflicts are in .git_store_meta itself.
This would also work for rebases: Files without changes would get the old mtime, whereas files that changed during the rebase would get the current system datetime.

By doing this, maybe we don't even need the changes I described for --update - Although conflicting files resolved by "use ours/theirs" would get a new mtime then, and the user would have to --apply the new values manually again.

danny0838 · 2022-03-17T12:27:43Z

It may be useful to echo the reason why the hook won't be executed, e.g. "Not applying metadata after squash merge".

Good idea. Added.

I actually want the pre-commit hook to run during a merge. My goal is to make merging metadata as automated as possible, such that the user doesn't have to worry about metadata during every single merge.

I get it. However this shouldn't be the work of the pre-commit hook, which is run during the user initiated commit AFTER the merge conflicts have been manually resolved and can't get the essential clues about what the conflicts are.

As you have noticed, this is more likely the job of a merge driver. However there would be many potential algorithms the user might want, such as "take the later mtime/atime if both modified". Also I'm not the expert about diff and merge algorithm, which is a nontrivial computer science problem. We probably won't—maybe actually can't—do it until some skilled body contributes on it or we've got time to research on it.

DominikFickenwirth · 2022-03-22T23:43:43Z

However this shouldn't be the work of the pre-commit hook, which is run during the user initiated commit AFTER the merge conflicts have been manually resolved and can't get the essential clues about what the conflicts are.

I agree. The only thing we would need the pre-commit hook for, is if we want to store new metadata for the files manually changed by the user during the merge. Also, it's the only hook where we could apply metadata to the worktree (even though I would prefer applying metadata after the commit). But this seems like a second step. The merge driver is more important I think.
Also note that we can still get information about conflicts through MERGE_HEAD.

However there would be many potential algorithms the user might want, such as "take the later mtime/atime if both modified".

That seems reasonable as well. At least for file dates where the user resolved a conflict manually, I would prefer the date of the merge. But that's just personal taste, I guess. Maybe make both possible via an option?

Also I'm not the expert about diff and merge algorithm, which is a nontrivial computer science problem.

I would focus on the merge driver for the .git_store_meta file right now. I think diffing our GSM file works well enough with the standard diff algorithm.

Unfortunately, I have never written anything in perl, but our merge algorithm for .git_store_meta shouldn't be too difficult, since lines are independent and sorted. I'm working on pseudo code right now, which I will post as soon as I'm done.

DominikFickenwirth · 2022-03-23T01:12:40Z

Assuming that lines inside .git_store_meta are sorted alphabetically, we could do

git-store-meta.pl --resolve <basefile> <ourfile> <theirfile> <markerlength> <destfilename>

baselines  = readLines(basefile);  //read files as array of strings
ourlines   = readLines(ourfile);
theirlines = readLines(theirfile);
result = empty array or list of strings;
NOW = getCurrentDateTime();

i = 0; //index for baseline
j = 0; //index for ourlines
k = 0; //index for theirlines

conflicts = false;
while (j < length(ourlines) or (k < length(theirlines)) do {

  // 1. Get smallest unhandled filename
  //====================================
  basename  = getFilename(baselines, i); //should be empty string if i = array length
  ourname   = getFilename(ourlines,  j); //should be empty string if j = array length
  theirname = getFilename(theirlines,k); //should be empty string if k = array length
  filename = alphabeticallySmallest(basename, ourname, theirname); //ignores empty strings


  // 2. Extract stored fields from base, ours and theirs
  //=====================================================
  basefields  = empty;
  ourfields   = empty;
  theirfields = empty;
  if basename  = filename then { basefields  = getFields(baselines[i]);  i = i+1; };
  if ourname   = filename then { ourfields   = getFields(ourlines[j]);   j = j+1; };
  if theirname = filename then { theirfields = getFields(theirlines[k]); k = k+1; };


  // 3. Handle simple cases
  //========================
  // 3.a. equal in our branch and their branch?
  if ourfields = theirfields then {
    result.add(ourfields); //"add" should ignore the value "empty"
  }
  // 3.b. no change in theirs => take ours
  else if theirfields = basefields then {
    result.add(ourfields);
  }
  // 3.c. no change in ours => take theirs
  else if ourfields = basefields then {
    result.add(theirfields);
  }

  // --> If not handled yet, then BOTH branches changed the metadata of filename.
  
  // 4. SOLVE conflicts only affecting atime and/or mtime
  //======================================================
  else if withoutAtimeMtime(ourfields) = withoutAtimeMtime(theirfields) then {
    resultfields = ourfields;
    if ourfields.atime != theirfields.atime then { resultfields.atime = NOW; }; //or maximum
    if ourfields.mtime != theirfields.mtime then { resultfields.mtime = NOW; }; //or maximum
    result.add(resultfields);
  }

  // 5. MARK conflicts affecting other fields
  //==========================================
  else {
    conflicts = true
    result.add("<<<<<< HEAD"); //use <markerlength> for number of "<" characters
    result.add(ourfields);
    result.add("======");      //use <markerlength> again
    result.add(theirfields);
    result.add(">>>>>> "+ getShortSha1(MERGE_HEAD)); //use <markerlength> again
  }
}
writeLines(result, ourfile); //write result back to <ourfile>
if conflicts then exit(1); else exit(0);

I hope this code is understandable. If not, feel free to ask!

Notes:

See gitattributes for more information on defining custom merge drivers.
See also merge-config.
Optional: During 5., check for "git config merge.conflictstyle = diff3" and add |||||| markers.

The merge driver needs to be configured in $GIT_DIR/config or $HOME/.gitconfig:

[merge "git_store_meta_merger"]
    name = merge driver for metadata stored with git-store-meta.pl
    driver = $GIT_DIR/hooks/git-store-meta.pl --resolve %O %A %B %L %P
    recursive = ?

This merge driver needs to be called in .gitattributes:
```
/.git_store_meta merge=git_store_meta_merger
```
Please note that THIS IS ALL STILL UNTESTED.

danny0838 · 2022-04-02T14:30:33Z

Finally we decided to skip commits during all special status as it's more consistent and error-proof (the user can easily perform an update if not run; but cannot easily revert an update if it has run unexpectedly).

DominikFickenwirth changed the title ~~automatically preserve datetime during merge~~ automatically preserve mtime during merge Mar 15, 2022

danny0838 added enhancement help wanted labels Mar 17, 2022

danny0838 added a commit that referenced this issue Apr 2, 2022

Skip pre-commit hook if a merge etc. is in progress (#36)

46c8ce0

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

automatically preserve mtime during merge #36

automatically preserve mtime during merge #36

DominikFickenwirth commented Mar 15, 2022

johnnyutahh commented Mar 15, 2022

DominikFickenwirth commented Mar 15, 2022

johnnyutahh commented Mar 15, 2022 •

edited

Loading

DominikFickenwirth commented Mar 15, 2022

danny0838 commented Mar 16, 2022

DominikFickenwirth commented Mar 17, 2022 •

edited

Loading

DominikFickenwirth commented Mar 17, 2022 •

edited

Loading

danny0838 commented Mar 17, 2022

DominikFickenwirth commented Mar 22, 2022

DominikFickenwirth commented Mar 23, 2022

danny0838 commented Apr 2, 2022

automatically preserve mtime during merge #36

automatically preserve mtime during merge #36

Comments

DominikFickenwirth commented Mar 15, 2022

My current method

Merging

Rebasing

Solution suggestion

johnnyutahh commented Mar 15, 2022

DominikFickenwirth commented Mar 15, 2022

johnnyutahh commented Mar 15, 2022 • edited Loading

DominikFickenwirth commented Mar 15, 2022

danny0838 commented Mar 16, 2022

DominikFickenwirth commented Mar 17, 2022 • edited Loading

DominikFickenwirth commented Mar 17, 2022 • edited Loading

danny0838 commented Mar 17, 2022

DominikFickenwirth commented Mar 22, 2022

DominikFickenwirth commented Mar 23, 2022

Notes:

danny0838 commented Apr 2, 2022

johnnyutahh commented Mar 15, 2022 •

edited

Loading

DominikFickenwirth commented Mar 17, 2022 •

edited

Loading

DominikFickenwirth commented Mar 17, 2022 •

edited

Loading