Replies: 2 comments 2 replies
-
Also more of an idle curiosity thing: why do |
Beta Was this translation helpful? Give feedback.
-
Indeed, tree-diffing immediately comes to mind when proper sorting according to
That would be interesting to learn how that's possible. Any API allowing the creation of trees should assure these are sorted properly, at least that's my assumption. If trees are created from an index file, these are properly sorted as well. All that is most certainly done to avoid having to check for sorting before using trees, for diffing usually, which is vital to performance.
That's how
Whatever the implementation, it can't be slower than what's there, and a BTreeSet sounds like a lot of additional allocations. You will see that in the interest of speed, diff APIs only iterate entries without actually building a Vec, and the same holds true for looking up the hash at a certain path for simple 'is changed' tests, as that is actually faster than paying for a vec without reuse, despite the lookup then being powered by the faster binary search (I tried that). Please note that I am probably glossing over or not picking up a few points or details at all, if so, please feel free to highlight them and I will try again. |
Beta Was this translation helpful? Give feedback.
-
I stumbled upon a possible issue with trees in git-objects (and possibly downstream subcrates), but I'm not quite sure what consideration it should get: currently,
Tree::entries
has a documentary note indicating that it must be sorted, andWriteTo for Tree
has debug assertions ensuring this is to be so.I wondered what'd happen if I replaced the
Vec
by aBTreeSet
(which are naturally sorted so the constraint would not be an issue anymore), but aside fromnom::multi::many0
returningVec
(so this would require a heavy conversion during parsing) it led me to find a few potential issues / incompatibilities with Git:git fsck
but but at bestgit-mktree(1)
will protect ordering issues (it automatically reorders its input, but does not deduplicate)/
, this means regular fsck-valid git objects will break gitoxide's assumptions, per git's own comments:(1) seems like it could be a minor issue at any time, however in trying to understand these cases it seemed like not all hosting sites will accept them (however it also seems like people routinely unwittingly create such objects).
However (2) seems like it could be a pretty major issue depending on the reason for the note, and whether it hinges on the ordering of the entries in the object, or only on the entries being sorted according to
Entry::cmp
(in the latter case fixingOrd
should do the trick). Is the issue here the tree diffing? Or is it something else? Or is it both tree diffing and other concerns?A secondary concern is API-wise: given the original note and the failure of my
BTreeSet
scheme I figured providing a newtype for tree entries which would always uphold the invariants, but of course this is an issue if they're not actually invariant. Though I guess "undefined behaviour" is fine in that case. Related:binary_search
definitely misbehaves when given mis-ordered sequences butselect_nth_unstable
is only in 1.49, what rustc version is gitoxide bound to? Thoughbinary_search
would have issues deduplicating blob/tree entries, andselect_nth_unstable
would be of no help at all there (aside from looking around the newly moved entry if one of its siblings has a colliding name I guess, but again that'd be defeated by git's somewhat strange ordering predicate).Beta Was this translation helpful? Give feedback.
All reactions