🎉 Introducing gix archive 🎉
#969
Byron started this conversation in Show and tell
Which crate is that?
gix archive is a new sub-command which pretty much does what you think it will: given a treeish and a file path, it extracts the treeish exactly as a checkout would and streams it into the file path in one of multiple formats: tar, tar.gz and zip.
gix archive has some advantages over git archive:
- […] git lfs
- […] (zip only)
- […] (tar)
Performance
Let's dig into some performance comparisons on the linux kernel cloned from git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git.
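One ingredient of these numbers is a simple producer/consumer split, with one thread streaming worktree entries while the main thread writes the container format. The sketch below only illustrates that shape and is not gix's actual code: the Entry type, the archive_entries function and the line-based "container" are all made up for the example, and it uses nothing but the Rust standard library.

```rust
use std::io::{self, Write};
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for one worktree entry: a path and its content.
struct Entry {
    path: String,
    data: Vec<u8>,
}

/// Stream `entries` from a producer thread and write them into `out` on the
/// calling thread, using a made-up container format (a "path length" header
/// line followed by the raw bytes). Returns the number of entries written.
fn archive_entries<W: Write>(entries: Vec<Entry>, out: &mut W) -> io::Result<usize> {
    // A bounded channel keeps memory flat if the writer is slower than the producer.
    let (tx, rx) = mpsc::sync_channel::<Entry>(8);

    // Producer: a real tool would read and decode objects here.
    let producer = thread::spawn(move || {
        for entry in entries {
            if tx.send(entry).is_err() {
                break; // the receiver hung up, stop producing
            }
        }
        // `tx` is dropped when the closure ends, which ends the receiver's iteration.
    });

    // Consumer (calling thread): deals with the container format only.
    let mut count = 0;
    for entry in rx {
        writeln!(out, "{} {}", entry.path, entry.data.len())?;
        out.write_all(&entry.data)?;
        out.write_all(b"\n")?;
        count += 1;
    }
    producer.join().expect("producer thread panicked");
    Ok(count)
}
```

Passing a Vec<u8> as out is enough to try it, since Vec<u8> implements Write. The bounded channel is a design choice of this sketch: it applies back-pressure so the producer cannot run arbitrarily far ahead of the writer.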
The reason for the generally better performance is certainly that there is very basic concurrency that offloads the streaming of worktree data into its own thread. That way, the container format can be handled by the main thread all by itself. Further, gix has a faster odb implementation and uses zlib-ng, which overall yields about 10% better performance when reading objects from the object store.
When compressing, gix also benefits from the flate2 crate and its advanced backends.
It's interesting to see that thanks to gix it's now possible to get the highest compression level (tar.gz max) in the same time that git takes for the default level. In this particular case, one only saves ~3MB with that though. Also, one can get a lightly compressed archive (tar.gz min) faster than an uncompressed tar archive from git, albeit at low compression with a 345MB size.
File Sizes
All the performance in the world wouldn't be useful if the produced files were considerably bigger. Let's take a look.
When looking at tar.gz, the size is almost exactly the same as what git produces, which makes it rather uninteresting. zip, however, takes the lead in being slightly smaller and more than twice as fast to produce. Unfortunately it still has the shortcoming of not being able to reproduce symbolic links. If that were fixed, it would be the most advanced format as it's also able to stream large files.
Memory Consumption
Finally, let's be sure that gix doesn't need unreasonable amounts of memory producing these files.
When looking at max-resident (max-res) size, gix consistently uses less, saving nearly 20% at all times. However, when looking at the peak memory (which probably doesn't include virtual memory), gix uses nearly 50% more. This clearly has to do with the container formats, which seem to keep quite a lot of extra data around when setting them up. That might be an issue in repositories with a lot of files, under the assumption that this scales with file count.
It's worth noting that despite some shortcomings, tar seems to be the best format when memory consumption is a concern: in that case gix will always outperform git both in memory consumption and performance (*in this particular setup).
Shortcomings
However, it's not all roses right now. Probably due to me creating the zip archive incorrectly, symlinks for some reason don't manifest during extraction despite being contained in the archive. This works when using tar though.
Further, I have the feeling that compression settings aren't applied for some reason for tar.gz, and it's unclear how to set the compression for zip archives.
Also, submodules aren't yet added to the archive, which is the same shortcoming as for git itself, but that's bound to happen as submodule support is currently being added to gitoxide.
Conclusion
Implementing a minimal viable product of gix archive merely as an experiment took only 3 days and showed how powerful gix has become, making it possible to write tools that don't only rival the standard implementation, but can even surpass it in many aspects.
I will work hard to reach feature parity with git2 for starters and then do my best to make it easier for users to choose gix over git2 when starting new projects.
Q & A
Q: Can I use gix archive instead of git archive?
I think it's worth giving it a shot if you need the extra performance or the extra capabilities. Be aware of the current shortcomings though, and maybe even contribute a fix.
Q: Why does gix archive exist?
As gix is a development tool to be able to run the gix crate in the real world, it made sense to be able to test the worktree related code in a context that doesn't involve writing files to disk. Implementing this means we need to be flexible enough to be able to put all related parts together in different ways, and gix archive is a very nice application of said 'worktree machinery'.
As it turns out, many folks wanted it to support submodules as well, and with this work ongoing it seems similarly useful to validate the gix API against such a need - supporting submodules should be reasonably easy with anything gix comes up with.
Q: Can I use this in my own crate?
Yes, there is gix-archive that implements archiving, and gix-worktree-stream that provides a stream of entries that would make up the worktree on disk, bit for bit.
Q: Could the same be implemented with git2?
Definitely, and I'd be keen to see such an implementation in comparison to git and gix!
Data
Versions used initially.
Then after updates to how gz compression works, it's this one:
archive -f tar
archive -f tar-gz
- with libflate
- with flate2
- with flate2 level 9
- with flate2 level 1
archive -f zip
File Sizes
- tar.gz with libflate
- tar.gz with flate2
- tar.gz with flate2 at compression level 9
- tar.gz with flate2 at compression level 1
Memory
- with libflate
- with flate2