Problem
hej! 😺
When debugging unreproducible Rust binaries it's common [1][2] to diff the build directory contents with something like:
diffoscope --html diff.html --exclude-directory-metadata=yes target.1 target.2
This can often give you a strong signal as to which crate is causing problems and needs investigation.
While investigating cargo-audit I had a lot of files from .fingerprint/ in my diff. With cargo's code as of today, if you delete a dep-info file and regenerate it, you are not guaranteed to get the same dep-info file you had before. The two files are "logically identical" but not bit-for-bit identical.
Proposed Solution
I submitted a fix in #16691, and then, after fixing cargo-audit's root cause, I noticed that only the .rustc_info.json file was left in my diff, which I also managed to make deterministic.
I now have two patches that would make the target/ folder fully deterministic. This means that if you run cargo build, then rm -r target/, and then run cargo build again, the build directory would end up in a bit-for-bit identical state.
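To make "bit-for-bit identical" concrete, here is a minimal sketch of how two saved copies of the build directory could be compared from Rust. The function names (`snapshot`, `differing_files`) are made up for this illustration and are not part of cargo or diffoscope; the sketch only compares file contents (no metadata) and reads every file fully into memory, so it is meant for small experiments rather than large target/ directories.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect every non-directory entry under `root`, keyed by its
/// path relative to `root`, with the file's full contents as the value.
/// (Hypothetical helper for this sketch.)
fn snapshot(root: &Path) -> io::Result<BTreeMap<PathBuf, Vec<u8>>> {
    let mut files = BTreeMap::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let path = entry?.path();
            if path.is_dir() {
                pending.push(path);
            } else {
                let relative = path
                    .strip_prefix(root)
                    .expect("every visited path is under root")
                    .to_path_buf();
                files.insert(relative, fs::read(&path)?);
            }
        }
    }
    Ok(files)
}

/// Return the relative paths that are missing on one side or whose bytes
/// differ. An empty result means the two trees are bit-for-bit identical
/// as far as file contents go.
fn differing_files(a: &Path, b: &Path) -> io::Result<Vec<PathBuf>> {
    let left = snapshot(a)?;
    let right = snapshot(b)?;
    let mut differing = Vec::new();
    // Files present on the left that are absent or different on the right.
    for (path, bytes) in &left {
        if right.get(path) != Some(bytes) {
            differing.push(path.clone());
        }
    }
    // Files that only exist on the right.
    for path in right.keys() {
        if !left.contains_key(path) {
            differing.push(path.clone());
        }
    }
    differing.sort();
    Ok(differing)
}
```

Calling `differing_files(Path::new("target.1"), Path::new("target.2"))` on two copies of the build directory would list the files that still differ; unlike diffoscope, it does not explain how they differ.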
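In both cases the problem is iteration over a HashMap, whose iteration order is unspecified and, with the default hasher, can differ from one process to the next. Switching to a BTreeMap, which iterates in sorted key order, makes the files written to target/ stable, meaning "same input, same output".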
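The effect can be illustrated with a small standalone sketch. This is not cargo's actual code: the `render` helper, the `key=value` line format, and the sample keys are all assumptions made up for the example. It writes the same entries once through a HashMap and once through a BTreeMap.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for writing a cache file: one `key=value` line per
/// entry, emitted in whatever order the given map iterates.
fn render<'a, I>(entries: I) -> String
where
    I: IntoIterator<Item = (&'a String, &'a String)>,
{
    let mut out = String::new();
    for (key, value) in entries {
        out.push_str(key);
        out.push('=');
        out.push_str(value);
        out.push('\n');
    }
    out
}

/// Made-up sample data; the keys and values carry no meaning for cargo.
fn sample_pairs() -> Vec<(String, String)> {
    vec![
        ("target".to_string(), "x86_64".to_string()),
        ("profile".to_string(), "release".to_string()),
        ("features".to_string(), "default".to_string()),
    ]
}

/// Line order follows HashMap iteration, which is unspecified and may
/// change between runs of the same program.
fn render_unordered(pairs: &[(String, String)]) -> String {
    let map: HashMap<String, String> = pairs.iter().cloned().collect();
    render(&map)
}

/// Line order follows BTreeMap iteration, which is sorted by key, so the
/// output depends only on the entries, not on insertion order or hasher seed.
fn render_ordered(pairs: &[(String, String)]) -> String {
    let map: BTreeMap<String, String> = pairs.iter().cloned().collect();
    render(&map)
}
```

Both functions emit the same set of lines, which is the "logically identical" part; only `render_ordered` also guarantees the same bytes every time.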
BTreeMap lookups are O(log n) while HashMap lookups are O(1) on average; however, BTreeMaps are often faster for small data (e.g. due to cache locality). Since the structs in question typically hold somewhere between 3 and 20 entries rather than 500k+, I suspect this change would be either net-neutral or even net-positive for Rust compile speed (however, I don't have the means to measure this).
Notes
No response
Footnotes
-
This is also how https://github.com/gtk-rs/gtk-rs-core/pull/1840 was tracked down during Reproducible Builds Summit 2025 in an openSUSE/Arch Linux joint effort.