fix: initialize bytes up to len of AlignedBuf. #301
Looking more into it. |
I think the way around this would be to zero the data from [...]. OTOH, while reading uninitialized data is UB, we don't use the uninitialized data beyond [...]. Not initializing the extra data is an optimisation (it removes a few memory writes). |
This doesn't matter. As this talk by Ralf Jung explains, once a program contains UB anything can happen, because the compiler is allowed to assume UB never occurs. We must just follow the rules: https://www.youtube.com/watch?v=svR0p6fSUYY Valgrind also confirmed the UB in this case.
Yes, that is what I did now by
In any case, we must initialize memory as now we have UB and there are strange bugs. :') |
@Licenser, I went for 2. |
Happy valgrind run now:
|
Zeroing the entire buffer is something we really want to avoid; it's not needed and will really slow down the parsing. Let me explain what happens: when reading the JSON data we read a full SIMD register at a time (usually 256 bits). For that, two things are important:

The way we deal with that is by allocating a new region of memory that is guaranteed to be aligned and, while not being an exact multiple, has at least one extra register's worth of space. The algorithm, coming from C-land, terminates on a zero byte, so when allocating the new buffer we terminate it with that zero byte.

So for example, if the JSON is 40 bytes long, we'd allocate a buffer of, say, 64 bytes (32 extra bytes to get over the 32 of the register), then write a zero at byte 41 (behind the end of [...]).

The option of clearing all memory behind 40 (in this example) with 0's seems like a good approach. The problem with clearing everything is that we'd effectively cause ~2x as many memory writes as needed: once for zeroing the entire data, once for then copying over the original.

Sorry for the long ramble before I saw your comments :D, I'll leave it in.

Before we merge this, let's see if there is a better option than writing a number of 0 bytes: we know the data is aligned and we know we have 2 SIMD registers' width to fill with 0's, so perhaps we can reduce the writes to larger chunks? (I've not thrown this in godbolt to see how Rust optimizes the writes.) |
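To make the 40-byte example above concrete, here is a minimal sketch in safe Rust. The helper name `padded_copy` and the rounding rule (next multiple of the register width strictly greater than the input length) are assumptions for illustration, not the crate's actual code, and the sketch ignores alignment, which a plain `Vec<u8>` does not guarantee. It copies the input once and zeroes only the tail, so each byte is written a single time.

```rust
/// Hypothetical helper (not the crate's API): copy `input` into a new buffer
/// whose length is the next multiple of `register_bytes` strictly greater
/// than `input.len()`, with every byte after the input set to zero.
fn padded_copy(input: &[u8], register_bytes: usize) -> Vec<u8> {
    assert!(register_bytes > 0, "register width must be non-zero");

    // Assumed rounding rule: 40 bytes with a 32-byte register gives 64.
    let padded_len = (input.len() / register_bytes + 1) * register_bytes;

    // Reserve the full size up front so no reallocation happens below.
    let mut buf = Vec::with_capacity(padded_len);

    // First write: the original data.
    buf.extend_from_slice(input);

    // Second write: zeros for the tail only, which also provides the
    // terminating zero byte directly behind the input.
    buf.resize(padded_len, 0);
    buf
}
```

With a 40-byte input and a 32-byte register width, the result is 64 bytes long and bytes 40 through 63 are zero.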
Right, I understand your issue with [...]. The current PR only writes the remaining bytes. If the [...] We could also say that we write the minimum of [...]
Sorry, what do you mean? |
The failing tests seem related to SIMD (unrelated to this PR). |
I checked. It defaults to memset instead of a loop, which should optimize to bigger writes (https://rust.godbolt.org/z/edWzh33ss), so 👍. Two small optimisations: we don't need the [...] edit: with removing the extra write and if [...] I'm good to merge :) thanks! |
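For readers who want to repeat the comparison, the two ways of zeroing a tail discussed here can be sketched as below. The function names are hypothetical and this is not the code behind the godbolt link; whether either version is lowered to a single `memset` depends on the compiler version and optimization level, so it is worth checking the generated assembly rather than assuming it.

```rust
/// Hypothetical: zero every byte from index `from` to the end, one at a time.
/// Panics if `from > buf.len()`.
fn zero_tail_loop(buf: &mut [u8], from: usize) {
    for byte in &mut buf[from..] {
        *byte = 0;
    }
}

/// Hypothetical: the same effect expressed with `slice::fill`.
/// Panics if `from > buf.len()`.
fn zero_tail_fill(buf: &mut [u8], from: usize) {
    buf[from..].fill(0);
}
```

Both functions leave the first `from` bytes untouched and produce identical buffers; only the shape of the generated code may differ.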
Great, shall remove the branch tomorrow. |
yay! I'll merge and release it then :) |
Awesome, thank you :) nice catch, nice fix 👍
released as |
It is UB to create a Vec with uninitialized bytes. This fixes that by ensuring we only set_len to the length we have written.
Adding a few dbg! statements showing the len and capacity whilst reading this file showed they were off. Reading that file in valgrind also showed that we read the uninitialized bytes, confirming UB.
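The rule described above can be sketched as follows. The helper `copy_with_spare_capacity` is hypothetical and is not the PR's actual diff; it only illustrates that `set_len` should cover exactly the bytes that were written, while any extra capacity stays outside the vector's length.

```rust
/// Hypothetical helper (not the PR's code): allocate room for `input` plus
/// `extra` spare bytes, but only expose the bytes that were actually written.
fn copy_with_spare_capacity(input: &[u8], extra: usize) -> Vec<u8> {
    let mut buf: Vec<u8> = Vec::with_capacity(input.len() + extra);

    // SAFETY: the allocation holds at least `input.len()` bytes, the source
    // and destination cannot overlap because `buf` was just allocated, and
    // `set_len` is called with exactly the number of bytes initialized by
    // the copy, so no uninitialized byte becomes part of the Vec's contents.
    unsafe {
        std::ptr::copy_nonoverlapping(input.as_ptr(), buf.as_mut_ptr(), input.len());
        buf.set_len(input.len());
    }
    buf
}
```

Calling `set_len(input.len() + extra)` here instead would put uninitialized bytes inside the vector's length, which is the situation the description reports.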
Valgrind output:
Background
I got to this because I have very strange bugs upstream in polars which I cannot replicate and only occur when we compile with fat linking and all optimizations. This made me suspect UB.
pola-rs/polars#9791
pola-rs/polars#10034