Add Buffer::from_bitwise_unary and Buffer::from_bitwise_binary methods, deprecate old methods
#8854
Conversation
🤖: Benchmark completed
The benchmarks show a slowdown for some operations, for reasons that are not obvious.
However, given the short duration of the benchmark, I suspect this may be cache effects or similar noise. I have an idea for making the benchmarks less noisy (basically, run the operation in a 100x loop).
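A minimal sketch of that idea in plain std Rust (the real benchmarks use a benchmarking harness; `bitwise_and` here is a hypothetical stand-in for the kernel under test):

```rust
use std::time::Instant;

// Hypothetical stand-in for the bitwise kernel being benchmarked.
fn bitwise_and(left: &[u64], right: &[u64]) -> Vec<u64> {
    left.iter().zip(right).map(|(l, r)| l & r).collect()
}

fn main() {
    let left = vec![0xAAAA_AAAA_AAAA_AAAA_u64; 1024];
    let right = vec![0x5555_5555_5555_5555_u64; 1024];

    // Run the kernel 100 times per measurement so that per-run noise
    // (allocator behaviour, cache warm-up) is amortized over the loop.
    const ITERS: u32 = 100;
    let start = Instant::now();
    let mut out = Vec::new();
    for _ in 0..ITERS {
        out = bitwise_and(&left, &right);
    }
    println!("per-op time: {:?}", start.elapsed() / ITERS);
    // 0xAAAA... & 0x5555... share no set bits, so every result is zero
    assert!(out.iter().all(|&v| v == 0));
}
```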
```rust
let rem = op(left_chunks.remainder_bits(), right_chunks.remainder_bits());
// we are counting starting from the least significant bit, so to_le_bytes should be correct
let rem = &rem.to_le_bytes()[0..remainder_bytes];
buffer.extend_from_slice(rem);
```
This might do an extra allocation? Other places avoid this by preallocating the final u64 needed for the remainder as well (see collect_bool).
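A minimal sketch of that preallocation pattern, with a plain `Vec<u8>` standing in for `MutableBuffer` (function name hypothetical): capacity for all full chunks plus a padded remainder is reserved up front, so the trailing `extend_from_slice` never triggers a second allocation.

```rust
// Sketch: reserve room for all full u64 chunks plus one padded remainder
// word, so no later extend call has to reallocate.
fn build_with_remainder(
    chunks: impl ExactSizeIterator<Item = u64>,
    remainder: u64,
    remainder_bytes: usize,
) -> Vec<u8> {
    let mut buffer = Vec::with_capacity(chunks.len() * 8 + 8);
    for chunk in chunks {
        buffer.extend_from_slice(&chunk.to_le_bytes());
    }
    // Only the meaningful low bytes of the remainder are kept.
    buffer.extend_from_slice(&remainder.to_le_bytes()[..remainder_bytes]);
    buffer
}
```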
That is a good call -- I will make the change.
However, this is the same code as how the current bitwise_binary_op does it, so I would expect no performance difference 🤔
I agree; however, allocations during benchmarking seem to make the results very noisy.
🤔 I tried this:

```rust
pub fn from_bitwise_binary_op<F>(
    left: impl AsRef<[u8]>,
    left_offset_in_bits: usize,
    right: impl AsRef<[u8]>,
    right_offset_in_bits: usize,
    len_in_bits: usize,
    mut op: F,
) -> Buffer
where
    F: FnMut(u64, u64) -> u64,
{
    let left_chunks = BitChunks::new(left.as_ref(), left_offset_in_bits, len_in_bits);
    let right_chunks = BitChunks::new(right.as_ref(), right_offset_in_bits, len_in_bits);
    let remainder_bytes = ceil(left_chunks.remainder_len(), 8);
    // if it evenly divides into u64 chunks
    let buffer = if remainder_bytes == 0 {
        let chunks = left_chunks
            .iter()
            .zip(right_chunks.iter())
            .map(|(left, right)| op(left, right));
        // Soundness: `BitChunks` is a trusted-length iterator which
        // correctly reports its upper bound
        unsafe { MutableBuffer::from_trusted_len_iter(chunks) }
    } else {
        // Compute the last u64 here so that we can reserve exact capacity
        let rem = op(left_chunks.remainder_bits(), right_chunks.remainder_bits());
        let chunks = left_chunks
            .iter()
            .zip(right_chunks.iter())
            .map(|(left, right)| op(left, right))
            .chain(std::iter::once(rem));
        // Soundness: `BitChunks` is a trusted-length iterator which
        // correctly reports its upper bound, and so is the `chain` iterator
        let mut buffer = unsafe { MutableBuffer::from_trusted_len_iter(chunks) };
        // Adjust the length down if the last u64 is not fully used
        let extra_bytes = 8 - remainder_bytes;
        buffer.truncate(buffer.len() - extra_bytes);
        buffer
    };
    buffer.into()
}
```

But it seems to be slower.
I also tried making a version of MutableBuffer::from_trusted_len_iter that also adds additional capacity, and it didn't seem to help either (perhaps because the benchmarks happen to avoid reallocation 🤔):
```rust
/// Like [`from_trusted_len_iter`] but can add additional capacity at the end
/// in case the caller wants to add more data after the initial iterator.
#[inline]
pub unsafe fn from_trusted_len_iter_with_additional_capacity<T: ArrowNativeType, I: Iterator<Item = T>>(
    iterator: I,
    additional_capacity: usize,
) -> Self {
    let item_size = std::mem::size_of::<T>();
    let (_, upper) = iterator.size_hint();
    let upper = upper.expect("from_trusted_len_iter requires an upper limit");
    let len = upper * item_size;
    let mut buffer = MutableBuffer::new(len + additional_capacity);
    let mut dst = buffer.data.as_ptr();
    for item in iterator {
        // note how there is no reserve here (compared with `extend_from_iter`)
        let src = item.to_byte_slice().as_ptr();
        unsafe { std::ptr::copy_nonoverlapping(src, dst, item_size) };
        dst = unsafe { dst.add(item_size) };
    }
    assert_eq!(
        unsafe { dst.offset_from(buffer.data.as_ptr()) } as usize,
        len,
        "Trusted iterator length was not accurately reported"
    );
    buffer.len = len;
    buffer
}
```
There is also an extend-from-trusted-len-iter method in MutableBuffer? Another option is to use Vec::extend here as well.
```rust
    F: FnMut(u64) -> u64,
{
    // reserve capacity and set length so we can get a typed view of u64 chunks
    let mut result =
```
As we overwrite the results, we shouldn't need to initialize/zero out the array.
Might also be because of the allocation? Looks like
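A small illustration of the point about zero-initialization (plain `Vec<u64>` standing in for the buffer; function names hypothetical): the first version zeroes memory that is immediately overwritten, while the second writes each slot exactly once.

```rust
// Zero-initialize, then overwrite every slot (touches the memory twice).
fn map_zeroed(input: &[u64], op: impl Fn(u64) -> u64) -> Vec<u64> {
    let mut out = vec![0u64; input.len()];
    for (slot, &v) in out.iter_mut().zip(input) {
        *slot = op(v);
    }
    out
}

// collect() writes into reserved capacity without pre-zeroing it.
fn map_collected(input: &[u64], op: impl Fn(u64) -> u64) -> Vec<u64> {
    input.iter().map(|&v| op(v)).collect()
}
```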
Which issue does this PR close?
Rationale for this change
bitwise_bin_op_helper and bitwise_unary_op_helper are somewhat hard to find and use,
as explained on WIP: special case bitwise ops when buffers are u64 aligned #8807.
I want to optimize bitwise operations even more heavily (see #8807), so I want the implementations centralized so I can focus the effort there.
Also, I think these APIs cover the use case explained by @jorstmann on #8561:
by creating a method on Buffer directly, it is easier to find, and it is clearer that
a new Buffer is being created.
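For illustration, the semantics of the binary variant can be sketched in plain std Rust (a hypothetical simplification: byte-aligned inputs, no bit offsets, returning `Vec<u8>` instead of `Buffer`):

```rust
/// Illustrative stand-in for `Buffer::from_bitwise_binary`: combine two
/// bit buffers chunk-by-chunk with a `u64 -> u64 -> u64` op.
fn from_bitwise_binary_sketch(left: &[u8], right: &[u8], op: impl Fn(u64, u64) -> u64) -> Vec<u8> {
    assert_eq!(left.len(), right.len());
    let mut out = Vec::with_capacity(left.len());
    let mut l_chunks = left.chunks_exact(8);
    let mut r_chunks = right.chunks_exact(8);
    // Full u64 chunks: apply `op` eight bytes at a time.
    for (l, r) in (&mut l_chunks).zip(&mut r_chunks) {
        let l = u64::from_le_bytes(l.try_into().unwrap());
        let r = u64::from_le_bytes(r.try_into().unwrap());
        out.extend_from_slice(&op(l, r).to_le_bytes());
    }
    // Remainder: zero-pad into a u64, apply `op`, keep only the low bytes.
    let (lr, rr) = (l_chunks.remainder(), r_chunks.remainder());
    if !lr.is_empty() {
        let (mut lb, mut rb) = ([0u8; 8], [0u8; 8]);
        lb[..lr.len()].copy_from_slice(lr);
        rb[..rr.len()].copy_from_slice(rr);
        let v = op(u64::from_le_bytes(lb), u64::from_le_bytes(rb));
        out.extend_from_slice(&v.to_le_bytes()[..lr.len()]);
    }
    out
}
```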
What changes are included in this PR?
Changes:
- New Buffer::from_bitwise_unary and Buffer::from_bitwise_binary methods that do the same thing as bitwise_unary_op_helper and bitwise_bin_op_helper but are easier to find and use
- Deprecate bitwise_unary_op_helper and bitwise_bin_op_helper in favor of the new Buffer methods
operate on bits, not bytes and shouldn't do any cross byte operations)
Are these changes tested?
Yes, new doc tests
Are there any user-facing changes?
New APIs, some deprecated