Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement chunk iterators that drop the GIL #106

Closed
wants to merge 4 commits into from

Conversation

GodTamIt
Copy link
Contributor

@GodTamIt GodTamIt commented Dec 14, 2023

This allows callers to get chunks at a time from the iterator and in such scenarios, it makes sense to drop the GIL.

Dropping the GIL is not only useful for scenarios when there are multiple iterators but when code may be calling other GIL-dropping code.

Benchmarking Rdict Iterator...
Iterator performance: 4194304 items in 1.7615967919118702 seconds (2380967.0971572897 it/s)
Iterator performance multi-thread: 16777216 items in 6.953388374997303 seconds (2412811.5812323666 it/s)

Benchmarking Rdict Batch Iterator...
Batched iterator performance: 16777216 items in 12.566605832893401 seconds (1335063.4390143137 it/s)
Batched iterator performance multi-thread: 16777216 items in 9.088478291872889 seconds (1845987.3546710834 it/s)

@Congyuwang
Copy link
Collaborator

I think it's better to provide a default limited chunk size rather than using None as default--doesn't seem like a default behaviour.

@Congyuwang
Copy link
Collaborator

Iter_chunk still seems significantly slower, although the chunk size is pretty large (25000).

@GodTamIt
Copy link
Contributor Author

Iter_chunk still seems significantly slower, although the chunk size is pretty large (25000).

That's because it's doing 4x more work for less than 4x time

backwards: bool,
py: Python,
) -> PyResult<Vec<(PyObject, PyObject)>> {
let raw_items = py.allow_threads(|| -> PyResult<Vec<(Box<[u8]>, Box<[u8]>)>> {

Check warning

Code scanning / clippy

very complex type used. Consider factoring parts into type definitions Warning

very complex type used. Consider factoring parts into type definitions
@GodTamIt
Copy link
Contributor Author

I think it's better to provide a default limited chunk size rather than using None as default--doesn't seem like a default behaviour.

This is now done.

Iter_chunk still seems significantly slower, although the chunk size is pretty large (25000).

@Congyuwang iter chunk is faster. It's probably more helpful to look at the it/s metric than the total seconds because the number of items used to be different between single-thread and multi-thread. I've updated the benchmark to not do this.

@Congyuwang
Copy link
Collaborator

I'm comparing iter vs. iter_chunk. Not multithreaded vs. single threaded. Seems that based on the previous benchmark, multithreaded iter chunk is slower than multithreaded iter where GIL is not released.

@Congyuwang
Copy link
Collaborator

Looks like we have some kind of deadlock.

@Congyuwang Congyuwang closed this Feb 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants