Use loole to improve channel performance #228
Conversation
Very interesting. I'm curious how it's possible to be faster than flume. Is there something I can read about how loole achieves this performance? :)
Thanks, not yet. Currently, the only source of information is the source code itself. To summarize, potential reasons for the performance difference include the absence of cloning, minimal abstraction, minimal moves, and possibly a more efficient algorithm (although I haven't reviewed Flume's source code). In the future, it's conceivable that …
Cool, thanks :) I'll hopefully have a closer look at this tomorrow. Thanks so far :)
You are welcome.
Hello, thank you for reviewing my pull request. I appreciate your feedback. However, I have discovered an issue with the benchmark code that causes inconsistent and unreliable results: even with flume itself, repeated runs come out randomly faster or slower. The same happens with loole. Thanks again.
If this is correct, I think this PR is no longer valid. |
Hmm, interesting. Sorry, I didn't yet have the time to look into this PR; please accept my apologies. I'd like to run both benchmarks myself, and I'll report what happens on my machine.
-- loole --
test read_f16_as_f16_uncompressed_1thread ... bench: 8,388,010 ns/iter (+/- 3,397,072)
test read_f16_as_f16_zip_1thread ... bench: 17,489,190 ns/iter (+/- 3,253,988)
test read_f16_as_f16_zip_nthreads ... bench: 9,099,960 ns/iter (+/- 3,757,732)
test read_f16_as_f32_uncompressed_1thread ... bench: 8,234,400 ns/iter (+/- 3,470,654)
test read_f16_as_f32_zip_1thread ... bench: 17,849,420 ns/iter (+/- 1,432,790)
test read_f16_as_f32_zip_nthreads ... bench: 8,874,635 ns/iter (+/- 3,389,003)
test read_f16_as_u32_uncompressed_1thread ... bench: 8,163,710 ns/iter (+/- 2,859,007)
test read_f32_as_f16_uncompressed_1thread ... bench: 29,717,520 ns/iter (+/- 2,255,945)
test read_f32_as_f16_zips_1thread ... bench: 87,762,410 ns/iter (+/- 2,405,089)
test read_f32_as_f16_zips_nthreads ... bench: 32,962,650 ns/iter (+/- 2,531,739)
test read_f32_as_f32_uncompressed_1thread ... bench: 29,741,710 ns/iter (+/- 2,462,639)
test read_f32_as_f32_zips_1thread ... bench: 88,301,830 ns/iter (+/- 1,456,787)
test read_f32_as_f32_zips_nthreads ... bench: 33,265,970 ns/iter (+/- 1,701,066)
test read_f32_as_u32_uncompressed_1thread ... bench: 29,956,220 ns/iter (+/- 2,818,988)
-- flume --
test read_f16_as_f16_uncompressed_1thread ... bench: 8,104,510 ns/iter (+/- 3,180,752)
test read_f16_as_f16_zip_1thread ... bench: 17,785,760 ns/iter (+/- 2,046,426)
test read_f16_as_f16_zip_nthreads ... bench: 9,142,180 ns/iter (+/- 3,819,020)
test read_f16_as_f32_uncompressed_1thread ... bench: 8,028,740 ns/iter (+/- 2,876,322)
test read_f16_as_f32_zip_1thread ... bench: 17,656,880 ns/iter (+/- 1,436,841)
test read_f16_as_f32_zip_nthreads ... bench: 8,718,960 ns/iter (+/- 3,392,016)
test read_f16_as_u32_uncompressed_1thread ... bench: 8,115,830 ns/iter (+/- 2,718,979)
test read_f32_as_f16_uncompressed_1thread ... bench: 29,787,260 ns/iter (+/- 2,147,523)
test read_f32_as_f16_zips_1thread ... bench: 87,121,620 ns/iter (+/- 3,145,823)
test read_f32_as_f16_zips_nthreads ... bench: 33,109,410 ns/iter (+/- 2,087,108)
test read_f32_as_f32_uncompressed_1thread ... bench: 30,033,080 ns/iter (+/- 40,816,735)
test read_f32_as_f32_zips_1thread ... bench: 88,689,050 ns/iter (+/- 6,816,531)
test read_f32_as_f32_zips_nthreads ... bench: 33,697,520 ns/iter (+/- 9,957,318)
test read_f32_as_u32_uncompressed_1thread ... bench: 30,318,000 ns/iter (+/- 4,575,068)

These are the numbers on my machine. Your insights seem plausible; there is no measurable difference.
That makes sense, thanks for clarifying. |
Hello, this pull request replaces Flume with Loole, my recent safe MPMC channel implementation, which has successfully passed all Flume tests for the APIs it implements. Consequently, both channel-related benchmarks in read.rs have demonstrated the following improvements:
Loole's GitHub: https://github.com/mahdi-shojaee/loole
Thanks