chore: remove some TODOs from codebase #376
base: main
Conversation
Codecov Report

```
@@            Coverage Diff             @@
##             main     #376      +/-   ##
==========================================
+ Coverage   85.13%   85.15%   +0.02%
==========================================
  Files          85       85
  Lines       18862    18889      +27
==========================================
+ Hits        16058    16085      +27
  Misses       2804     2804
```

☔ View full report in Codecov by Sentry.
```diff
-let mut acc = F::ZERO;
-for (a, b) in a.iter().zip(b.iter()) {
-    acc += (*a) * (*b);
+let n = a.len();
```
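For context, here is a minimal sketch of the helper as it read before this change (a reconstruction assuming halo2's usual `compute_inner_product` signature, not the exact state of this diff):

```rust
use ff::Field;

/// Inner product of two equal-length slices of field elements,
/// computed with a plain sequential fold.
pub fn compute_inner_product<F: Field>(a: &[F], b: &[F]) -> F {
    assert_eq!(a.len(), b.len());
    let mut acc = F::ZERO;
    for (a, b) in a.iter().zip(b.iter()) {
        acc += (*a) * (*b);
    }
    acc
}
```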
Out of curiosity, I ran a bench with two other options:

using `par_iter`:

```rust
a.into_par_iter().zip_eq(b.into_par_iter())
    .map(|(a, b)| *a * *b)
    .sum()
```
using `fold_chunks_with`:

```rust
let num_threads = multicore::current_num_threads();
let chunk_size = (a.len() + num_threads - 1) / num_threads;
a.into_par_iter().zip_eq(b.into_par_iter())
    .fold_chunks_with(chunk_size, F::ZERO, |acc, (a, b)| acc + *a * b)
    .sum()
```
The graph is in logarithmic scale. It seems that up to ~100k elements chunking is better; beyond that, `map(...).sum()` is more performant (I haven't checked memory usage, btw).
https://gist.github.com/adria0/80fa9f0e90ca9b83e26ed96db213e54b
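For reference, here is the chunked variant consolidated into a standalone function. This is only a sketch under assumptions: it calls `rayon::current_num_threads` directly rather than halo2's `multicore` re-export, needs rayon 1.7+ for `fold_chunks_with`, and the wrapper name `par_inner_product` is hypothetical.

```rust
use ff::Field;
use rayon::prelude::*;

/// Hypothetical chunked parallel inner product: one chunk per thread,
/// each chunk folded sequentially, and the per-chunk partial sums
/// combined at the end.
pub fn par_inner_product<F: Field>(a: &[F], b: &[F]) -> F {
    assert_eq!(a.len(), b.len());
    if a.is_empty() {
        // `fold_chunks_with` panics on a chunk size of zero.
        return F::ZERO;
    }
    let num_threads = rayon::current_num_threads();
    // Ceiling division so every element falls into some chunk.
    let chunk_size = (a.len() + num_threads - 1) / num_threads;
    a.par_iter()
        .zip_eq(b.par_iter())
        .fold_chunks_with(chunk_size, F::ZERO, |acc, (a, b)| acc + *a * *b)
        .sum()
}
```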
I've checked the uses of the function to see which sizes are most common. It turns out it is only used in IPA and in the PI commitment computation on the verifier side.
Since IPA will be removed, we are only left with the latter use case, which operates on tens of elements. I don't think we can parallelize much there, so I would favor code simplicity here.
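As a toy illustration of that scale (a hypothetical example; `Fp` from `halo2curves::pasta` is an arbitrary field choice, and `compute_inner_product` is as sketched above):

```rust
use halo2curves::pasta::Fp;

// A couple dozen elements, as in the PI commitment path: at this size
// the sequential loop is effectively instantaneous, so parallelism
// has nothing to gain.
let a: Vec<Fp> = (1u64..=24).map(Fp::from).collect();
let b: Vec<Fp> = (1u64..=24).map(Fp::from).collect();
let ip = compute_inner_product(&a, &b);
```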
```diff
@@ -494,13 +494,13 @@ fn test_key_compression() -> Result<(), halo2_proofs::plonk::Error> {
     // vk & pk keygen both WITH compression
     test_result(
         || test_mycircuit(true, true).expect("should pass"),
-        "acae50508de5ead584170dd83b139daf40e1026b6debbb78eb05d515173fc2dd",
+        "44130c6388df3d99263be8da4a280b426dc05f1f315d35d3827347761534bf08",
```
@guorong009 I see that this happens in 133cbbb.
Is this due to serialization changes with `ColumnMid`?
Description
Related issues
Changes
- `compute_inner_product` utility
- `ColumnMid` in `QueryBack`
- `halo2_backend/plonk/permutation`
- `EvaluationDomain::extended_to_coeff` function