-
Notifications
You must be signed in to change notification settings - Fork 9
Add binarySearch() function to Array, VarArray, and List
#375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
No description provided. |
Benchmark Resultsbench/ArrayBuilding.bench.mo
|
| 1000 | 100000 | 1000000 | |
|---|---|---|---|
| List | 548_233 |
48_324_535 |
478_161_875 |
| Buffer | 342_005 |
33_903_435 |
339_003_650 |
| pure/List | 302_135 |
30_003_590 |
300_055_972 |
| VarArray ?T | 180_526 |
17_802_956 |
178_003_171 |
| VarArray T | 160_813 |
15_803_243 |
158_003_458 |
| Array (baseline) | 42_695 |
4_003_125 |
40_003_340 |
Heap
| 1000 | 100000 | 1000000 | |
|---|---|---|---|
| List | 272 B |
272 B |
272 B |
| Buffer | 272 B |
272 B |
272 B |
| pure/List | 272 B |
272 B |
272 B |
| VarArray ?T | 272 B |
272 B |
272 B |
| VarArray T | 272 B |
272 B |
272 B |
| Array (baseline) | 272 B |
272 B |
272 B |
Garbage Collection
| 1000 | 100000 | 1000000 | |
|---|---|---|---|
| List | 10.05 KiB |
797.56 KiB |
7.67 MiB |
| Buffer | 8.71 KiB |
782.15 KiB |
7.63 MiB |
| pure/List | 19.95 KiB |
1.91 MiB |
19.07 MiB |
| VarArray ?T | 8.24 KiB |
781.68 KiB |
7.63 MiB |
| VarArray T | 8.23 KiB |
781.67 KiB |
7.63 MiB |
| Array (baseline) | 4.3 KiB |
391.02 KiB |
3.82 MiB |
bench/FromIters.bench.mo $({\color{gray}0\%})$
Benchmarking the fromIter functions
Columns describe the number of elements in the input iter.
Instructions:
Heap:
Stable Memory:
Garbage Collection:
Instructions
| 100 | 10_000 | 100_000 | |
|---|---|---|---|
| Array.fromIter | 48_764 |
4_712_025 |
47_103_135 |
| List.fromIter | 31_698 |
3_061_541 |
30_603_553 |
| List.fromIter . Iter.reverse | 50_297 |
4_832_563 |
48_305_477 |
Heap
| 100 | 10_000 | 100_000 | |
|---|---|---|---|
| Array.fromIter | 272 B |
272 B |
272 B |
| List.fromIter | 272 B |
272 B |
272 B |
| List.fromIter . Iter.reverse | 272 B |
272 B |
272 B |
Garbage Collection
| 100 | 10_000 | 100_000 | |
|---|---|---|---|
| Array.fromIter | 2.76 KiB |
234.79 KiB |
2.29 MiB |
| List.fromIter | 3.51 KiB |
312.88 KiB |
3.05 MiB |
| List.fromIter . Iter.reverse | 5.11 KiB |
469.17 KiB |
4.58 MiB |
bench/ListBufferNewArray.bench.mo $({\color{gray}0\%})$
List vs. Buffer for creating known-size arrays
Performance comparison between List and Buffer for creating a new array.
Instructions:
Heap:
Stable Memory:
Garbage Collection:
Instructions
| 0 (baseline) | 1 | 5 | 10 | 100 (for loop) | |
|---|---|---|---|---|---|
| List | 1_547 |
2_916 |
9_046 |
13_948 |
74_564 |
| pure/List | 1_247 |
1_355 |
2_439 |
3_801 |
31_868 |
| Buffer | 2_119 |
2_271 |
3_518 |
5_085 |
36_640 |
Heap
| 0 (baseline) | 1 | 5 | 10 | 100 (for loop) | |
|---|---|---|---|---|---|
| List | 272 B |
272 B |
272 B |
272 B |
272 B |
| pure/List | 272 B |
272 B |
272 B |
272 B |
272 B |
| Buffer | 272 B |
272 B |
272 B |
272 B |
272 B |
Garbage Collection
| 0 (baseline) | 1 | 5 | 10 | 100 (for loop) | |
|---|---|---|---|---|---|
| List | 576 B |
616 B |
776 B |
884 B |
1.93 KiB |
| pure/List | 360 B |
380 B |
460 B |
560 B |
2.3 KiB |
| Buffer | 856 B |
864 B |
896 B |
936 B |
1.62 KiB |
bench/PureListStackSafety.bench.mo $({\color{gray}0\%})$
List Stack safety
Check stack-safety of the following pure/List-related functions.
Instructions:
Heap:
Stable Memory:
Garbage Collection:
Instructions
| pure/List.split | 24_602_524 |
| pure/List.all | 7_901_014 |
| pure/List.any | 8_001_390 |
| pure/List.map | 23_103_767 |
| pure/List.filter | 21_104_188 |
| pure/List.filterMap | 27_404_742 |
| pure/List.partition | 21_304_994 |
| pure/List.join | 33_105_326 |
| pure/List.flatten | 24_805_667 |
| pure/List.take | 24_605_664 |
| pure/List.drop | 9_904_119 |
| pure/List.foldRight | 19_105_768 |
| pure/List.merge | 31_808_584 |
| pure/List.chunks | 51_510_344 |
| pure/Queue | 142_662_505 |
Heap
| pure/List.split | 272 B |
| pure/List.all | 272 B |
| pure/List.any | 272 B |
| pure/List.map | 272 B |
| pure/List.filter | 272 B |
| pure/List.filterMap | 272 B |
| pure/List.partition | 272 B |
| pure/List.join | 272 B |
| pure/List.flatten | 272 B |
| pure/List.take | 272 B |
| pure/List.drop | 272 B |
| pure/List.foldRight | 272 B |
| pure/List.merge | 272 B |
| pure/List.chunks | 272 B |
| pure/Queue | 272 B |
Garbage Collection
| pure/List.split | 3.05 MiB |
| pure/List.all | 328 B |
| pure/List.any | 328 B |
| pure/List.map | 3.05 MiB |
| pure/List.filter | 3.05 MiB |
| pure/List.filterMap | 3.05 MiB |
| pure/List.partition | 3.05 MiB |
| pure/List.join | 3.05 MiB |
| pure/List.flatten | 3.05 MiB |
| pure/List.take | 3.05 MiB |
| pure/List.drop | 328 B |
| pure/List.foldRight | 1.53 MiB |
| pure/List.merge | 4.58 MiB |
| pure/List.chunks | 7.63 MiB |
| pure/Queue | 18.31 MiB |
bench/Queues.bench.mo $({\color{gray}0\%})$
Different queue implementations
Compare the performance of the following queue implementations:
-
pure/Queue: The default immutable double-ended queue implementation.- Pros: Good amortized performance, meaning that the average cost of operations is low
O(1). - Cons: In worst case, an operation can take
O(size)time rebuilding the queue as demonstrated in thePop front 2 elementsscenario.
- Pros: Good amortized performance, meaning that the average cost of operations is low
-
pure/RealTimeQueue- Pros: Every operation is guaranteed to take at most
O(1)time and space. - Cons: Poor amortized performance: Instruction cost is on average 3x for pop and 8x for push compared to
pure/Queue.
- Pros: Every operation is guaranteed to take at most
- mutable
Queue- Pros: Also
O(1)guarantees with a lower constant factor thanpure/RealTimeQueue. Amortized performance is comparable topure/Queue. - Cons: It is mutable and cannot be used in
sharedtypes (not shareable).
- Pros: Also
Instructions:
Heap:
Stable Memory:
Garbage Collection:
Instructions
| pure/Queue | pure/RealTimeQueue | mutable Queue | |
|---|---|---|---|
| Initialize with 2 elements | 3_092 |
2_304 |
3_040 |
| Push 500 elements | 90_713 |
744_219 |
219_284 |
| Pop front 2 elements | 86_966 |
4_446 |
3_847 |
| Pop 150 front&back | 92_095 |
304_908 |
124_581 |
Heap
| pure/Queue | pure/RealTimeQueue | mutable Queue | |
|---|---|---|---|
| Initialize with 2 elements | 324 B |
300 B |
352 B |
| Push 500 elements | 8.08 KiB |
8.17 KiB |
19.8 KiB |
| Pop front 2 elements | 240 B |
240 B |
192 B |
| Pop 150 front&back | -4.42 KiB |
-492 B |
-11.45 KiB |
Garbage Collection
| pure/Queue | pure/RealTimeQueue | mutable Queue | |
|---|---|---|---|
| Initialize with 2 elements | 508 B |
444 B |
456 B |
| Push 500 elements | 10.1 KiB |
137.84 KiB |
344 B |
| Pop front 2 elements | 12.19 KiB |
528 B |
424 B |
| Pop 150 front&back | 15.61 KiB |
49.66 KiB |
12.1 KiB |
Note: Renamed benchmarks cannot be compared. Refer to the current baseline for manual comparison.
binarySearch() function to Array and VarArraybinarySearch() function to Array, VarArray, and List
crusso
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
|
I'd maybe add a note about what happens if there are multiple equal elements. I'm assuming we don't make any guarantees which index of those we return? Here's the equivalent Rust documentation for reference: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.binary_search Go's slices.BinarySearch provides similar functionality: https://pkg.go.dev/golang.org/x/exp/slices#BinarySearch |
ggreif
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with the compressed code.
|
Hi all, thanks for implementing this so quickly! I have a couple of thoughts in terms of the API design:
|
Following discussions with @rvanasa and @alexandru-uta, here is a PR following the design in point 2: #381. I also opened an issue for this, hopefully not a duplicate: #382. |
|
Updated the functions to return |
christoph-dfinity
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done!
AStepanov25
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As for me the interface where binarySearch returns result is too complicated. It should return ?Nat, I think. If the user wants to know where the element is approximately located it's worth to introduce lowerBound and upperBound like in C++.
src/Array.mo
Outdated
| /// | ||
| /// *Runtime and space assumes that `compare` runs in O(1) time and space. | ||
| public func binarySearch<T>(array : [T], compare : (T, T) -> Order.Order, element : T) : Types.Result<Nat, Nat> { | ||
| let size = array.size(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it worth to introduce size constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess the interface is just dubbing Rust's way of doing this (which I also find a bit complicated interface-wise). Another thing I don't like about this interface is that we don't give any guarantees about which element we return in case there are multiple occurrences (the Rust docs explain this is for efficiency: you can early-stop the first time you find an element with the right value, which is also what the implementations in this PR do). I am personally in favor of C++-esque lower_bound functions, but I am biased towards C++. The current plan/idea is to also have some functions like this: #381. Maybe we can indeed split the functionality further based on how we think users will likely use these functions. Early stopping is actually good in cases where users have "the list has no duplicates" as an invariant in their code (but I can't tell how often this is the case versus lists with potential duplicates). I also can't tell immediately how often the users will be happy with getting an arbitrary index matching the required value versus the first/last one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out; updated to remove the unnecessary size constant.
I see your point about the interface being complicated. Maybe a solution could be to show how to convert to an option in the doc comments, e.g. using Result.toOption(), for those migrating from Buffer.binarySearch().
Alternatively, I suppose we could roll back to the original implementation or even just defer to a Mops package containing these functions along with #381.
|
Why does it make sense to offer binarySearch for a raw Array/List where the user has direct control of the elements and can (inadvertently) corrupt the ordering? Shouldn't this be part of a new data structure that wraps around Array/List and that guarantees correct ordering at all times? Or what's an application example where this is safe? |
Well, you could introduce a Standard (practical) example is to (1) sort a list (2) then perform multiple binary searches on it (~on demand) to retrieve some entry. Here, ofc, one could argue that sorting should return a |
|
The Languages team discussed this today, and we decided to go with an API that is similar to Rust except with |
Nice, thanks for leading this and making a decision on the API! This means we can now decide on the API/name for #381, so anyone can tune in there if they want to. |
Reintroduces similar functionality to
Buffer.binarySearch()based on feedback.