Add unsafe blake3 benchmark for merkledb #3350

Draft
wants to merge 1 commit into
base: add-blake3-benchmark

StephenButtolph (Contributor) commented Aug 29, 2024

Why this should be merged

Built on top of the safe blake3 benchmark: #3349.

This PR isn't safe: the hasher interface is expected to be usable concurrently, and this implementation does not allow that. However, it shows the potential performance of BLAKE3 relative to SHA256.

goos: darwin
goarch: arm64
pkg: github.com/ava-labs/avalanchego/x/merkledb
Benchmark_SHA256_HashNode/empty_node-12         	14426436	        79.50 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_value-12          	14245774	        84.81 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_key-12            	16405936	        72.80 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/1_child-12            	 8694835	       138.8 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/2_children-12         	 5892787	       201.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/16_children-12        	 1000000	      1052 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/empty_node-12         	 9002222	       131.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_value-12          	 8012547	       149.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_key-12            	 8915469	       136.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/1_child-12            	 6148550	       195.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/2_children-12         	 3697560	       324.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/16_children-12        	  751521	      1606 ns/op	       0 B/op	       0 allocs/op

^ on my MacBook Pro (Apple M2 Max chip)

How this works

Resetting the BLAKE3 internal state is significantly faster than initializing an entirely new hasher.
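
As a rough illustration of the point above, the sketch below contrasts building a hasher per call with resetting a reused one, and shows one way reuse could stay safe for concurrent callers. It is not the code in this PR. It assumes the standard library's crypto/sha256 as a stand-in for any hash.Hash implementation (a BLAKE3 constructor would be swapped in), and the names hashFresh, unsafeHasher, newUnsafeHasher, and pooledHash are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"hash"
	"sync"
)

// hashFresh builds a new hasher for every call. It is safe for concurrent
// callers, but pays the initialization cost each time.
func hashFresh(data []byte) []byte {
	h := sha256.New() // stand-in; a BLAKE3 constructor would go here
	h.Write(data)
	return h.Sum(nil)
}

// unsafeHasher reuses one hasher and resets it between calls.
// It must NOT be shared between goroutines: concurrent calls would
// interleave writes into the same internal state.
type unsafeHasher struct {
	h hash.Hash
}

func newUnsafeHasher() *unsafeHasher {
	return &unsafeHasher{h: sha256.New()}
}

func (u *unsafeHasher) hash(data []byte) []byte {
	u.h.Reset()
	u.h.Write(data)
	return u.h.Sum(nil)
}

// hasherPool hands each caller its own hasher, so reuse via Reset stays
// safe under concurrency (a hypothetical alternative, not from the PR).
var hasherPool = sync.Pool{
	New: func() any { return sha256.New() },
}

func pooledHash(data []byte) []byte {
	h := hasherPool.Get().(hash.Hash)
	defer hasherPool.Put(h)
	h.Reset()
	h.Write(data)
	return h.Sum(nil)
}

func main() {
	u := newUnsafeHasher()
	for _, msg := range []string{"empty_node", "has_value"} {
		data := []byte(msg)
		want := string(hashFresh(data))
		same := want == string(u.hash(data)) && want == string(pooledHash(data))
		fmt.Printf("%s: digests match = %v\n", msg, same)
	}
}
```

All three paths should produce the same digest for the same input; they differ only in where the hasher state lives. Whether the pool's overhead cancels out the gain from Reset would need to be measured.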

How this was tested

  • Existing unit tests + benchmarks
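
For readers unfamiliar with how results of this shape are produced, here is a minimal sketch of a per-case hash benchmark. It is an assumption, not the benchmark in this PR: it uses crypto/sha256 from the standard library, the benchCase type and benchmarkHash helper are hypothetical, and the input sizes are placeholders rather than real merkledb node encodings.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// benchCase is a hypothetical stand-in for a merkledb node: just the bytes
// that would be fed to the hasher.
type benchCase struct {
	name  string
	input []byte
}

// Placeholder sizes only; real node encodings will differ.
var benchCases = []benchCase{
	{name: "empty_node", input: nil},
	{name: "has_value", input: make([]byte, 32)},
	{name: "16_children", input: make([]byte, 16*32)},
}

// benchmarkHash measures one hash function over one input and reports
// allocations, so B/op and allocs/op are available in the result.
func benchmarkHash(hashFn func([]byte) [32]byte, input []byte) testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			_ = hashFn(input)
		}
	})
}

func main() {
	for _, c := range benchCases {
		r := benchmarkHash(sha256.Sum256, c.input)
		fmt.Printf("SHA256/%s\t%s\t%s\n", c.name, r.String(), r.MemString())
	}
}
```

In a real test file the same loop would normally live in a func BenchmarkXxx(b *testing.B) with one b.Run per case, which is what yields names such as Benchmark_SHA256_HashNode/empty_node.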

StephenButtolph added the DO NOT MERGE label (This PR must not be merged in its current state) on Aug 29, 2024
StephenButtolph (Contributor, Author) commented Sep 15, 2024

Running on a new Ubuntu AWS instance using c7i.large (amd64) gives:

goos: linux
goarch: amd64
pkg: github.com/ava-labs/avalanchego/x/merkledb
cpu: Intel(R) Xeon(R) Platinum 8488C
Benchmark_SHA256_HashNode/empty_node-2         	11140938	       107.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_value-2          	10032351	       119.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_key-2            	10851236	       115.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/1_child-2            	 6390860	       182.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/2_children-2         	 4471764	       267.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/16_children-2        	  891460	      1319 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/empty_node-2         	11008384	       108.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_value-2          	10042704	       120.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_key-2            	10968001	       112.7 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/1_child-2            	 6379234	       183.3 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/2_children-2         	 4328890	       279.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/16_children-2        	  798664	      1484 ns/op	       0 B/op	       0 allocs/op

The two implementations perform much more similarly here, with SHA256 still slightly faster in every case except has_key.

The performance is similar when running on c7a.large.

StephenButtolph (Contributor, Author) commented

Running on a new Ubuntu AWS instance using c7g.large (arm64) gives:

goos: linux
goarch: arm64
pkg: github.com/ava-labs/avalanchego/x/merkledb
Benchmark_SHA256_HashNode/empty_node-2         	 8024362	       149.3 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_value-2          	 6909513	       173.0 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_key-2            	 8041004	       149.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/1_child-2            	 4334109	       276.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/2_children-2         	 3111552	       385.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/16_children-2        	  656708	      1861 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/empty_node-2         	 5403916	       222.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_value-2          	 4870874	       246.3 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_key-2            	 5143668	       233.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/1_child-2            	 3152232	       381.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/2_children-2         	 2070284	       579.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/16_children-2        	  426556	      2795 ns/op	       0 B/op	       0 allocs/op

StephenButtolph (Contributor, Author) commented

Running on a new Ubuntu AWS instance using c6i.large (amd64) gives:

goos: linux
goarch: amd64
pkg: github.com/ava-labs/avalanchego/x/merkledb
cpu: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
Benchmark_SHA256_HashNode/empty_node-2         	 9677665	       124.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_value-2          	 8577712	       140.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_key-2            	 9545270	       125.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/1_child-2            	 5190390	       232.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/2_children-2         	 3504306	       342.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/16_children-2        	  714806	      1661 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/empty_node-2         	10049533	       119.8 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_value-2          	 8944796	       136.6 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_key-2            	 9560641	       126.3 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/1_child-2            	 5372684	       223.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/2_children-2         	 3504469	       342.7 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/16_children-2        	  618915	      1723 ns/op	       0 B/op	       0 allocs/op

Interestingly, for the smallest nodes BLAKE3 actually outperforms SHA256 here, yet it loses at the "bigger" sizes (which are still very small).

StephenButtolph (Contributor, Author) commented

Running on a new Ubuntu AWS instance using c6g.large (arm64) gives:

goos: linux
goarch: arm64
pkg: github.com/ava-labs/avalanchego/x/merkledb
Benchmark_SHA256_HashNode/empty_node-2         	 6432463	       186.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_value-2          	 5515927	       217.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_key-2            	 6405484	       187.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/1_child-2            	 3292816	       362.7 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/2_children-2         	 2384853	       502.8 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/16_children-2        	  450978	      2732 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/empty_node-2         	 3864762	       310.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_value-2          	 3517705	       341.3 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_key-2            	 3777644	       317.6 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/1_child-2            	 2455851	       488.4 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/2_children-2         	 1535839	       781.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/16_children-2        	  302064	      3967 ns/op	       0 B/op	       0 allocs/op

StephenButtolph (Contributor, Author) commented

Running on a new Ubuntu AWS instance using c5.large (amd64) gives:

goos: linux
goarch: amd64
pkg: github.com/ava-labs/avalanchego/x/merkledb
cpu: Intel(R) Xeon(R) Platinum 8275CL CPU @ 3.00GHz
Benchmark_SHA256_HashNode/empty_node-2         	 4593003	       260.8 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_value-2          	 4272609	       280.6 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_key-2            	 4595166	       266.6 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/1_child-2            	 3111033	       385.6 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/2_children-2         	 1916672	       625.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/16_children-2        	  392150	      3029 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/empty_node-2         	 9074803	       132.3 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_value-2          	 7937150	       151.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_key-2            	 8880096	       135.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/1_child-2            	 4736965	       252.8 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/2_children-2         	 3229576	       371.5 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/16_children-2        	  630412	      1910 ns/op	       0 B/op	       0 allocs/op

This is the first instance type that shows a significant performance improvement when using BLAKE3. Overall, though, it is still slower than the c7i.large results.
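
To put a number on "significant", the ratio of the two ns/op columns can be computed directly. The sketch below does that for the c5.large figures reported above; the result type and speedup helper are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// result pairs the ns/op figures reported above for c5.large.
type result struct {
	name   string
	sha256 float64 // ns/op
	blake3 float64 // ns/op
}

// speedup returns how many times faster BLAKE3 is than SHA256;
// values below 1 mean BLAKE3 is slower.
func speedup(r result) float64 {
	return r.sha256 / r.blake3
}

var c5Large = []result{
	{"empty_node", 260.8, 132.3},
	{"has_value", 280.6, 151.2},
	{"has_key", 266.6, 135.1},
	{"1_child", 385.6, 252.8},
	{"2_children", 625.5, 371.5},
	{"16_children", 3029, 1910},
}

func main() {
	for _, r := range c5Large {
		fmt.Printf("%-12s %.2fx\n", r.name, speedup(r))
	}
}
```

On these figures BLAKE3 comes out close to 2x faster for the smallest nodes and roughly 1.6x faster for 16 children. These are single runs, so the ratios should be read as indicative rather than precise.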

StephenButtolph (Contributor, Author) commented

Running on a new Ubuntu AWS instance using c4.large (amd64) gives:

goos: linux
goarch: amd64
pkg: github.com/ava-labs/avalanchego/x/merkledb
cpu: Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
Benchmark_SHA256_HashNode/empty_node-2         	 4565466	       262.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_value-2          	 4233136	       282.7 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/has_key-2            	 4548741	       263.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/1_child-2            	 3088939	       388.7 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/2_children-2         	 1899824	       630.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_SHA256_HashNode/16_children-2        	  393420	      3046 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/empty_node-2         	 9008563	       133.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_value-2          	 7875678	       152.2 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/has_key-2            	 8815430	       136.1 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/1_child-2            	 4703169	       255.6 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/2_children-2         	 3191654	       375.9 ns/op	       0 B/op	       0 allocs/op
Benchmark_BLAKE3_HashNode/16_children-2        	  613514	      1925 ns/op	       0 B/op	       0 allocs/op

Again, BLAKE3 seems to outperform SHA256 on these older chips.

This PR has become stale because it has been open for 30 days with no activity. Adding the lifecycle/frozen label will cause this PR to ignore lifecycle events.

Labels: DO NOT MERGE (This PR must not be merged in its current state), lifecycle/stale