-
Notifications
You must be signed in to change notification settings - Fork 615
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize diagonalize_measurements
Transform for Enhanced Performance
#6742
base: master
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #6742 +/- ##
=======================================
Coverage 99.60% 99.60%
=======================================
Files 476 476
Lines 45232 45231 -1
=======================================
- Hits 45055 45054 -1
Misses 177 177 ☔ View full report in Codecov by Sentry. |
diagonalize_measurements
Transform for Enhanced Performance
Hi @JakeKitchen . Thanks for this PR. What types of workflows were this performance bottleneck showing up on? Do you have any profiling and timing data? |
It wasn't bottlenecks necessarily its just general improvements for about 10% speed improvements for when diagonalize measurements gets called by precomputing as much as possible |
n_qubits | Original (ms) | New Implementation (ms) | Speedup
------------------------------------------------------------
2 | 0.821 | 0.811 | 1.01x
4 | 1.282 | 1.245 | 1.03x
8 | 2.300 | 2.210 | 1.04x
16 | 28.791 | 27.980 | 1.03x n_qubits | Original (ms) | New Implementation (ms) | Speedup
------------------------------------------------------------
2 | 0.818 | 0.810 | 1.01x
4 | 1.490 | 1.460 | 1.02x
8 | 2.477 | 2.360 | 1.05x
16 | 29.421 | 28.900 | 1.02x Here is some benchmarks the Simple Circuit was preforming Hadamard's followed by Pauli-X expectation value measurements on all the qubits and the Complex circuit was mixed measurements that are qubitwise commuting (interleaved Pauli-X and Pauli-Z measurements) each ran through 100 iterations |
Thanks for these. Mind including the code you benchmarked as well? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for looking into tidying this up! The docstrings and set conversions look great, I just left a couple of small suggestions.
I have one concern with the implementation, specifically the change to the check that decides whether to attempt the pauli_rep
based diagonalization method, or to proceed directly to the less efficient (but broader) backup implementation. I added more details at the relevant line.
Co-authored-by: lillian542 <38584660+lillian542@users.noreply.github.com>
Co-authored-by: lillian542 <38584660+lillian542@users.noreply.github.com>
Co-authored-by: lillian542 <38584660+lillian542@users.noreply.github.com>
import pennylane as qml
import timeit
import numpy as np
from updated_diagonalize import diagonalize_measurements as new_diagonalize
from pennylane.transforms import diagonalize_measurements as original_diagonalize
def create_simple_circuit(n_qubits):
dev = qml.device("default.qubit", wires=n_qubits)
@qml.qnode(dev)
def circuit():
for i in range(n_qubits):
qml.Hadamard(wires=i)
return [qml.expval(qml.X(i)) for i in range(n_qubits)]
return circuit
def create_complex_circuit(n_qubits):
dev = qml.device("default.qubit", wires=n_qubits)
@qml.qnode(dev)
def circuit():
for i in range(n_qubits):
qml.Hadamard(wires=i)
qml.RX(0.5, wires=i)
measurements = []
for i in range(0, n_qubits-1, 2):
measurements.append(qml.expval(qml.X(i)))
measurements.append(qml.expval(qml.Z(i+1)))
if n_qubits % 2:
measurements.append(qml.expval(qml.X(n_qubits-1)))
return measurements
return circuit
def benchmark_implementation(circuit, transform_fn, n_runs=100):
transformed_circuit = transform_fn(circuit)
start_time = timeit.default_timer()
for _ in range(n_runs):
transformed_circuit()
end_time = timeit.default_timer()
return (end_time - start_time) / n_runs
def run_benchmarks():
print("Running benchmarks...")
print("\nSimple Circuit Benchmarks:")
print("n_qubits | Original (ms) | New Implementation (ms) | Speedup")
print("-" * 60)
for n_qubits in [2, 4, 8, 16]:
circuit = create_simple_circuit(n_qubits)
original_time = benchmark_implementation(circuit, original_diagonalize) * 1000
new_time = benchmark_implementation(circuit, new_diagonalize) * 1000
speedup = original_time / new_time if new_time > 0 else float('inf')
print(f"{n_qubits:8d} | {original_time:12.3f} | {new_time:19.3f} | {speedup:7.2f}x")
print("\nComplex Circuit Benchmarks:")
print("n_qubits | Original (ms) | New Implementation (ms) | Speedup")
print("-" * 60)
for n_qubits in [2, 4, 8, 16]:
circuit = create_complex_circuit(n_qubits)
original_time = benchmark_implementation(circuit, original_diagonalize) * 1000
new_time = benchmark_implementation(circuit, new_diagonalize) * 1000
speedup = original_time / new_time if new_time > 0 else float('inf')
print(f"{n_qubits:8d} | {original_time:12.3f} | {new_time:19.3f} | {speedup:7.2f}x")
if __name__ == "__main__":
run_benchmarks() pretty rudimentary benchmark but gets the job done |
looks a little funny because I ran through black linter
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @JakeKitchen! I've left a couple of comments, and requested a second reviewer from the team (all PRs need two approvals).
"""Test that _diagonalize_all_pauli_obs is only used when ALL observables have pauli_rep, | ||
not just when ANY observables have pauli_rep. This test would fail if we used the condition | ||
(pauli_measurements and diagonalize_all) which only checks if ANY observables have pauli_rep. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering what additional insight this test provides compared to the one above? The earlier one already covers the case where some, but not all, of the observables have a pauli_rep
.
Additionally it seems like, for all but the first parametrization, whether or not the observables have a pauli_rep
might not matter in this test, since the diagonalize_all
condition will be False. So, in that case, not incompatible_measurements and diagonalize_all
would end up being False regardless of the value for incompatible_measurements
. I'm curious if I'm missing something here!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's handling a edge case that verifies that the diagonalization behavior is determined by the supported bases, NOT by the pauli_rep
property. Even if all observables had pauli_rep
, the diagonalization would still follow the supported_base_obs
parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you update the docstring (and maybe the test name) if that is the intention? Right now it describes testing the behaviour determined by the presence/absence of a pauli_rep
.
If that's what you want to test, you will also need to update the circuit so all the observables have a pauli_rep
in order to isolate the execution of logic supported bases. Right now the first parametrization uses the fallback method because of the Hadamard, and everything else uses it both because of the Hadamard and the supported gates, so this test doesn't really isolate either part of the 'decision tree'.
diagonalizing_gates, diagonal_measurements = rotations_and_diagonal_measurements(tape) | ||
new_measurements = [] | ||
|
||
diagonalizing_gates, diagonal_measurements = rotations_and_diagonal_measurements(tape) | ||
for m in diagonal_measurements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is this change doing? It looks like we are just switching the order of lines. Am I missing something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just switched this because it looks more clear imo
Co-authored-by: Christina Lee <chrissie.c.l@gmail.com>
Based on the benchmarking script, we see that the thing being benchmarked is the execution time of the transformed circuit. The runtime of the transform itself isn't actually timed. |
Context:
The original implementation had some inefficiencies, especially with large circuits and multiple measurements.
Change:
Use
frozenset
:supported_base_obs
and related observables tofrozenset
for faster lookups and immutability.Optimize Membership Checks:
frozenset
operations to improve performance.Precompute Measurements:
Streamline Functions:
_check_if_diagonalizing
and observable handling functions to reduce computational overhead.Minimize Copy Operations:
copy
operations to necessary instances only.Benefits:
Possible Drawbacks:
None identified.
Related GitHub Issues:
N/A