Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize diagonalize_measurements Transform for Enhanced Performance #6742

Open
wants to merge 40 commits into
base: master
Choose a base branch
from

Conversation

JakeKitchen
Copy link
Contributor

@JakeKitchen JakeKitchen commented Dec 29, 2024

Context:

The original implementation had some inefficiencies, especially with large circuits and multiple measurements.


Change:

  1. Use frozenset:

    • Converted supported_base_obs and related observables to frozenset for faster lookups and immutability.
  2. Optimize Membership Checks:

    • Replaced multiple set operations with direct frozenset operations to improve performance.
  3. Precompute Measurements:

    • Extracted Pauli measurements in a single step to avoid redundant iterations.
  4. Streamline Functions:

    • Simplified _check_if_diagonalizing and observable handling functions to reduce computational overhead.
  5. Minimize Copy Operations:

    • Limited the use of the copy operations to necessary instances only.

Benefits:

  • Faster Execution: Improved lookup times and reduced redundant operations enhance overall performance.
  • Better Readability: Cleaner code structure makes it easier to understand and maintain.
  • Lower Memory Usage: Reduced unnecessary list creations and copies decrease memory consumption.
  • Enhanced Scalability: Optimizations support larger and more complex quantum circuits efficiently.

Possible Drawbacks:

None identified.


Related GitHub Issues:

N/A

Copy link

codecov bot commented Dec 29, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 99.60%. Comparing base (2a766d4) to head (04a6db0).

Additional details and impacted files
@@           Coverage Diff           @@
##           master    #6742   +/-   ##
=======================================
  Coverage   99.60%   99.60%           
=======================================
  Files         476      476           
  Lines       45232    45231    -1     
=======================================
- Hits        45055    45054    -1     
  Misses        177      177           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@JakeKitchen JakeKitchen changed the title Update diagonalize_measurements.py Optimize diagonalize_measurements Transform for Enhanced Performance Dec 29, 2024
@albi3ro
Copy link
Contributor

albi3ro commented Dec 30, 2024

Hi @JakeKitchen . Thanks for this PR.

What types of workflows were this performance bottleneck showing up on? Do you have any profiling and timing data?

@JakeKitchen
Copy link
Contributor Author

Hi @JakeKitchen . Thanks for this PR.

What types of workflows were this performance bottleneck showing up on? Do you have any profiling and timing data?

It wasn't bottlenecks necessarily its just general improvements for about 10% speed improvements for when diagonalize measurements gets called by precomputing as much as possible

@JakeKitchen
Copy link
Contributor Author

JakeKitchen commented Dec 30, 2024

n_qubits | Original (ms) | New Implementation (ms) | Speedup
------------------------------------------------------------
       2 |        0.821 |               0.811 |    1.01x
       4 |        1.282 |               1.245 |    1.03x
       8 |        2.300 |               2.210 |    1.04x
      16 |       28.791 |              27.980 |    1.03x
n_qubits | Original (ms) | New Implementation (ms) | Speedup
------------------------------------------------------------
       2 |        0.818 |               0.810 |    1.01x
       4 |        1.490 |               1.460 |    1.02x
       8 |        2.477 |               2.360 |    1.05x
      16 |       29.421 |              28.900 |    1.02x

Here is some benchmarks the Simple Circuit was preforming Hadamard's followed by Pauli-X expectation value measurements on all the qubits and the Complex circuit was mixed measurements that are qubitwise commuting (interleaved Pauli-X and Pauli-Z measurements) each ran through 100 iterations

@albi3ro
Copy link
Contributor

albi3ro commented Dec 30, 2024

n_qubits | Original (ms) | New Implementation (ms) | Speedup
------------------------------------------------------------
       2 |        0.821 |               0.811 |    1.01x
       4 |        1.282 |               1.245 |    1.03x
       8 |        2.300 |               2.210 |    1.04x
      16 |       28.791 |              27.980 |    1.03x
n_qubits | Original (ms) | New Implementation (ms) | Speedup
------------------------------------------------------------
      2 |        0.818 |               0.810 |    1.01x
      4 |        1.490 |               1.460 |    1.02x
      8 |        2.477 |               2.360 |    1.05x
     16 |       29.421 |              28.900 |    1.02x

Here is some benchmarks the Simple Circuit was preforming Hadamard's followed by Pauli-X expectation value measurements on all the qubits and the Complex circuit was mixed measurements that are qubitwise commuting (interleaved Pauli-X and Pauli-Z measurements) each ran through 100 iterations

Thanks for these. Mind including the code you benchmarked as well?

@lillian542 lillian542 self-requested a review December 30, 2024 18:08
Copy link
Contributor

@lillian542 lillian542 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into tidying this up! The docstrings and set conversions look great, I just left a couple of small suggestions.

I have one concern with the implementation, specifically the change to the check that decides whether to attempt the pauli_rep based diagonalization method, or to proceed directly to the less efficient (but broader) backup implementation. I added more details at the relevant line.

pennylane/transforms/diagonalize_measurements.py Outdated Show resolved Hide resolved
pennylane/transforms/diagonalize_measurements.py Outdated Show resolved Hide resolved
pennylane/transforms/diagonalize_measurements.py Outdated Show resolved Hide resolved
JakeKitchen and others added 4 commits December 30, 2024 15:00
Co-authored-by: lillian542 <38584660+lillian542@users.noreply.github.com>
Co-authored-by: lillian542 <38584660+lillian542@users.noreply.github.com>
Co-authored-by: lillian542 <38584660+lillian542@users.noreply.github.com>
@JakeKitchen
Copy link
Contributor Author

JakeKitchen commented Dec 30, 2024

n_qubits | Original (ms) | New Implementation (ms) | Speedup
------------------------------------------------------------
       2 |        0.821 |               0.811 |    1.01x
       4 |        1.282 |               1.245 |    1.03x
       8 |        2.300 |               2.210 |    1.04x
      16 |       28.791 |              27.980 |    1.03x
n_qubits | Original (ms) | New Implementation (ms) | Speedup
------------------------------------------------------------
      2 |        0.818 |               0.810 |    1.01x
      4 |        1.490 |               1.460 |    1.02x
      8 |        2.477 |               2.360 |    1.05x
     16 |       29.421 |              28.900 |    1.02x

Here is some benchmarks the Simple Circuit was preforming Hadamard's followed by Pauli-X expectation value measurements on all the qubits and the Complex circuit was mixed measurements that are qubitwise commuting (interleaved Pauli-X and Pauli-Z measurements) each ran through 100 iterations

Thanks for these. Mind including the code you benchmarked as well?

import pennylane as qml
import timeit
import numpy as np
from updated_diagonalize import diagonalize_measurements as new_diagonalize
from pennylane.transforms import diagonalize_measurements as original_diagonalize

def create_simple_circuit(n_qubits):
    dev = qml.device("default.qubit", wires=n_qubits)
    
    @qml.qnode(dev)
    def circuit():
        for i in range(n_qubits):
            qml.Hadamard(wires=i)
        return [qml.expval(qml.X(i)) for i in range(n_qubits)]
    
    return circuit

def create_complex_circuit(n_qubits):
    dev = qml.device("default.qubit", wires=n_qubits)
    
    @qml.qnode(dev)
    def circuit():
        for i in range(n_qubits):
            qml.Hadamard(wires=i)
            qml.RX(0.5, wires=i)
        
        measurements = []
        for i in range(0, n_qubits-1, 2):
            measurements.append(qml.expval(qml.X(i)))
            measurements.append(qml.expval(qml.Z(i+1)))
        
        if n_qubits % 2:
            measurements.append(qml.expval(qml.X(n_qubits-1)))
            
        return measurements
    
    return circuit

def benchmark_implementation(circuit, transform_fn, n_runs=100):
    transformed_circuit = transform_fn(circuit)
    
    start_time = timeit.default_timer()
    for _ in range(n_runs):
        transformed_circuit()
    end_time = timeit.default_timer()
    
    return (end_time - start_time) / n_runs

def run_benchmarks():
    print("Running benchmarks...")
    print("\nSimple Circuit Benchmarks:")
    print("n_qubits | Original (ms) | New Implementation (ms) | Speedup")
    print("-" * 60)
    
    for n_qubits in [2, 4, 8, 16]:
        circuit = create_simple_circuit(n_qubits)

        original_time = benchmark_implementation(circuit, original_diagonalize) * 1000
        
        new_time = benchmark_implementation(circuit, new_diagonalize) * 1000
        
        speedup = original_time / new_time if new_time > 0 else float('inf')
        
        print(f"{n_qubits:8d} | {original_time:12.3f} | {new_time:19.3f} | {speedup:7.2f}x")
    
    print("\nComplex Circuit Benchmarks:")
    print("n_qubits | Original (ms) | New Implementation (ms) | Speedup")
    print("-" * 60)
    
    for n_qubits in [2, 4, 8, 16]:
        circuit = create_complex_circuit(n_qubits)
        
        original_time = benchmark_implementation(circuit, original_diagonalize) * 1000

        new_time = benchmark_implementation(circuit, new_diagonalize) * 1000
        
        speedup = original_time / new_time if new_time > 0 else float('inf')
        
        print(f"{n_qubits:8d} | {original_time:12.3f} | {new_time:19.3f} | {speedup:7.2f}x")

if __name__ == "__main__":
    run_benchmarks() 

pretty rudimentary benchmark but gets the job done

@lillian542 lillian542 requested a review from albi3ro January 2, 2025 22:01
Copy link
Contributor

@lillian542 lillian542 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the updates @JakeKitchen! I've left a couple of comments, and requested a second reviewer from the team (all PRs need two approvals).

"""Test that _diagonalize_all_pauli_obs is only used when ALL observables have pauli_rep,
not just when ANY observables have pauli_rep. This test would fail if we used the condition
(pauli_measurements and diagonalize_all) which only checks if ANY observables have pauli_rep.
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering what additional insight this test provides compared to the one above? The earlier one already covers the case where some, but not all, of the observables have a pauli_rep.

Additionally it seems like, for all but the first parametrization, whether or not the observables have a pauli_rep might not matter in this test, since the diagonalize_all condition will be False. So, in that case, not incompatible_measurements and diagonalize_all would end up being False regardless of the value for incompatible_measurements. I'm curious if I'm missing something here!

Copy link
Contributor Author

@JakeKitchen JakeKitchen Jan 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's handling a edge case that verifies that the diagonalization behavior is determined by the supported bases, NOT by the pauli_rep property. Even if all observables had pauli_rep, the diagonalization would still follow the supported_base_obs parameter.

Copy link
Contributor

@lillian542 lillian542 Jan 6, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you update the docstring (and maybe the test name) if that is the intention? Right now it describes testing the behaviour determined by the presence/absence of a pauli_rep.

If that's what you want to test, you will also need to update the circuit so all the observables have a pauli_rep in order to isolate the execution of logic supported bases. Right now the first parametrization uses the fallback method because of the Hadamard, and everything else uses it both because of the Hadamard and the supported gates, so this test doesn't really isolate either part of the 'decision tree'.

Comment on lines +216 to 219
diagonalizing_gates, diagonal_measurements = rotations_and_diagonal_measurements(tape)
new_measurements = []

diagonalizing_gates, diagonal_measurements = rotations_and_diagonal_measurements(tape)
for m in diagonal_measurements:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this change doing? It looks like we are just switching the order of lines. Am I missing something?

Copy link
Contributor Author

@JakeKitchen JakeKitchen Jan 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just switched this because it looks more clear imo

@astralcai
Copy link
Contributor

pretty rudimentary benchmark but gets the job done

Based on the benchmarking script, we see that the thing being benchmarked is the execution time of the transformed circuit. The runtime of the transform itself isn't actually timed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants