Commit

Add theory, intro and use case pages of LRE user guide (#2522)
* draft 1

* vincent + nate feedback

* remove : for warning block

* Apply suggestions from code review

Co-authored-by: nate stemen <nate@unitary.fund>

* nate's feedback round 2

* Apply suggestions from code review

Co-authored-by: nate stemen <nate@unitary.fund>

* clarify theory sections

* gen monomial terms

* Add intro section to LRE docs (#2535)

* add intro and use case pages

Co-Authored-By: Purva Thakre <purva@unitary.fund>

* clean up intro/use case

* clarify depth comment

* wordsmithing

---------

Co-authored-by: Purva Thakre <purva@unitary.fund>

* change wording of Bi matrix

* cleanup first section

* fix l/L typo

---------

Co-authored-by: nate stemen <nate@unitary.fund>
Co-authored-by: Purva Thakre <purva@unitary.fund>
3 people authored Oct 11, 2024
1 parent bc8fdf9 commit 7e4fb92
Showing 5 changed files with 312 additions and 1 deletion.
3 changes: 2 additions & 1 deletion docs/source/guide/guide.md
@@ -8,11 +8,12 @@ core-concepts.md
zne.md
pec.md
cdr.md
shadows.md
ddd.md
lre.md
rem.md
qse.md
pt.md
shadows.md
error-mitigation.md
glossary.md
```
152 changes: 152 additions & 0 deletions docs/source/guide/lre-1-intro.md
@@ -0,0 +1,152 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.11.1
kernelspec:
display_name: Python 3
language: python
name: python3
---

# How do I use LRE?

LRE works in two main stages: generate noise-scaled circuits via layerwise scaling, and then apply inference to the measurement results after execution.

This workflow can be executed by a single call to {func}`.execute_with_lre`.
If more control is needed over the protocol, Mitiq provides {func}`.multivariate_layer_scaling` and {func}`.multivariate_richardson_coefficients` to handle the first and second steps respectively.

```{danger}
LRE is currently compatible with quantum programs written using `cirq`.
Work on making this technique compatible with other frontends is ongoing. 🚧
```

## Problem Setup

To demonstrate the use of LRE, we first define a quantum circuit and a method of executing circuits.

For simplicity, we define a circuit whose unitary compiles to the identity operation.
Here we will use a randomized benchmarking circuit on a single qubit, visualized below.

```{code-cell} ipython3
from mitiq import benchmarks
circuit = benchmarks.generate_rb_circuits(n_qubits=1, num_cliffords=3)[0]
print(circuit)
```

We define an [executor](executors.md) which simulates the input circuit subjected to depolarizing noise, and returns the probability of measuring the ground state.
By altering the value for `noise_level`, ideal and noisy expectation values can be obtained.

```{code-cell} ipython3
from cirq import DensityMatrixSimulator, depolarize

def execute(circuit, noise_level=0.025):
    noisy_circuit = circuit.with_noise(depolarize(p=noise_level))
    rho = DensityMatrixSimulator().simulate(noisy_circuit).final_density_matrix
    return rho[0, 0].real
```

Compare the noisy and ideal expectation values:

```{code-cell} ipython3
noisy = execute(circuit)
ideal = execute(circuit, noise_level=0.0)
print(f"Error without mitigation: {abs(ideal - noisy) :.5f}")
```

## Apply LRE directly

With the circuit and executor defined, we just need to choose the polynomial extrapolation degree as well as the fold multiplier.

```{code-cell} ipython3
from mitiq.lre import execute_with_lre
degree = 2
fold_multiplier = 3
mitigated = execute_with_lre(
    circuit,
    execute,
    degree=degree,
    fold_multiplier=fold_multiplier,
)
print(f"Error with mitigation (LRE): {abs(ideal - mitigated):.3}")
```

As you can see, the technique is extremely simple to apply, and no knowledge of the hardware/simulator noise is required.
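
Since the only tunable parameters here are `degree` and `fold_multiplier`, a quick sweep over a few values can help gauge how sensitive the mitigated estimate is. The specific values below are illustrative choices, not recommendations.

```{code-cell} ipython3
# Illustrative sweep over a few (degree, fold_multiplier) choices.
for deg in (1, 2):
    for mult in (2, 3):
        est = execute_with_lre(
            circuit,
            execute,
            degree=deg,
            fold_multiplier=mult,
        )
        print(f"degree={deg}, fold_multiplier={mult}: error = {abs(ideal - est):.5f}")
```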

## Step by step application of LRE

In this section we demonstrate the use of {func}`.multivariate_layer_scaling` and {func}`.multivariate_richardson_coefficients` for those who want to inspect the intermediate circuits or need more control over the protocol.

### Create noise-scaled circuits

We start by creating a number of noise-scaled circuits which we will pass to the executor.

```{code-cell} ipython3
from mitiq.lre import multivariate_layer_scaling
noise_scaled_circuits = multivariate_layer_scaling(circuit, degree, fold_multiplier)
num_scaled_circuits = len(noise_scaled_circuits)
print(f"total number of noise-scaled circuits for LRE = {num_scaled_circuits}")
print(
    f"Average circuit depth = {sum(len(c) for c in noise_scaled_circuits) / num_scaled_circuits}"
)
```

As you can see, the noise-scaled circuits are on average much deeper than the original circuit.
An example noise-scaled circuit is shown below.

```{code-cell} ipython3
noise_scaled_circuits[3]
```

With the many noise-scaled circuits in hand, we can run them through our executor to obtain the expectation values.

```{code-cell} ipython3
noise_scaled_exp_values = [
    execute(c) for c in noise_scaled_circuits
]
```

### Classical inference

The penultimate step is to compute the coefficients used to combine the noisy data obtained above.
Note that the same `degree` and `fold_multiplier` values used to generate the noise-scaled circuits must be passed here, so that the coefficients match the scaling pattern.

```{code-cell} ipython3
from mitiq.lre import multivariate_richardson_coefficients
coefficients = multivariate_richardson_coefficients(
    circuit,
    fold_multiplier=fold_multiplier,
    degree=degree,
)
```

Each noise-scaled circuit is associated with a coefficient of the linear combination and a noisy expectation value.

### Combine the results

```{code-cell} ipython3
mitigated = sum(
    exp_val * coeff
    for exp_val, coeff in zip(noise_scaled_exp_values, coefficients)
)
print(
    f"Error with mitigation (LRE): {abs(ideal - mitigated):.3}"
)
```

Again we see a nice improvement in accuracy using this two-stage application of LRE.
35 changes: 35 additions & 0 deletions docs/source/guide/lre-2-use-case.md
@@ -0,0 +1,35 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.10.3
kernelspec:
display_name: Python 3 (ipykernel)
language: python
name: python3
---

# When should I use LRE?

## Advantages

Just as in ZNE, LRE can be applied without detailed knowledge of the underlying noise model, as the effectiveness of the technique depends on the choice of scale factors.
Thus, LRE is useful in scenarios where tomography is impractical.

The sampling overhead is flexible: the cost can be reduced by using larger values for the fold multiplier (used to create the noise-scaled circuits) or by chunking a larger circuit so that groups of layers are folded together instead of each layer individually, as sketched below.
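
As a rough sketch of the chunking knob, the snippet below compares how many noise-scaled circuits LRE generates with and without chunking. It assumes `multivariate_layer_scaling` accepts a `num_chunks` option, and it uses an arbitrary single-qubit example circuit with six layers; the circuit and parameter values are for illustration only.

```{code-cell} ipython3
import cirq
from mitiq.lre import multivariate_layer_scaling

# Arbitrary example circuit: six single-qubit gates, i.e. six layers.
q = cirq.LineQubit(0)
example_circuit = cirq.Circuit(
    [cirq.H(q), cirq.X(q), cirq.Y(q), cirq.Z(q), cirq.H(q), cirq.X(q)]
)

unchunked = multivariate_layer_scaling(example_circuit, 2, 2)
chunked = multivariate_layer_scaling(example_circuit, 2, 2, num_chunks=2)
print(f"Noise-scaled circuits without chunking: {len(unchunked)}")
print(f"Noise-scaled circuits with 2 chunks:    {len(chunked)}")
```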

## Disadvantages

For a large circuit, the number of noise-scaled circuits grows polynomially with the number of layers because the sample matrix is required to be square (more details in the [theory](lre-5-theory.md) section), so the total execution time rises accordingly. The quick count below illustrates this growth.
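
The number of required noise-scaled circuits equals the number of monomial terms, $\binom{d + l}{d}$ for $l$ layers and degree $d$; the snippet below evaluates this binomial coefficient directly and does not call Mitiq.

```{code-cell} ipython3
from math import comb

# Noise-scaled circuits needed for a square sample matrix at degree 2.
degree = 2
for num_layers in (5, 10, 50, 100):
    print(f"{num_layers:>3} layers -> {comb(degree + num_layers, degree)} circuits")
```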

When the sampling cost is reduced by using a larger fold multiplier, the bias of the polynomial extrapolation increases because the data points lie farther from the zero-noise limit.

Chunking a large circuit into a small number of chunks to reduce the sampling cost can degrade the performance of LRE.
In ZNE parlance, using a larger number of chunks in LRE is analogous to local folding, which tends to fare better than global folding.

```{attention}
We are currently investigating the issue related to the performance of chunking large circuits.
```
95 changes: 95 additions & 0 deletions docs/source/guide/lre-5-theory.md
@@ -0,0 +1,95 @@
---
jupytext:
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.11.4
kernelspec:
display_name: Python 3
language: python
name: python3
---

# What is the theory behind LRE?

Similar to [ZNE](zne.md), LRE works in two steps:

- **Step 1:** Intentionally create multiple noise-scaled but logically equivalent circuits by scaling each layer or chunk of the input circuit through unitary folding.

- **Step 2:** Extrapolate to the noiseless limit using multivariate Richardson extrapolation.

In ZNE, the user chooses which layers of the input circuit to fold when creating noise-scaled circuits, whereas in LRE each noise-scaled circuit scales the layers of the input circuit in a specific pattern.
LRE leverages the flexible configuration space of layerwise unitary folding, allowing for a more nuanced mitigation of errors by treating the noise level of each layer of the quantum circuit as an independent variable.

## Step 1: Create noise-scaled circuits

The goal is to create noise-scaled circuits of different depths whose layers are scaled in a specific pattern as a result of [unitary folding](zne-5-theory.md).
This pattern is described by a collection of scale factor vectors, which are generated once the fold multiplier and the degree of the multivariate Richardson extrapolation are chosen.

Suppose we're interested in the value of some observable of a circuit $C$ that has $l$ layers.
For each layer $1 \leq j \leq l$ we can choose a scale factor for how much to scale that particular layer.
Thus a vector $\lambda \in \mathbb{R}^l_+$ corresponds to a folding configuration where $\lambda_1$ is the scale factor for the first layer and $\lambda_l$ is the scale factor applied to the circuit's final layer.

Fix the number of noise-scaled circuits we wish to generate at $M\in\mathbb{N}$.
Define $\Lambda = (λ_1, λ_2, \ldots, λ_M)^T$ to be the collection of scale factors and let $(C_{λ_1}, C_{λ_2}, \ldots, C_{λ_M})^T$ denote the noise-scaled circuits corresponding to each scale factor.

After the degree $d$ of the multivariate polynomial is fixed, we define $M_j(λ_i, d)$ to be the $j$-th term of the monomial basis, with the terms arranged in order of increasing degree.
In general, the number of monomial terms in $l$ variables up to degree $d$ can be determined
through the [stars and bars method](https://en.wikipedia.org/wiki/Stars_and_bars_%28combinatorics%29).

For example, if $C$ has 2 layers and the degree of the extrapolating polynomial is 2, the monomial basis contains 6 terms: $\{1, λ_1, λ_2, {λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$.

$$
\text{total number of terms in the monomial basis with max degree } d = \binom{d + l}{d}
$$

To see where these terms come from, we can count the number of terms of each total degree $k \leq d$ separately:

$$
\text{number of terms in the monomial basis with total degree } k = \binom{k + l - 1}{k}
$$

There are $\binom{2 + 2 - 1}{2} = 3$ terms with total degree 2, corresponding to $\{{λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$.

Similarly, there are $\binom{1 + 2 - 1}{1} = 2$ terms with total degree 1, $\{λ_1, λ_2\}$, and $\binom{0 + 2 - 1}{0} = 1$ term with total degree 0, $\{1\}$.
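
These counts can also be reproduced programmatically. The snippet below enumerates the monomial basis for $l = 2$ variables up to degree $d = 2$ and checks the stars and bars counts; it is plain Python, independent of Mitiq's internal implementation.

```{code-cell} ipython3
from itertools import product
from math import comb

num_vars, deg = 2, 2  # l = 2 layers, degree d = 2

# Each monomial is a tuple of exponents (e_1, ..., e_l) with sum <= d.
monomials = [e for e in product(range(deg + 1), repeat=num_vars) if sum(e) <= deg]
monomials.sort(key=sum)  # arrange in order of increasing total degree

print("monomial exponents:", monomials)
print("total terms:", len(monomials), "==", comb(deg + num_vars, deg))
for k in range(deg + 1):
    count = sum(1 for e in monomials if sum(e) == k)
    print(f"terms with total degree {k}: {count} == {comb(k + num_vars - 1, k)}")
```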

Evaluating the monomial basis at each scale factor vector gives one row of the square sample matrix shown below (the matrix is square because the number of noise-scaled circuits $M$ is chosen to equal the number of monomial terms $N$):

$$
\mathbf{A}(\Lambda, d) =
\begin{bmatrix}
M_1(λ_1, d) & M_2(λ_1, d) & \cdots & M_N(λ_1, d) \\
M_1(λ_2, d) & M_2(λ_2, d) & \cdots & M_N(λ_2, d) \\
\vdots & \vdots & \ddots & \vdots \\
M_1(λ_M, d) & M_2(λ_M, d) & \cdots & M_N(λ_M, d)
\end{bmatrix}
$$

For our example circuit with $l=2$ and $d=2$, the generic monomial terms $\{M_1(λ_i, d), M_2(λ_i, d), \ldots, M_N(λ_i, d)\}$ in the $i$-th row of the sample matrix $\mathbf{A}$ become $\{1, λ_1, λ_2, {λ_1}^2, λ_1 \cdot λ_2, {λ_2}^2 \}$ evaluated at the $i$-th scale factor vector.

In Step 2, this sample matrix will be used to obtain the mitigated expectation value.
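
As a concrete illustration (built by hand with NumPy here, not via Mitiq's internals), take $l = 2$, $d = 2$, and six scale factor vectors; the particular values below are chosen only for demonstration. Each row evaluates the monomial basis at one scale factor vector.

```{code-cell} ipython3
import numpy as np

def monomial_row(l1, l2):
    # Monomial basis {1, l1, l2, l1^2, l1*l2, l2^2} evaluated at (l1, l2).
    return [1.0, l1, l2, l1**2, l1 * l2, l2**2]

# Six illustrative scale factor vectors for a 2-layer circuit.
scale_factor_vectors = [(1, 1), (3, 1), (1, 3), (5, 1), (3, 3), (1, 5)]
A = np.array([monomial_row(*lam) for lam in scale_factor_vectors])
print(A)
print("A is square:", A.shape[0] == A.shape[1])
```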

## Step 2: Extrapolate to the noiseless limit

Each noise-scaled circuit $C_{λ_i}$ has an expectation value $\langle O(λ_i) \rangle$ associated with it, so we can define a vector of noisy expectation values $z = (\langle O(λ_1) \rangle, \langle O(λ_2) \rangle, \ldots, \langle O(λ_M)\rangle)^T$.
These values can then be combined linearly to estimate the ideal (zero-noise) value:

$$
O_{\mathrm{LRE}} = \sum_{i=1}^{M} \eta_i \langle O(λ_i) \rangle.
$$

Finding the mitigated value reduces to polynomial interpolation: the coefficients $c$ of the interpolating polynomial satisfy the linear system $\mathbf{A} c = z$, where $z$ is the vector of noisy expectation values and $\mathbf{A}$ is the sample matrix evaluated using the values in the scale factor vectors.

The [general multivariate Lagrange interpolation polynomial](https://www.siam.org/media/wkvnvame/a_simple_expression_for_multivariate.pdf) is defined by a new matrix $\mathbf{B}_i$ obtained by replacing the $i$-th row of the sample matrix $\mathbf{A}$ with the monomial terms evaluated at the generic variable λ. Thus, the matrix $\mathbf{B}_i$ represents an interpolating polynomial of degree $d$ in the variable λ. As we only need the noiseless expectation value, we can skip solving for the full vector of polynomial coefficients and instead use the [Lagrange interpolation formula](https://files.eric.ed.gov/fulltext/EJ1231189.pdf) evaluated at $λ = 0$, i.e. the zero-noise limit.

To get the matrix $\mathbf{B}_i(\mathbf{0})$, replace the $i$-th row of the sample matrix $\mathbf{A}$ by $\mathbf{e}_1=(1, 0, \ldots, 0)$: at $λ=0$ every monomial term vanishes except $M_1(0, d) = 1$.

$$
O_{\rm LRE} = \sum_{i=1}^M \langle O (\boldsymbol{\lambda}_i)\rangle \frac{\det \left(\mathbf{B}_i (\boldsymbol{0}) \right)}{\det \left(\mathbf{A}\right)}
$$
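
Continuing the $l = 2$, $d = 2$ sketch from Step 1 (with the hand-built matrix $\mathbf{A}$ and made-up noisy expectation values, so the numbers are purely illustrative), the formula above can be evaluated directly:

```{code-cell} ipython3
# Evaluate O_LRE = sum_i <O(lambda_i)> det(B_i(0)) / det(A) for the
# hand-built example above, using made-up noisy expectation values.
e1 = np.zeros(A.shape[0])
e1[0] = 1.0  # monomial basis at lambda = 0: only the constant term survives

noisy_values = [0.92, 0.81, 0.80, 0.72, 0.70, 0.71]  # illustrative only

det_A = np.linalg.det(A)
coefficients = []
for i in range(A.shape[0]):
    B_i = A.copy()
    B_i[i, :] = e1  # replace the i-th row by e_1 = (1, 0, ..., 0)
    coefficients.append(np.linalg.det(B_i) / det_A)

mitigated = sum(c * val for c, val in zip(coefficients, noisy_values))
print("coefficients:", np.round(coefficients, 3))
print("mitigated estimate:", round(mitigated, 4))
```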

To summarize: for a user-chosen degree of the extrapolating polynomial, LRE creates noise-scaled circuits in a specific layerwise pattern, gathers their expectation values, and combines them via multivariate Lagrange interpolation of the sample matrix (evaluated at the scale factor vectors) to produce the error-mitigated expectation value.

Additional details on the LRE functionality are available in the [API-doc](https://mitiq.readthedocs.io/en/stable/apidoc.html#module-mitiq.lre.multivariate_scaling.layerwise_folding).
28 changes: 28 additions & 0 deletions docs/source/guide/lre.md
@@ -0,0 +1,28 @@
```{warning}
The user guide for LRE in Mitiq is currently under construction.
```

# Layerwise Richardson Extrapolation

Layerwise Richardson Extrapolation (LRE), an error mitigation technique introduced in
{cite}`Russo_2024_LRE`, extends the ideas of ZNE by creating multiple noise-scaled variations of the input
circuit and extrapolating the noiseless expectation value from the execution of each
noisy circuit (see the section [What is the theory behind LRE?](lre-5-theory.md)). Compared to
Zero-Noise Extrapolation, this technique treats the noise in each layer of the circuit
as an independent variable to be scaled and then extrapolated independently.

You can get started with LRE in Mitiq with the following sections of the user guide:

```{toctree}
---
maxdepth: 1
---
lre-1-intro.md
lre-2-use-case.md
lre-5-theory.md
```
