Improve the slow compilation of MPSCircuit #213

royess · 2024-05-24T09:17:06Z

Issue Description

In my tests, MPSCircuits take much longer to get jitted than the equivalent plain circuits. For example, jitting an 8-qubit circuit finishes within 2 minutes for the default Circuit type, but takes more than 40 minutes for the same circuit as an MPSCircuit. (I didn't wait for the compilation to finish.)

Considering that we typically want to use an MPS simulator for a relatively large circuit size, this makes using MPS circuits particularly difficult.

Proposed Solution

Additional References

If it helps, I can provide a script that shows the slow MPS compilation. This seems to be a general problem for slightly larger circuit sizes, though.

royess · 2024-05-24T09:36:59Z

@refraction-ray I would appreciate your comments or suggestions.

refraction-ray · 2024-05-24T12:36:12Z

Indeed, jitting a large MPS circuit takes longer than jitting a plain Circuit, and the jitting time varies significantly across backends (jax vs. tf) and hardware (cpu vs. gpu). What is the depth of your test circuit? Typically, an MPSCircuit of 8 qubits wouldn't require such a long jitting time. Besides, you may also try the unjit version to see whether the running time is acceptable?

royess · 2024-05-24T13:17:25Z

Thanks for the quick reply! Currently, I am using jax+cpu.

What is the depth of your test circuit?

In total, 188 one- or two-qubit gates. The depth should be around 24, then.

By the way, I also tested a circuit of 320 gates via the snippet you provided in #204 (comment) (modified to contain 8 qubits and a larger depth) in the same environment. Its jit time is about 120s.

Differences:

My circuit (the slow one) does not obey the 1D locality, i.e., gates are very non-local.
There are function calls to append components to build my circuit.

Besides, you may also try the unjit version to see whether the running time is acceptable?

I tried. But the training seems much slower than with the plain circuit simulator. It's not quite acceptable for my needs.

Muzhou-Ma · 2024-05-24T15:02:08Z

Hi, I'm also testing JIT compilation with MPSCircuit together with @royess. I use a circuit of the same depth but with geometric locality, and the compilation is faster (since we were not able to finish jit compiling the non-local circuit, I can't say how much faster). However, I don't understand why locality causes such a difference. What logic does TensorCircuit follow when doing the contraction?

Muzhou-Ma · 2024-05-24T16:27:40Z

However, the compilation time still grows quickly when scaling up the qubit number. It takes about 15 minutes for four qubits and more than 2 hours for 16 qubits (the compilation process is still unfinished). This runs counter to the purpose of using MPSCircuit, which is to reduce the computational resources (both time and memory) needed when scaling up the qubit number. Is there any way to make this better?

refraction-ray · 2024-05-25T04:27:14Z

Locality is very important for MPSCircuit. If a non-local two-qubit gate is applied, it is first transformed into a series of local swap gates plus a local two-qubit gate, and all these gates are applied to the MPS sequentially. The reason is that only a local two-qubit tensor can be safely applied and truncated to merge into the MPS in TEBD-like algorithms.
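To make the cost of non-locality concrete, here is a minimal sketch of the swap decomposition described above. The function name `decompose_nonlocal` and the tuple format are hypothetical, chosen for illustration, and this is not TensorCircuit's actual implementation. It assumes the moved qubit is swapped back afterwards and ignores the orientation of the gate when i > j.

```python
def decompose_nonlocal(i, j):
    """Return nearest-neighbour ops equivalent to a two-qubit gate on (i, j).

    Hypothetical helper for illustration only; ops are (name, site, site + 1).
    """
    lo, hi = min(i, j), max(i, j)
    # Swap the qubit at `hi` leftwards until it sits right next to `lo`.
    swaps = [("swap", k - 1, k) for k in range(hi, lo + 1, -1)]
    # Apply the gate locally, then undo the swaps in reverse order.
    return swaps + [("gate", lo, lo + 1)] + swaps[::-1]


ops = decompose_nonlocal(0, 3)
n_local_ops = len(ops)  # 2 * (distance - 1) + 1 local two-qubit operations
```

Under these assumptions, a gate acting on qubits at distance r turns into 2(r - 1) + 1 local two-qubit operations, each of which may need its own decomposition and truncation step, so the traced program grows with the distance of every gate.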

refraction-ray · 2024-05-25T04:34:08Z

For the unjit version, if AD is not required in your workflow, the numpy backend may be the fastest.

refraction-ray · 2024-05-25T04:59:24Z

Another possible workaround applies when your circuit has some time periodicity: a scan wrapper can then greatly reduce the jitting time. See an example for Circuit: https://github.com/tencent-quantum-lab/tensorcircuit/blob/master/examples/hea_scan_jit_acc.py. I believe the example can be transferred to MPSCircuit, with the stacked MPS tensors as input and output.
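The idea behind the scan wrapper can be sketched in plain JAX, independent of TensorCircuit. In this toy example, the `layer` function is a stand-in assumption (a 2x2 rotation rather than a real circuit layer); the point is only that `jax.lax.scan` traces the layer body once, whereas a Python loop is unrolled and traced once per layer.

```python
import jax
import jax.numpy as jnp


def layer(state, theta):
    # Toy stand-in for one periodic circuit layer: a real 2x2 rotation.
    c, s = jnp.cos(theta), jnp.sin(theta)
    return jnp.array([[c, -s], [s, c]]) @ state


def run_unrolled(thetas, state):
    # Python loop: under jit, every iteration is traced and compiled separately.
    for theta in thetas:
        state = layer(state, theta)
    return state


def run_scan(thetas, state):
    # lax.scan: the body is traced once and reused for every layer.
    def body(carry, theta):
        return layer(carry, theta), None

    final, _ = jax.lax.scan(body, state, thetas)
    return final


thetas = jnp.linspace(0.0, 1.0, 50)  # one parameter per layer
init = jnp.array([1.0, 0.0])
out_unrolled = jax.jit(run_unrolled)(thetas, init)
out_scan = jax.jit(run_scan)(thetas, init)
```

Both versions compute the same state; the scan version only works when every layer has the same structure and differs only in its parameters, which is the time periodicity mentioned above.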

royess · 2024-05-25T05:01:43Z

Thanks for your advice! But I think we need AD and do not have time periodicity in our circuit.

royess · 2024-05-25T05:06:24Z

Do you think it is doable to speed up the compilation for MPSCircuit? We will be happy to help if you have ideas on how to work on that. (We need this feature in our research, and there seems to be no other workaround.)

Naively, I would suppose that an MPSCircuit is not much more complicated in structure than a normal one. Shouldn't it then come with a reasonable jit time?

refraction-ray · 2024-05-25T05:17:26Z

Do you think it is doable to speed up the compilation for MPSCircuit?

Firstly, from a physics perspective, a TEBD-like algorithm applied to a non-periodic and very structured circuit often leads to a very large approximation error unless there is some theoretical guarantee, e.g., one can show that the intermediate state in the circuit is always area-law entangled.

From an engineering perspective, accelerating jit time is much harder than accelerating running time, as the former is largely fixed by the ML framework, over which we have very little control.

One possible way is to support an MPS that groups several qubits into one tensor instead of using one tensor per qubit. With grouping, far fewer QR or SVD decompositions are required and the approximation error is more controllable. E.g., with d qubits as one tensor leg, the MPS tensor has dimension ($\chi$, $2^d$, $\chi$). Then only two-qubit gates across different qubit groups require truncation; gates within one group are directly merged into the MPS tensor by matrix multiplication.
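The trade-off in the grouping proposal can be illustrated with a small counting sketch. The helper names `count_gates_by_group` and `tensor_size` are hypothetical and do not correspond to any existing TensorCircuit API; the sketch assumes qubits are grouped in contiguous blocks of `group_size` and only counts gates, without simulating anything.

```python
def count_gates_by_group(gates, group_size):
    """Split two-qubit gates into (within one group, across groups).

    Hypothetical helper: qubit q is assumed to belong to group q // group_size.
    """
    within = across = 0
    for a, b in gates:
        if a // group_size == b // group_size:
            within += 1  # merged into one MPS tensor, no truncation needed
        else:
            across += 1  # acts on two tensors, needs QR/SVD and truncation
    return within, across


def tensor_size(chi, group_size):
    # Number of entries in one MPS tensor of shape (chi, 2**group_size, chi).
    return chi * 2**group_size * chi


# Nearest-neighbour chain of gates on 8 qubits.
gates = [(q, q + 1) for q in range(7)]
one_per_tensor = count_gates_by_group(gates, 1)
four_per_tensor = count_gates_by_group(gates, 4)
```

In this example, grouping four qubits per tensor leaves a single truncating gate instead of seven, while each tensor grows from `tensor_size(chi, 1)` to `tensor_size(chi, 4)` entries, so the group size trades fewer decompositions against exponentially larger tensors. Gates between non-adjacent groups would presumably still need swaps.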

refraction-ray · 2024-05-25T05:20:42Z

And what is the target circuit metric (qubits number, error, circuit depth, gate number etc.) in your case? Also, have you tried the tf backend? The jitting time is much shorter.

Muzhou-Ma · 2024-05-29T12:19:18Z

And what is the target circuit metric (qubits number, error, circuit depth, gate number etc.) in your case?

The target circuit contains about 80 qubits, and about 60 layers of all-to-all connected non-local 2-qubit gates, making it roughly 3000 non-local 2-qubit gates. Do you think it is possible to jit such circuits (with or without MPS)?

Also, have you tried the tf backend? The jitting time is much shorter.

Yes, we have tried the tf backend. Unfortunately, the tf backend does not have a vmappable version of QR decomposition, so jitting MPS is impossible.

refraction-ray · 2024-05-30T08:47:36Z

The target circuit contains about 80 qubits, and about 60 layers of all-to-all connected non-local 2-qubit gates, making it roughly 3000 non-local 2-qubit gates.

To me, the scale of this simulation is very challenging. After translating to local 2-qubit gates, I guess roughly 30000 two-qubit gates are required for an 80-qubit system. Even if MPSCircuit can simulate this, the accuracy would be bad in general. Actually, the simulation scale is even beyond that of the quantum supremacy experiments; I don't see this as an easy task to run by calling an API with one GPU.

Unfortunately, the tf backend does not have a vmappable version of QR decomposition

What do you mean by this? Can you run the circuit with the jitted tf backend? I don't think vmap is relevant for your use case, since one circuit is challenging enough to simulate after all. There is no need to "stack" multiple circuits together to simulate.

Muzhou-Ma · 2024-05-30T09:34:54Z

To me, the scale of this simulation is very challenging. After translating to local 2-qubit gates, I guess roughly 30000 two-qubit gates are required for an 80-qubit system. Even if MPSCircuit can simulate this, the accuracy would be bad in general. Actually, the simulation scale is even beyond that of the quantum supremacy experiments; I don't see this as an easy task to run by calling an API with one GPU.

I see, thanks a lot.

There is no need to "stack" multiple circuits together to simulate

Yes, I understand this. However, for our task, we need to run many batches. As you have said, if simulating one circuit is challenging enough, then there's no reason to consider stacking them together.

Muzhou-Ma · 2024-05-30T09:38:51Z

@refraction-ray Thank you for the discussion. It seems that this is no longer a technical problem. The task we face is fundamentally hard and cannot be solved with current classical simulation techniques and reasonable computational resources.

Muzhou-Ma mentioned this issue May 24, 2024

Vmap isn't working when using JAX-based MPSCircuits #214

Open
