"Shift Rule for Gradient Determination in Parameterised Quantum Evolutions" — Whitepaper

Table of Contents

1. Introduction

Most of the currently dominant proposals for useful near-term (i.e., pre-fault-tolerant) quantum computing make use of quantum evolutions that depend on parameters, which need to be trained (i.e., optimized to minimize a loss or error function) in order for the quantum evolution to give the desired result. This approach is referred to as variational quantum computing.

For example, in the context of IBM's recent (12/2023) update of their quantum computing roadmap, IBM showcased a number of examples for what they call "quantum utility": Situations in which the use of a quantum computer resulted in a if not commercial but at least scientific benefit. Their examples either directly prepare quantum states that are of scientific interest, or they use variational approaches.

The invention (Theis 2022a) that is discussed in this text pertains to making variational approaches on analog quantum computers (aka "quantum simulators") more efficient, by speeding up the computations of gradients, which are used to train the parameters.

We first discuss where and how the "Nyquist Shift Rule" method in the invention would be used and what the state-of-the-art methods are; then we give results comparing the Nyquist shift rule to the state-of-the-art in numerical simulations.

2. Background

With a view towards the applicability of the invention, we firstly by discussing the types of commercial use-cases of quantum computing in which gradients are important. Secondly, we go through the state-of-the-art methods. Finally, we discuss the analog quantum computers, as the hardware domain of the invention.

2.1. Gradients of parameterized quantum evolutions — why?

2.1.1. Quantum Machine Learning

In the classical world, training a neural network amounts to optimizing the parameters it comprises in such a way that a loss function is minimized. In quantum machine learning, parameterized quantum processes take the role of the classical neural network — indeed, these parameterized quantum processes are themselves often called quantum neural networks.

The training of the parameters is not different between the quantum and classical worlds in two ways: (1) There is (currently?) no backpropagation algorithm for quantum neural networks, leading to a linear dependence of the efficiency of the training on the number of parameters (Abbas et al. 2023). (2) Quantum measurements yield an additional source of randomness in the gradient estimates next to the pseudo-randomness introduced by iterating over the training data.

Notably, in quantum machine learning, already for feeding input data into the parameterized quantum process does it make sense to use parameters (Gil Vidal and Theis 2020).

There is a wealth of parameterized quantum processes models that are showing promise for quantum machine learning. Some of them are tailored to analog quantum computers — making our invention applicable.

2.1.2. Combinatorial Optimization

The most promising proposals to solving combinatorial optimization problems on noisy quantum computers encode the cost-function to be minimized in the form of a cost Hamiltonian: The low-energy eigenstates of the Hamiltonian correspond to low-cost solutions of the combinatorial optimization problem. After preparing the system in an initial state, an iterative version of adiabatic evolution is employed: In every step, for a certain time, the system is subject to evolution that is a weighted sum of the cost Hamiltonian and the initial-state Hamiltonian. Typically, the weights in the sum are considered as parameters that are trained. For that training, gradients are typically used.

Combinatorial optimization problems which require constraints (i.e., almost all) are more difficult to map onto a Hamiltonian. Luckily though, in analog quantum computing, sometimes QAOA-approaches can be combined with "natural" constraints that are included in the system Hamiltonian; see the discussion about the 2.3.1 below — this is where our method can speed up gradient computations.

2.1.3. Probably not applicable: Quantum chemistry

The third large type of use-cases for parameterized quantum evolutions is quantum-computer chemistry, based, for example, on Variational Quantum Eigensolver techniques. While our method is in principle applicable there, the difficulties that arise from the large number of terms in the Hamiltonians dominate, so that gradient speed-ups have too small an effect.

2.2. Gradients of parameterized quantum evolutions — how?

The most efficient way to train the parameters in parameterized quantum evolutions is Stochastic Gradient Descent (SGD) and variants. There, a current vector of parameters is maintained. In every iteration, a random estimate of the gradient is produced, and the parameter vector is updated by subtracting a multiple of the gradient estimate.

The performance (number of iterations to achieve a near-optimal parameter setting) of SGD-variants depends critically on two properties of the gradient estimator: Its variance and its bias.

In the applications to near-term quantum computing outlined above, the loss function and its gradient are estimated by repeatedly running the quantum evolution and then measuring. The term shot is used to for a single run of the quantum evolution with final measurement. When considering the efficiency of noisy quantum computations, the number of shots is the most important quantity, as each individual shot takes only a negligible amount of time.

The fundamental randomness of quantum mechanical measurements leads to a variance in the gradient estimation. The magnitude of that variance depends on what method is chosen for the gradient estimation. The same holds for the bias: Different methods for estimating gradients have differ with regards to whether a bias is present and how large it is.

How are gradients of parameterized quantum evolutions obtained using state of the art methods?

2.2.1. Symmetric Difference Quotient (SDQ)

This simple method known from basic numerical analysis has a bias of ≈𝜖², where 𝜖 > 0 is limited only by the precision with which the function can be evaluated.

In the quantum setting, it works as follows: For each of the parameters in turn, SDQ runs the quantum evolution twice, with parameter settings 𝜽ⱼ → 𝜽ⱼ ±𝜖, where 𝑗 is the index of the current parameter. The estimate for the derivative 𝚍/𝚍𝜽ⱼ is the arithmetic mean of two measurements, scaled by a factor of 1/𝜖. In the case of usual Pauli-measurements, the variance is ≈ 1/𝜖².

SDQ method for estimating derivatives has the property that the bias can be arbitrarily reduced — but any reduction in bias is paid for by an increase in variance of equal magnitude.

2.2.2. Shift rules

Shift rules methods work similarly to SDQ: For each parameter in turn, the derivative 𝚍/𝚍𝜽ⱼ is estimated by updating only 𝜽ⱼ.

In contrast to SDQ, shift rules (Wierichs et al. 2021) generally can lead to unbiased estimators while at the same achieving the minimum possible variance that any unbiased estimator for the derivative must have (Theis 2021).

2.2.3. Banchi-Crooks' method

Banchi and Crooks have proposed a method that they call a "stochastic approximate shift rule" (Banchi and Crooks 2021). The word "approximate" refers to the presence of a bias.

The advantage of their method is the following: While the standard parameter shift rules of the previous section work only in cases of standard quantum evolutions of the form 𝑒ⁱᶿᴬ — which are typically found in digital quantum computers — Banchi & Crooks' method can work with quantum evolutions that are typically found in analog quantum computers (see 2.3 below).

However, breaking with conventional terminology, their method requires modifications of the quantum evolution beyond just modifying parameter values. This results in the quantum evolutions taking longer, which means that there is more time for quantum error to affect the outcome. For the digital quantum computers of this decade, this is a notable drawback.

2.2.4. Simultaneous Perturbation Stochastic Approximation (SPSA)

Both the vanilla SDQ and shift rules methods have the disadvantage that they need to be applied to each parameter separately. This means that the variance of the whole gradient vector will be proportional to the number of parameters. Hence, a good estimate of the gradient requires a number of shots that scales at least linearly with the number of parameters.

SPSA is being discussed as a method that can give an estimate of the whole gradient vector with only a few shots (e.g., two shots).

However, it was recently pointed out (Appendix A.2.3 in Abbas et al. 2023) that the SPSA gradient estimator has a variance that is proportional to the number of parameters — so SPSA offers no advantage in reducing the number of shots in Stochastic Gradient Descent.

2.2.5. Conclusion

Due to low variance and absence of bias, shift rules are the gold standard for derivative computations. But with the exception of Banchi-Crooks' method, shift rules don't work for some of the typical parameterized quantum evolutions on analog quantum computers.

The method described in the patent application is a parameter shift rule that targets the same type of quantum evolutions as Banchi-Crooks' method, but it does away with the disadvantage of having to modify the quantum evolution beyond changing parameter values.

Moreover, numerical simulations (Theis 2023, source code is available) have shown that the method invented at the University of Tartu has smaller bias than Banchi-Crooks' method — while quantum hardware constraints and variance stay the same. (Indeed, the method is in principle unbiased, biases only result from limitations in the quantum hardware.)

2.3. Analog quantum computers (aka quantum simulators)

Digital quantum computing is based on quantum circuits and quantum gates. On the other hand, analog quantum computing (also often referred to as quantum simulation) attempts to perform quantum evolutions not as a sequence of quantum gates, but through continuous manipulation of the quantum Hamiltonian.

The idea is that while exact gates are a prerequisite for fault-tolerant quantum computing, entangling gates to date still introduce significant random noise into the computation, causing decoherence. In comparison, some overall quantum processes can be realized through analog quantum evolutions without giving up too much coherence.

In the past few years, in face of the rising disappointment over the failure to exploit quantum circuit based noisy quantum computing for industrially relevant use-cases, analog quantum computing has risen to carry the hopes for pre-fault-tolerant commercial quantum advantage.

In contrast to the previously known shift rules for quantum circuits, my method (Theis 2022a) works well with typical situation in an analog quantum computer: The part of the Hamiltonian that expresses a parameter under consideration is active simultaneously with other, non-commuting parts.

2.3.1. Rydberg atom arrays and the Rydberg Blockade

The "Stable Set Problem" (also known as, "Maximum Independent Set Problem") is a classical — and famously hard to approximate — combinatorial optimization problem. Formulated on an intuitive level, when given as input a set of objects any two of which might be incompatible with eat other, the Stable Set Problem asks for finding the largest (or most highly weighted) sub-collection of pairwise compatible objects. A great variety of combinatorial optimization problems can be derived as Stable Set with additional constraints, or reduced to Stable Set in a simple way.

Atom arrays, held in place (or even bussed around) with optical tweezers, offer a natural analog-quantum formulation for Stable Set by exploiting the Rydberg blockade: If two atoms are in close proximity, there is a heavy energy penalty for both of to be are in the excited "Rydberg" state. It is intuitive that the incompatibility relation can be modeled by identifying atoms with objects and bringing atoms representing incompatible objects into close proximity before sending the pulses which effect the transition into the excited state.

Next to Stable Set, atom arrays have also successfully been used for solving so-called Max-Cut (or Ising) problems.

Various publications over the last year have demonstrated the power of (reconfigurable) Rydberg atom arrays. While the most head-turning one of them is based on quantum error correction, until a major problem — the continuous refill or reuse of atoms — is solved, error corrected quantum computing on reconfigurable atom arrays remains out of reach. Hence, it appears that hopes for commercial quantum advantage based on Rydberg atoms within this decade still rest on analog simulators.

2.3.2. Other analog simulators

Couplers between qubits in superconducting circuits can be used in an analog way, which is exploited, e.g., in digital-analog quantum computing/simulation. Derivative of the parameters within single qubits pulses can then be taken using our method. In superconducting qubit literature, analog quantum computing is sometimes referred to under the term sub-logical control (e.g., Babbush and Neven 2016).

There have been publications using Trapped Ion quantum computers in an analog way. However, I am insufficiently familiar with that qubit platform to judge whether our method is applicable there, or even whether analog quantum computing is still pursued on trapped ions at all.

3. Numerical simulations

Basic facts in signal processing suggest that the Nyquist shift rule might perform badly if the following two factors occur together:

  1. The quantum evolution whose derivative is to be taken contains strong contributions of low (non-zero) frequencies;
  2. The width of the range in which the parameter values can be taken is restricted through constraints imposed by the quantum device.

More accurately, the Nyquist shift rule can be expected to perform well if the width of the parameter range is inverse proportional to the lowest relevant frequency.

As in today's quantum devices, the parameter range is typically restricted, we have conducted experiments how this — and other — constraints affect the relative performance of the method.

3.1. Comparison with Banchi-Crooks

To compare Nyquist shift rules with the method of Banchi & Crooks (Banchi and Crooks 2021), we conducted numercial simulations in a generic setting.

The accuracy (i.e., bias) of Banchi-Crooks' method can be tuned: A higher accuracy requires a larger width of the parameter range that needs to be accessed. In our numerical experiments indicate that, we set up the Nyquist shift rule to operate in the same parameter range width as Banchi-Crooks' method. The results indicate that, for all widths, the Nyquist shift rule has an order of magnitude better accuracy (Theis 2023).

The source code for said experiments is available for inspection and further tests (Theis 2022b).

3.2. Complementarity with Symmetric Difference Quotients

Symmetric Difference Quotients have complementary behavior to the Nyquist shift rule: While the bias of the Nyquist shift deteriorates with the influence of low frequencies, the bias and variance of SDQ are proportional to the largest frequencies that occur. The situation mirrors that of standard parameter shift rules, which outperform SDQ unless the largest frequency is large (in which case all methods require either large variance or large bias).

In work that is currently being written down, we have compared SDQ and Nyquist shift rule in realistic simulations of Rydberg atom array analog quantum computing devices with typical current-day constraints (Theis 2024). Not only the parameter ranges, but also other relevant constraints were taken into account in the numerical simulations. The results of the simulations indicated that a comparison of the parameter range with an easily available bound on the frequency of the quantum evolution helps to recognize the quantum evolutions where the Nyquist shift rule yields a considerable improvement.

The source code for said experiments will be made available for inspection and further tests.

3.2.1. Hardware-software co-design

The work just described revealed an opportunity to improve the system performance of the analog quantum computer by co-designing with the gradient-estimation method: An increase in the width of the parameter ranges leads to improved performance of the Nyquist gradient estimator. Gradient estimation is a critical factor influencing the efficiency of training Quantum Neural Networks or optimizing Quantum Approximate Optimization Algorithm models.

4. Bibliography

Abbas, Amira, Robbie King, Hsin-Yuan Huang, William J. Huggins, Ramis Movassagh, Dar Gilboa, and Jarrod R. McClean. 2023. “On Quantum Backpropagation, Information Reuse, and Cheating Measurement Collapse.” Preprint Arxiv:2305.13362. https://doi.org/10.48550/arXiv.2305.13362.
Babbush, Ryan, and Hartmut Neven. 2016. “Training Quantum Evolutions Using Sublogical Controls.” US 10,275,717 B2; US 11,055,626 B2; US 11,562,285 B2.
Banchi, Leonardo, and Gavin E. Crooks. 2021. “Measuring Analytic Gradients of General Quantum Evolution with the Stochastic Parameter Shift Rule.” Quantum 5 (January): 386. https://doi.org/10.22331/q-2021-01-25-386.
Gil Vidal, Javier, and Dirk Oliver Theis. 2020. “Input Redundancy for Parameterized Quantum Circuits.” Frontiers in Physics 8: 297. https://doi.org/10.48550/arXiv.1901.11434.
Theis, Dirk Oliver. 2021. “Optimality of Finite-Support Parameter Shift Rules for Derivatives of Variational Quantum Circuits.” Preprint. https://doi.org/10.48550/arXiv.2112.14669.
———. 2022a. “Shift Rule for Gradient Determination in Parameterised Quantum Evolutions.” US17/856,357; PCT/EP2023/068112.
———. 2022b. “Supplementary Material for ’``Proper’’ Shift Rules…’.” Julia Notebooks. https://dojt.github.io/storage/nyquist/.
———. 2023. “``Proper’’ Shift Rules for Derivatives of Perturbed-Parametric Quantum Evolutions.” Quantum 7: 1052. https://doi.org/10.22331/q-2023-07-11-1052.
———. 2024. “Derivatives of Parameterized Bang-Bang Pulses in Rydberg-Array Analog Quantum Computers.”
Wierichs, David, Josh Izaac, Cody Wang, and Cedric Yen-Yu Lin. 2021. “General Parameter-Shift Rules for Quantum Gradients.” Preprint. https://doi.org/10.48550/arXiv.2107.12390.

Date: Wed Feb 7 12:34:09 CET 2024

Author: Dirk Oliver Theis, University of Tartu, Estonia

Created: 2024-03-07 Thu 18:26

Validate