Krylov Quantum Subspace Diagonalization
The Krylov quantum subspace diagonalization method is a powerful hybrid quantum chemistry method for estimating ground state energies, taking advantage of quantum computers to build a matrix and using a classical supercomputer to diagonalize that matrix. The method has advantages over variational approaches and is a promising technique for early fault tolerant quantum computing. This lab will introduce the Krylov method, guide you through implementing it in CUDA-Q, teach you how to improve results, and parallelize computations across multiple QPUs.
What you will do:
Learn the theory behind Krylov quantum subspace diagonalization
Implement a Krylov solver in CUDA-Q
Explore methods to improve your solver such as higher order Trotter approximations and better reference states
Parallelize your implementation to run on multiple simulated QPUs
CUDA-Q Syntax you will use:
Methods for CUDA-Q kernel construction
observe and get_state to evaluate kernels
The mqpu backend, the associated observe_async and get_state_async functions, and get
exp_pauli to dynamically construct quantum circuits from CUDA-Q Spin Operators
Prerequisites: This lab assumes you have a basic understanding of quantum algorithms and understand the basic concepts of quantum computing. If you need a refresher, check out the "Quick Start to Quantum Computing with CUDA-Q" series to learn how to code a VQE implementation with CUDA-Q from scratch. This lab focuses on a chemistry application, so some knowledge of chemistry terminology is necessary to fully appreciate the exercises.
Run the following code to prepare the packages you will use in this lesson.
💻 Just a heads-up: This notebook is designed to be run on an environment with a GPU. If you don't have access to a GPU, feel free to read through the cells and explore the content without executing them. Enjoy learning! ⭐
Quantum Subspace Diagonalization
Quantum computing methods for quantum chemistry lie on a spectrum from NISQ methods to methods suited for large-scale fault tolerant quantum computers. The problem that we'll be considering in this notebook is to find the ground state energy of a Hamiltonian. Mathematically, this is equivalent to finding the lowest eigenvalue of a given matrix.
The prototypical NISQ algorithm is the Variational Quantum Eigensolver (VQE), which computes ground state energies by classically optimizing a parameterized circuit and computing expectation values for the cost function. These circuits are shallow, which is a favorable trait, but the number of measurements needed each round scales with the size of the Hamiltonian. Most problematic is the fact that circuits corresponding to even modest qubit counts have extremely flat optimization surfaces and tend not to converge, which is the so-called "barren plateau" problem. Even when convergence is possible, the number of iterations required may become intractably large.
On the other side of the spectrum, quantum phase estimation (QPE) is a hallmark fault tolerant algorithm for quantum chemistry. Given a good initial state, QPE can compute ground state energies with exponential precision, but it requires extremely deep circuits due to the necessity of the quantum Fourier transform in most implementations. QPE is extremely sensitive to errors, requiring very low error rates to function. Consequently, it relies on quantum error correction protocols (learn more in our QEC 101 course), which are resource-intensive and demand thousands to millions of physical qubits.
Quantum Krylov methods are a bridge between the two and balance tradeoffs to potentially result in useful applications on devices that are more near term. The main idea is to reduce the size of the matrix to a smaller one and compute the eigenvalues of this smaller matrix. If chosen well, the lowest eigenvalues of the smaller matrix and the larger one agree. The Krylov approach is a subset of broader quantum subspace diagonalization (QSD) methods which try to diagonalize the Hamiltonian in a general non-orthogonal basis. In other words, given an initial state that is close to the ground state, the Krylov method will help identify a good subspace of the Hamiltonian to diagonalize such that the best possible estimates of the extreme eigenvalues (i.e., the ground state) are produced.
Such an approach requires circuits of moderate depth and uses them to populate matrix elements of a Hamiltonian subspace which is then classically diagonalized. Krylov methods still suffer from unfavorable scaling with the number of Hamiltonian terms, as the full Hamiltonian must be measured to compute the subspace matrix elements. However, increasing the subspace dimension leads to exponential convergence towards the ground state energy, which could outperform the many VQE iterations required for larger problems.
The widget below will provide some intuition for subspace diagonalization and why Krylov methods work. Given the 8x8 matrix below, move the slider to select a subspace dimension and look at the computed eigenvalues. Notice the ground state energy converges to the exact ground state while the other eigenvalues remain rather inaccurate. Subspace methods are quite good at estimating the extreme eigenvalues (highest or lowest) which is favorable as the lowest eigenvalue (the ground state energy) is usually the quantity of interest.
Click this link to access the widget
To get a sense for why a good initial state is important (more on that later), choose which corner the subspace originates from in the widget. Notice that the approximate eigenvalues are much worse when the subspace is constructed from certain corners, which happen to capture less information about the eigenspectrum of the matrix.
Similarly, in the quantum Krylov methods, a good approximation of the ground state will help construct a better subspace for estimating the ground state energy.
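The intuition from the widget can be reproduced in a few lines of NumPy. The sketch below uses an arbitrary random 8x8 symmetric matrix (a stand-in for the widget's matrix, not the same one) and projects it onto Krylov subspaces of growing dimension, showing the lowest projected eigenvalue approaching the exact ground state from above:

```python
import numpy as np

rng = np.random.default_rng(7)

# A random 8x8 symmetric matrix standing in for the widget's Hamiltonian
A = rng.normal(size=(8, 8))
H = (A + A.T) / 2
exact = np.linalg.eigvalsh(H)[0]

# Krylov subspaces of growing dimension, built from a random start vector b
b = rng.normal(size=8)
errors = []
for dim in range(1, 9):
    K = np.column_stack([np.linalg.matrix_power(H, k) @ b for k in range(dim)])
    Q, _ = np.linalg.qr(K)        # orthonormalize {b, Hb, H^2 b, ...}
    H_sub = Q.T @ H @ Q           # dim x dim projected matrix
    errors.append(np.linalg.eigvalsh(H_sub)[0] - exact)

print(errors)  # lowest projected eigenvalue approaches the exact one from above
```

Because the subspaces are nested, the estimate can only improve as the dimension grows (the Rayleigh-Ritz property), which is exactly the convergence behavior the slider demonstrates.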
The Quantum Krylov Method
This section will walk through an implementation of a quantum Krylov method based on A Multireference Quantum Krylov Algorithm for Strongly Correlated Electrons. The general workflow follows the figure below inspired by Quantum Krylov subspace algorithms for ground and excited state energy estimation. The first step is to build a subspace for which a generalized eigenvalue problem will be solved. Then, the matrices for the eigenvalue problem are populated with expectation values from quantum circuit evaluations using the Hadamard test, and finally, the eigenvalues of the subspace are determined classically. The subspace can then be gradually increased in size and the process repeated until the result is sufficiently accurate.

The first step is to select a set of reference states from which the subspace is constructed. A benefit of this approach is that it is multireference. A basis of states is defined where each state $|\Phi_I\rangle$ is a linear combination of Slater determinants:

$$|\Phi_I\rangle = \sum_{\mu} d_{I\mu} |\phi_\mu\rangle$$
From this, a non-orthogonal Krylov space is constructed by applying a family of unitary operators $U_n$ to each of the reference states, resulting in elements in the Krylov space of the form

$$|\psi_\alpha\rangle = U_n |\Phi_I\rangle$$
This constructs the state $|\Psi\rangle = \sum_\alpha c_\alpha |\psi_\alpha\rangle$, which is an approximation of the full configuration interaction (FCI) wavefunction.
The selection of the unitary operations is not arbitrary; they correspond to consecutive applications of the time evolution operator, $U_n = e^{-i n \Delta t \hat{H}}$. This ensures that a basis is chosen which describes the eigenvalues of $\hat{H}$ well.
The energy of this state can be obtained by solving the generalized eigenvalue problem

$$\mathbf{H}\mathbf{c} = E\,\mathbf{S}\mathbf{c}$$

using the state defined above to compute the matrix elements of $\mathbf{H}$ and $\mathbf{S}$.
These elements are obtained using the Hadamard test, which uses phase kickback to estimate an expectation value of the time evolution operators. For a more detailed discussion of why this works, please see this CUDA-Q tutorial. With $\alpha$ indexing the pair $(I, m)$ and $\beta$ the pair $(J, n)$, the elements of the overlap are

$$S_{\alpha\beta} = \langle \Phi_I | U_m^\dagger U_n | \Phi_J \rangle$$
and the elements of the Hamiltonian matrix are

$$H_{\alpha\beta} = \langle \Phi_I | U_m^\dagger \hat{H} U_n | \Phi_J \rangle$$
The matrix elements for $\mathbf{S}$ and $\mathbf{H}$ are computed with the Hadamard test using the circuit shown below. In the case of the overlap matrix $\mathbf{S}$, the Pauli word is the identity, so the controlled Pauli application drops out.

Measuring the expectation value of this circuit with the $X$ operator on the ancilla yields the real part of the matrix element, and measuring with the $Y$ operator yields the imaginary part.
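The Hadamard-test identity can be checked numerically with plain NumPy statevectors. In the sketch below, a random unitary stands in for the controlled time evolution; the state after the circuit is $(|0\rangle|\phi\rangle + |1\rangle U|\phi\rangle)/\sqrt{2}$, and the ancilla's $\langle X\rangle$ and $\langle Y\rangle$ recover the real and imaginary parts of $\langle\phi|U|\phi\rangle$:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3  # number of system qubits

# Random unitary U (stand-in for the controlled time evolution) and state |phi>
M = rng.normal(size=(2**n, 2**n)) + 1j * rng.normal(size=(2**n, 2**n))
U, _ = np.linalg.qr(M)
phi = rng.normal(size=2**n) + 1j * rng.normal(size=2**n)
phi /= np.linalg.norm(phi)

# State after the Hadamard test circuit (ancilla is the leading qubit):
# (|0>|phi> + |1>U|phi>)/sqrt(2)
psi = np.concatenate([phi, U @ phi]) / np.sqrt(2)

X = np.array([[0, 1], [1, 0]])
Y = np.array([[0, -1j], [1j, 0]])
I_sys = np.eye(2**n)

re = np.real(psi.conj() @ np.kron(X, I_sys) @ psi)  # <X> on the ancilla
im = np.real(psi.conj() @ np.kron(Y, I_sys) @ psi)  # <Y> on the ancilla

print(re + 1j * im, phi.conj() @ U @ phi)  # the two values agree
```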
Once the $\mathbf{S}$ and $\mathbf{H}$ matrices are constructed, the diagonalization is performed classically to produce an estimate of the ground state energy in question.
Exercise 1
Before implementing the quantum Krylov method, find the ground state energy of the linear molecule by computing the FCI energy through direct matrix diagonalization. This will provide a reference to compare your Krylov energies against. Note that the line saves the Hamiltonian as a NumPy array. How large is the full Hamiltonian matrix?
Implementing a Quantum Krylov Method
Exercise 2
Now you are ready to code up an implementation of the full Krylov method. The following cells will guide you through this with certain "TODO" locations where you can fill in the code. Hint: use cudaq.control to apply a kernel controlled by a qubit.
Building the main kernel
The heart of the Krylov algorithm is a CUDA-Q kernel (qfd_kernel) that can compute the matrix elements in the circuit shown above. There are a few sub-kernels that must be constructed first, and these are explained below. Note that we will prepare code where the basis vectors are single Slater determinants for simplicity. In principle, it could be modified to take any initial state.

Kernel U_m first builds the occupation vector of the target Slater determinant by applying $X$ gates. Then, the command exp_pauli applies each Pauli word in the Hamiltonian as a matrix exponential to perform time evolution. apply_pauli is a helper kernel, provided for you, that translates a list of integers into the respective $X$, $Y$, and $Z$ gates so that a specific Pauli word of the Hamiltonian can be measured.

Kernel U_n likewise builds the occupation vector and performs the time evolution, followed by controlled application of the Pauli word using apply_pauli.

Kernel qfd_kernel defines a qubit register and an ancilla qubit and uses these kernels to implement the circuit above. Note that this is general: the code below that builds the matrices supplies the appropriate states, time steps, and Pauli words as inputs.
CUDA-Q kernels require specific data types. To apply the Hamiltonian operator within the kernels, it must be broken down into lists of coefficients and their respective Pauli words. Also, types for conditional logic within kernels are limited to integers. So, a function is also needed to convert each Pauli word into a list of integers corresponding to $I$, $X$, $Y$, and $Z$. The functions below accomplish this and are provided for you.
Note that coefficient = term_coefficients(hamiltonian) and pauli_string = term_words(hamiltonian) must be initialized before building the matrices for each molecular system you test.
Computing the matrix elements
You are now ready to start computing the overlap and Hamiltonian matrices for the subspace. You will write a function to compute each matrix given a time step, a number of evolution steps, and a list of reference states. It is assumed that each reference state will evolve for the same number of time steps.
First, define the two observables necessary to compute the real and imaginary parts using the Hadamard test: an $X$ and a $Y$ operator acting on the ancilla qubit for the real and imaginary parts, respectively. Note, these must be computed as two separate expectation values, not a single combined observable!
Next, generate an empty overlap matrix and loop over the elements to compute them using cudaq.observe. The matrix is symmetric, so for performance reasons, only compute the upper triangle and copy the results to the corresponding positions. For the pauli_string argument in qfd_kernel, use pauli_str to enter a list corresponding to the identity operator, as there is no observable to apply in the overlap matrix. (This is done for you.)
Finally, write a similar loop for the Hamiltonian matrix. In this case, each matrix element is computed by looping over all Pauli words, computing an expectation value for each word, multiplying this expectation value by that word's coefficient, and summing the results. Here, pauli_str will need to be called for each word in the loop to produce a list that can be applied as Pauli operators within the kernel.
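To make the loop structure concrete without quantum circuits, the hedged sketch below computes the same $S$ and $H$ matrices exactly with NumPy linear algebra, with matrix exponentials in place of the Hadamard-test circuits. The function name and signature are illustrative, not the lab's provided API:

```python
import numpy as np

def krylov_matrices(H, phi, dt, steps):
    """Exact statevector stand-in for populate_s / populate_h:
    S[m, n] = <phi| U_m^dag U_n |phi>, H[m, n] = <phi| U_m^dag H U_n |phi>,
    with U_n = exp(-i n dt H). On a QPU each entry would instead be
    assembled from Hadamard-test expectation values (an <X>, <Y> pair
    per Pauli word for the Hamiltonian matrix)."""
    evals, V = np.linalg.eigh(H)
    def U(n):  # e^{-i n dt H} from the eigendecomposition of H
        return (V * np.exp(-1j * n * dt * evals)) @ V.conj().T
    vecs = [U(n) @ phi for n in range(steps)]
    S = np.empty((steps, steps), dtype=complex)
    Hk = np.empty((steps, steps), dtype=complex)
    for m in range(steps):
        for n in range(m, steps):          # upper triangle only...
            S[m, n] = vecs[m].conj() @ vecs[n]
            Hk[m, n] = vecs[m].conj() @ (H @ vecs[n])
            S[n, m] = np.conj(S[m, n])     # ...then mirror (Hermitian)
            Hk[n, m] = np.conj(Hk[m, n])
    return S, Hk

# Tiny demo: random 4x4 Hermitian matrix, single basis-state reference
rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
Hd = (A + A.conj().T) / 2
phi = np.zeros(4, dtype=complex)
phi[0] = 1.0
S, Hk = krylov_matrices(Hd, phi, dt=0.5, steps=3)
print(np.round(np.diag(S).real, 8))  # unitary evolution keeps norms at 1
```

The double loop over the upper triangle mirrors the structure your populate_s and populate_h functions will have; only the body of the inner loop changes when the exact linear algebra is replaced by circuit evaluations.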
Solving the eigenvalue problem
Let's recap. You started with classical computation of the Hamiltonian and an initial ground state. You then cleverly defined a subspace to approximate the total Hamiltonian using a basis of vectors formed by evolving the initial state with the Hamiltonian. You then populated the $S$ and $H$ matrices corresponding to the subspace using quantum computations. Now, you will classically solve this eigenvalue problem to compute the ground state energy of the linear molecule.
The generalized eigenvalue problem is solved by first diagonalizing $S$:

$$S = U s U^\dagger$$

The eigenvectors and eigenvalues are then used to construct a new matrix $X$:

$$X = U s^{-1/2}$$

This matrix then diagonalizes $H$ as

$$X^\dagger H X\, C' = C' E$$

Using the eigenvectors $C'$ of $X^\dagger H X$, one can find the eigenvectors of the original problem by left multiplying by $X$: $C = X C'$.
The function eigen, imported from an auxiliary script, solves the generalized eigenvalue problem for you. The function checks the spread of the eigenvalues of $S$, known as the condition number. If this number is large, numerical instabilities may corrupt the solution. This is fixed by projecting out tiny eigenvalues. You can set verbose=True to print information about the conditioning of the matrix if you would like.
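For reference, a minimal NumPy version of this procedure might look like the following. solve_gevp and its tol threshold are illustrative names, not the lab's eigen function:

```python
import numpy as np

def solve_gevp(H, S, tol=1e-10):
    """Solve H c = E S c via canonical orthogonalization, projecting out
    near-null directions of S to keep the problem well conditioned."""
    s, U = np.linalg.eigh(S)
    keep = s > tol * s.max()           # discard tiny overlap eigenvalues
    X = U[:, keep] / np.sqrt(s[keep])  # X = U s^{-1/2} on the kept subspace
    Hp = X.conj().T @ H @ X            # ordinary eigenproblem in the new basis
    E, Cp = np.linalg.eigh(Hp)
    return E, X @ Cp                   # eigenvalues, eigenvectors in original basis

# Tiny demo with a random symmetric H and positive-definite overlap S
rng = np.random.default_rng(2)
A = rng.normal(size=(4, 4))
Hd = (A + A.T) / 2
B = rng.normal(size=(4, 4))
Sd = B @ B.T + 0.1 * np.eye(4)
E, C = solve_gevp(Hd, Sd)
print(E[0])  # lowest generalized eigenvalue
```

The tol threshold is what controls the projection of tiny overlap eigenvalues; raising it trades a smaller effective subspace for better numerical stability.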
Now, compute the ground state energy using the Hartree-Fock state as the only reference, evolving for 4 and then 8 time steps. (The first example is provided for running this workflow.) Then recompute using two reference states: HF and a determinant formed by promoting the highest energy electron to the next available orbital, evolving each for 2 and then 4 time steps.
How do your results look? Is a larger subspace better? Are better results obtained from increasing the number of time steps or the number of references? Are they accurate to within chemical accuracy ($\approx 1.6$ mHartree)? Note, you must use fp64 precision with the nvidia simulator, otherwise your result will suffer from numerical inaccuracies.
Exercise 3
You have successfully implemented a Krylov method and computed a ground state energy using a subspace of the full Hamiltonian. To understand the resource savings here, write a function that computes the number of circuits that must be executed to compute the ground state energy of STO-3G LiH, which has 631 Hamiltonian terms and a Hilbert space of size $2^{12}$. (Assume the expectation values are computed from a single circuit and are not shot based.)
Would such a subspace require execution of more or fewer circuits than a VQE procedure with 1000 iterations? Also, how much smaller is the subspace than the full Hamiltonian if one were to compute the FCI energy of LiH?
Notice from this that the subspace requires far fewer circuits. This is somewhat of a tough comparison, as the number of VQE iterations needed varies widely from case to case, and convergence is not even guaranteed. Regardless, Krylov approaches have a much more efficient route to converging on the total energy, yet remain limited by the size of the Hamiltonian, which determines how costly each matrix element is to compute.
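One hedged way to perform the count, assuming one Hadamard test (two circuits, for $\langle X\rangle$ and $\langle Y\rangle$) per overlap element and one such test per Pauli word per Hamiltonian element, with only the upper triangle of each matrix computed:

```python
def n_circuits(subspace_dim: int, n_pauli_terms: int) -> int:
    """Hadamard-test circuits needed to fill S and H, assuming:
    - only the upper triangle (with diagonal) of each matrix is computed,
    - each element of S needs one Hadamard test (2 circuits: <X> and <Y>),
    - each element of H needs one such test per Pauli word."""
    n_elements = subspace_dim * (subspace_dim + 1) // 2
    return n_elements * 2 * (1 + n_pauli_terms)

print(n_circuits(6, 631))
```

For example, n_circuits(6, 631) evaluates to 26,544 under these assumptions; the exact count for your implementation depends on which symmetries you exploit.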
Selecting an Improved Krylov Basis
As you saw from your initial implementation, the quality of the Krylov QSD result can depend heavily on the reference states used and the number of time evolution steps selected. Examine the table below from the paper. Notice how larger subspaces improve (lower) the ground state energy prediction for the linear molecule (STO-6G).
The table tells a second story. The columns labeled $\kappa$ in the paper denote the condition number of the overlap matrix $S$, that is, the ratio of its largest to smallest eigenvalue. If this number is very large, the solution is potentially numerically unstable and could produce inaccurate and even non-variational predictions. This tends to occur when vectors in the Krylov subspace are too similar.
The table has two columns of energies: one where a single-reference (SR) Hartree-Fock state is evolved for all time steps, and another where multiple reference (MR) vectors are evolved for four time steps each. Notice how the MR condition numbers remain small while the SR ones are huge. Though potentially acceptable for a small system, this is highly problematic for larger molecules and indicates the value of using multiple reference vectors over a single reference evolved for more time steps.
| Subspace Size | E(SR) | E(MR) |
|---|---|---|
| 4 | −3.015510 | −3.015510 |
| 8 | −3.019768 | −3.019301 |
| 12 | −3.020172 | −3.019696 |
| 16 | −3.020192 | −3.019835 |
| 20 | −3.020198 | −3.019929 |
A natural question is: what other ways can improve the construction of the Krylov subspace? The rest of this section will explore two of them.
Exercise 4
One way to improve the subspace is selection of an appropriate time evolution step $\Delta t$. Use your code to test the following time steps with the 6-31G Hartree-Fock state (so it runs faster) as a reference, time-evolved for three time steps. Comment on the ground state energy predictions. Is there a systematic trend? What might explain what makes a good versus a poor selection?
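To build intuition before running the exercise, the NumPy sketch below (a toy random Hamiltonian and basis-state reference, illustrative only) shows how a very small $\Delta t$ makes the evolved Krylov vectors nearly identical, which blows up the condition number of the overlap matrix:

```python
import numpy as np

# Sketch: how the time step dt affects the conditioning of the overlap matrix S
rng = np.random.default_rng(3)
A = rng.normal(size=(6, 6))
H = (A + A.T) / 2                      # toy Hermitian "Hamiltonian"
evals, V = np.linalg.eigh(H)
phi = np.zeros(6)
phi[0] = 1.0                           # toy reference state

def overlap_condition(dt, steps=4):
    """Condition number of S_mn = <phi| e^{+i m dt H} e^{-i n dt H} |phi>."""
    vecs = [(V * np.exp(-1j * n * dt * evals)) @ (V.conj().T @ phi)
            for n in range(steps)]
    S = np.array([[u.conj() @ v for v in vecs] for u in vecs])
    return np.linalg.cond(S)

for dt in (1e-3, 1e-1, 1.0):
    print(dt, overlap_condition(dt))   # tiny dt -> nearly dependent vectors
```

A time step that is too small barely rotates the state between Krylov vectors, while one that is too large can skip over the relevant dynamics, so some intermediate value works best.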
Exercise 5
Another way to improve the basis selection has to do with how the time evolution operator is approximated. Consider the Hamiltonian $\hat{H} = \sum_k c_k \hat{P}_k$, where $c_k$ are the coefficients and $\hat{P}_k$ are the Pauli words. We have used exp_pauli to apply each Pauli word individually and approximate $e^{-i\hat{H}t}$. This is necessary because noncommuting terms need to be applied using the Suzuki-Trotter approximation shown below:

$$e^{-i\hat{H}t} \approx \left(\prod_k e^{-i c_k \hat{P}_k t / n}\right)^n$$

where the approximation converges to the exact operator as $n$ goes to infinity. So far, we have used a first order Trotter-Suzuki operator ($n = 1$), but $n$ can be increased to provide an even more accurate time evolution operator. This is implemented by applying the product of term exponentials $n$ times, with the coefficients divided by a factor of $n$. Update your code so that $n$ can be specified, and run the procedure using the HF reference and three total time steps for several values of $n$. Also, time each computation and print the runtime. Do the higher order approximations produce better results? How much slower is each increase in approximation order?
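The effect of the repetition count $n$ can be demonstrated entirely classically. The sketch below uses a toy two-term, noncommuting Hamiltonian (an illustrative stand-in, not the molecular Hamiltonian) and compares the Trotterized product against the exact exponential:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
terms = [(0.7, X), (0.4, Z)]      # H = 0.7 X + 0.4 Z, noncommuting terms
H = sum(c * P for c, P in terms)
t = 1.0

def expm_i(A):
    """e^{-iA} for Hermitian A via its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * np.exp(-1j * w)) @ V.conj().T

exact = expm_i(t * H)
errors = []
for n in (1, 2, 4, 8):
    slice_ = np.eye(2, dtype=complex)
    for c, P in terms:            # one slice: product of the term exponentials
        slice_ = expm_i(c * t / n * P) @ slice_
    approx = np.linalg.matrix_power(slice_, n)  # repeat the slice n times
    errors.append(np.linalg.norm(approx - exact))

print(errors)  # the error shrinks as n grows
```

The error of this first-order scheme shrinks roughly as $1/n$, which mirrors the accuracy gains (and the $n$-fold circuit-depth cost) you should observe in the exercise.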
Clearly there is a benefit to using a higher Trotter order, but there is also a cost. Each increase in $n$ proportionally increases the circuit depth of the time evolution operator, resulting in many more gates to apply and longer circuit simulation times. In A Multireference Quantum Krylov Algorithm for Strongly Correlated Electrons, the authors present the table of data below for various Trotter orders (their notation differs slightly from ours). Notice that for most bond distances, chemical accuracy is only achieved when a Trotter order of 8 is used, demonstrating the importance of an accurate time evolution operator.

Parallel Krylov Method
In the previous section you calculated that for even a modest sized molecule like LiH, over 26,000 circuits need to run to compute all of the expectation values. This does not factor in sampling on a physical QPU, which would significantly increase this number. The good news is that the Krylov approach is highly parallelizable and, in theory, all of these circuits could be run asynchronously on a sufficient number of physical QPUs. Accelerated quantum supercomputers will eventually include multiple QPUs in a single data center and allow such computations to occur alongside supercomputers.
CUDA-Q has a number of features that let you easily code parallel applications and simulate how they would run on multiple QPUs using the MQPU backend. In this section, you will implement a parallel version of the Krylov approach and test it on a larger molecule.
Exercise 6
Though there are multiple ways to parallelize the population of the $S$ and $H$ matrices, the limiting factor is the number of QPUs available (or GPUs, in our simulated case). Most users probably have a limited number of GPUs, so we will only consider parallelization across two GPUs.
First, consider the function populate_h. Each matrix element requires computation of an expectation value for every Pauli word in the Hamiltonian. We will focus on parallelizing computation of the real and imaginary parts of each of these. Below is an updated version of the function, called populate_h_async, which takes advantage of observe_async to run each call on a different QPU. The observe_async call now has a final parameter, qpu_id, which specifies the GPU to run that computation on.
The results are stored in a so-called futures list, whose elements can each be converted to a float by calling get. Run the code below and see how much faster the parallel version is. Notice that it is not twice as fast, because there is overhead for the parallelization and computing $H$ is just part of the workflow. As a challenge, learners are encouraged to try other parallelization schemes that might take advantage of more than two QPUs.
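If you want to experiment with the submit-now / collect-later futures pattern itself before touching GPUs, the sketch below mimics it with Python's standard library. fake_expectation and the thread pool are hypothetical stand-ins for Hadamard-test circuits and QPUs, not CUDA-Q API:

```python
from concurrent.futures import ThreadPoolExecutor

# fake_expectation is a hypothetical stand-in for a circuit evaluation;
# pool.submit plays the role of observe_async, Future.result the role of get
def fake_expectation(m, n, part):
    return 0.1 * m + 0.01 * n + (0.001 if part == "imag" else 0.0)

futures = []
with ThreadPoolExecutor(max_workers=2) as pool:      # two simulated "QPUs"
    for m in range(3):
        for n in range(m, 3):                        # upper triangle only
            futures.append(((m, n),
                            pool.submit(fake_expectation, m, n, "real"),
                            pool.submit(fake_expectation, m, n, "imag")))

# Blocking on .result() here mirrors calling .get() on a CUDA-Q futures list
results = {idx: complex(fr.result(), fi.result()) for idx, fr, fi in futures}
print(results[(0, 1)])
```

The key design point is the same in both settings: all the work is launched before any result is collected, so the slow evaluations overlap instead of running one after another.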
Now, try to parallelize populate_s. You will use similar syntax but will need to adjust the loops to store the futures and access them later. There is more than one way to do this, so feel free to be creative. Test your function and confirm it is faster than the serial populate_s.
It is expected that the benefit of parallelizing computation of $S$ will be even smaller than for $H$, but the exercise is nevertheless helpful.
The benefit of parallelization is most noticeable for large problem sizes, where computing $S$ and $H$ is much more challenging.
Conclusion and Extensions
The Krylov QSD approach is a powerful and versatile tool. This lab has introduced the underlying theory that allowed you to implement your own version and explore several ways to improve results and accelerate the algorithm with parallelization.
The hybrid nature of the Krylov approach is excellent for balancing the need for quantum computers to evaluate expectation values in large Hilbert spaces with the ability to construct and solve a subspace matrix classically. A very promising aspect of Krylov methods is that they can be combined with other quantum computing approaches. For example, it is perfectly valid to prepare the reference state with another algorithm and then run Krylov QSD to refine the prediction. This creates a robust hybrid workflow that builds upon other quantum techniques.
For a challenge, if you have completed the ADAPT-VQE CUDA-Q Academic lab, see if you can combine these two methods into a single workflow.