GitHub Repository: NVIDIA/cuda-q-academic
Path: blob/main/chemistry-simulations/krylov_subspace_diagonalization.ipynb
Kernel: Python 3 (ipykernel)
# SPDX-License-Identifier: Apache-2.0 AND CC-BY-NC-4.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Krylov Quantum Subspace Diagonalization

The Krylov quantum subspace diagonalization method is a powerful hybrid quantum-classical method for estimating ground state energies, taking advantage of quantum computers to build a matrix and using a classical supercomputer to diagonalize that matrix. The method has advantages over variational approaches and is a promising technique for early fault-tolerant quantum computing. This lab will introduce the Krylov method, guide you through implementing it in CUDA-Q, teach you how to improve results, and show you how to parallelize computations across multiple QPUs.

What you will do:

  • Learn the theory behind Krylov quantum subspace diagonalization

  • Implement a Krylov solver in CUDA-Q

  • Explore methods to improve your solver such as higher order Trotter approximations and better reference states

  • Parallelize your implementation to run on multiple simulated QPUs

CUDA-Q Syntax you will use:

  • methods for CUDA-Q kernel construction

  • observe and get_state to evaluate kernels

  • The mqpu backend, the associated observe_async and get_state_async functions, and get to retrieve asynchronous results

  • exp_pauli to dynamically construct quantum circuits from CUDA-Q Spin Operators

Prerequisites: This lab assumes you have a basic understanding of quantum algorithms and understand the basic concepts of quantum computing. If you need a refresher, check out the "Quick Start to Quantum Computing with CUDA-Q" series to learn how to code a VQE implementation with CUDA-Q from scratch. This lab focuses on a chemistry application, so some knowledge of chemistry terminology is necessary to fully appreciate the exercises.

Run the following code to prepare the packages you will use in this lesson.

💻 Just a heads-up: This notebook is designed to be run on an environment with a GPU. If you don't have access to a GPU, feel free to read through the cells and explore the content without executing them. Enjoy learning! ⭐

import cudaq
import numpy as np
import scipy
import time

%pip install pyscf -q

from aux_files.krylov.qchem.classical_pyscf import get_mol_hamiltonian
from aux_files.krylov.qchem.hamiltonian import jordan_wigner_fermion
from aux_files.krylov.qchem.eigenvaluesolver import eigen

# Single-node, single GPU
cudaq.set_target("nvidia", option='fp64')

Quantum Subspace Diagonalization

Quantum computing methods for quantum chemistry lie on a spectrum from NISQ methods to methods suited for large-scale fault tolerant quantum computers. The problem that we'll be considering in this notebook is to find the ground state energy of a Hamiltonian. Mathematically, this is equivalent to finding the lowest eigenvalue of a given matrix.

The prototypical NISQ algorithm is the Variational Quantum Eigensolver (VQE), which computes ground state energies by classically optimizing a parameterized circuit and computing expectation values for the cost function. These circuits are shallow, which is a favorable trait, but the number of measurements needed each round scales as $O(n^4)$ with the size of the Hamiltonian. Most problematic is the fact that circuits corresponding to even modest qubit counts have extremely flat optimization surfaces and tend not to converge, the so-called "barren plateau" problem. Even when convergence is possible, the number of iterations required may become intractably large.

On the other side of the spectrum, quantum phase estimation (QPE) is a hallmark fault-tolerant algorithm for quantum chemistry. Given a good initial state, QPE can compute ground state energies with exponential precision, but requires extremely deep circuits due to the necessity of the quantum Fourier transform in most implementations. QPE is extremely sensitive to errors, requiring very low error rates to function. Consequently, it relies on quantum error correction protocols (learn more in our QEC 101 course), which are resource-intensive and demand thousands to millions of physical qubits.

Quantum Krylov methods are a bridge between the two, balancing these tradeoffs to potentially enable useful applications on nearer-term devices. The main idea is to project the large matrix onto a much smaller one and compute the eigenvalues of this smaller matrix. If the subspace is chosen well, the lowest eigenvalues of the smaller matrix and the larger one agree. The Krylov approach is a subset of the broader family of quantum subspace diagonalization (QSD) methods, which diagonalize the Hamiltonian in a general non-orthogonal basis. In other words, given an initial state that is close to the ground state, the Krylov method will help identify a good subspace of the Hamiltonian to diagonalize such that the best possible estimates of the extreme eigenvalues (i.e., the ground state) are produced.

Such an approach requires circuits of moderate depth and uses them to populate matrix elements of a Hamiltonian subspace, which is then classically diagonalized. Krylov methods still suffer from the $O(n^4)$ Hamiltonian scaling, as the full Hamiltonian must be measured to compute subspace matrix elements. However, increasing the subspace dimension leads to exponential convergence towards the ground state energy, which could outperform the many VQE iterations required for larger problems.

The widget below will provide some intuition for subspace diagonalization and why Krylov methods work. Given the 8x8 matrix below, move the slider to select a subspace dimension and look at the computed eigenvalues. Notice that the lowest eigenvalue converges to the exact ground state energy while the other eigenvalues remain rather inaccurate. Subspace methods are quite good at estimating the extreme eigenvalues (highest or lowest), which is favorable as the lowest eigenvalue (the ground state energy) is usually the quantity of interest.

Click this link to access the widget

To get a sense for why a good initial state is important (more on that later), choose which corner the subspace originates from in the widget. Notice that the approximate eigenvalues are much worse when the subspace is constructed from certain corners, which happen to capture less information about the eigenspectrum of the matrix.

Similarly, in the quantum Krylov methods, a good approximation of the ground state will help construct a better subspace for estimating the ground state energy.
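The behavior of the widget can also be reproduced classically. The sketch below is a toy NumPy example (a random symmetric matrix and starting vector of our own choosing, not part of the notebook's workflow): it projects an 8x8 matrix onto Krylov subspaces of growing dimension and shows the lowest eigenvalue of the projected matrix converging monotonically toward the exact one.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 8))
A = (A + A.T) / 2                      # random 8x8 symmetric "Hamiltonian"
b = rng.standard_normal(8)             # starting vector seeding the subspace

exact = np.min(np.linalg.eigvalsh(A))  # exact lowest eigenvalue
estimates = []
for dim in range(1, 9):
    # Krylov basis {b, Ab, A^2 b, ...}, orthonormalized with QR
    K = np.column_stack([np.linalg.matrix_power(A, n) @ b for n in range(dim)])
    Q, _ = np.linalg.qr(K)
    # Project A into the subspace and diagonalize the small matrix
    estimates.append(np.min(np.linalg.eigvalsh(Q.T @ A @ Q)))
    print(dim, estimates[-1], "exact:", exact)
```

Because the subspaces are nested, each estimate is an upper bound that can only improve as the dimension grows, which is the same trend the widget displays.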

The Quantum Krylov Method

This section will walk through an implementation of a quantum Krylov method based on A Multireference Quantum Krylov Algorithm for Strongly Correlated Electrons. The general workflow follows the figure below inspired by Quantum Krylov subspace algorithms for ground and excited state energy estimation. The first step is to build a subspace for which a generalized eigenvalue problem will be solved. Then, the matrices for the eigenvalue problem are populated with expectation values from quantum circuit evaluations using the Hadamard test, and finally, the eigenvalues of the subspace are determined classically. The subspace can then be gradually increased in size and the process repeated until the result is sufficiently accurate.

The first step is to select a set of reference states from which the subspace is constructed. A benefit of this approach is that it is multireference. A basis of $d$ states $\{\Phi_0 \cdots \Phi_d\}$ is defined, where each state is a linear combination of Slater determinants:

$$\ket{\Phi_I} = \sum_{\mu} d_{\mu I}\ket{\phi_{\mu}}.$$

From this, a non-orthogonal Krylov space $\mathcal{K} = \{\psi_{0} \cdots \psi_{N}\}$ is constructed by applying a family of $s$ unitary operators to each of the $d$ reference states, resulting in $d \cdot s = N$ elements in the Krylov space, where

$$\ket{\psi_{\alpha}} \equiv \ket{\psi_I^{(n)}} = \hat{U}_n\ket{\Phi_I}.$$

This constructs the state $\ket{\Psi}$, an approximation of the full configuration interaction (FCI) wavefunction:

$$\ket{\Psi} = \sum_{\alpha} c_{\alpha}\ket{\psi_{\alpha}} = \sum_{I=0}^d \sum_{n=0}^s c_I^{(n)}\hat{U}_n\ket{\Phi_I}.$$

The selection of the unitary operators is not arbitrary: they correspond to consecutive applications of the time evolution operator $U_n = e^{-iHt_n}$. This ensures that a basis is chosen which describes the eigenvalues of $H$ well.

The energy of this state can be obtained by solving the generalized eigenvalue problem, using the state defined above to compute the matrix elements of $H$ and $S$:

$$\boldsymbol{Hc}=\boldsymbol{Sc}E.$$

These elements are obtained using the Hadamard test, which uses phase kickback to estimate an expectation value of the time evolution operators. For a more detailed discussion of why this works, please see this CUDA-Q tutorial. If $H = \sum_l c_l P_l$, the elements of the overlap matrix are

$$S_{\alpha \beta} = \braket{\psi_{\alpha}|\psi_{\beta}} = \braket{\Phi_I|\hat{U}_m^{\dagger}\hat{U}_n|\Phi_J}$$

and the elements for the Hamiltonian matrix are

$$H_{\alpha \beta} = \braket{\psi_{\alpha}|\hat{H}|\psi_{\beta}} = \braket{\Phi_I|\hat{U}_m^{\dagger}\hat{H}\hat{U}_n|\Phi_J} = \sum_l c_l \braket{\Phi_I|\hat{U}_m^{\dagger}\hat{P_l}\hat{U}_n|\Phi_J}.$$

The matrix elements for $S$ and $H$ are computed with the Hadamard test using the circuit shown below. In the case of the overlap matrix $S$, the Pauli word is the identity, so the $P_l$ drops out.

The $2\sigma_+$ term refers to measurement of the expectation value of this circuit with the $X+iY$ operator.
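Before building the CUDA-Q kernels, it can help to verify this claim with plain linear algebra. The sketch below simulates the Hadamard test as matrices for a toy one-qubit Hamiltonian (an illustrative choice of coefficients, not the molecular Hamiltonian): measuring $X$ and $Y$ on the ancilla recovers the real and imaginary parts of $\braket{\psi|U|\psi}$.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

# Toy one-qubit "Hamiltonian" and its time evolution operator
H_sys = 0.5 * X + 0.3 * Z
U = expm(-1j * H_sys * 0.7)

psi = np.array([1, 0], dtype=complex)      # reference state |0>
target = psi.conj() @ U @ psi              # <psi|U|psi>, the matrix element we want

# Hadamard test: put the ancilla in |+>, then apply controlled-U
Hgate = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
state = np.kron(np.array([1, 0], dtype=complex), psi)   # |0>_anc |psi>
state = np.kron(Hgate, I2) @ state                      # ancilla -> |+>
cU = np.block([[I2, np.zeros((2, 2))],
               [np.zeros((2, 2)), U]])                  # |0><0| (x) I + |1><1| (x) U
state = cU @ state

re = (state.conj() @ np.kron(X, I2) @ state).real  # <X> on the ancilla
im = (state.conj() @ np.kron(Y, I2) @ state).real  # <Y> on the ancilla
print(re + 1j * im, "vs", target)   # the two agree
```

The same algebra holds when the bra and ket states differ (the $U_m^{\dagger}$, $P_l$, $U_n$ case above); only controlled-$U$ becomes the controlled product of those operators.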

Once the $H$ and $S$ matrices are constructed, the diagonalization is performed classically to produce an estimate for the ground state in question.

Exercise 1

Before implementing the quantum Krylov method, find the ground state energy of the linear $H_4$ molecule and compute the FCI energy by direct matrix diagonalization. This will provide a reference to compare your Krylov energies against. Note that the `spin_ham_matrix = hamiltonian.to_matrix()` line saves the Hamiltonian as a NumPy array. How large is the full Hamiltonian matrix?

#geometry = 'Li 0.3925 0.0 0.0; H -1.1774 0.0 0.0'
#geometry = 'H 0.0 0.0 0.0; H 0.0 0.0 0.7474'
geometry = 'H 0.0 0.0 1.5; H 0.0 0.0 3.0; H 0.0 0.0 4.5; H 0.0 0.0 6.0'

molecular_data = get_mol_hamiltonian(xyz=geometry, spin=0, charge=0, basis='sto3g', ccsd=True, verbose=True)

obi = molecular_data[0]
tbi = molecular_data[1]
e_nn = molecular_data[2]
nelectrons = molecular_data[3]
norbitals = molecular_data[4]
qubits_num = 2 * norbitals

hamiltonian = jordan_wigner_fermion(obi, tbi, e_nn, tolerance=1e-12)
spin_ham_matrix = hamiltonian.to_matrix()
print("H is a", len(spin_ham_matrix), "by", len(spin_ham_matrix), "matrix")

# TODO - Compute FCI energy with matrix diagonalization
e, c = #FIXME

# Find the ground state energy and the corresponding eigenvector
print('Ground state energy (classical simulation)= ', np.min(e), ', index= ', np.argmin(e))
min_indices = np.argsort(e)[:5]

# Eigenvector can be used to initialize the qubits
vec = c[:, min_indices[0]]
output file: H 0-pyscf.log
[pyscf] Total number of orbitals = 4
[pyscf] Total number of electrons = 4
[pyscf] HF energy = -1.829137412352686
[pyscf] Total R-CCSD energy = -1.9976240454057859
H is a 256 by 256 matrix
Ground state energy (classical simulation)=  (-1.996150325518812+0j) , index=  13

Implementing a Quantum Krylov Method

Exercise 2

Now you are ready to code up an implementation of the full Krylov method. The following cells will guide you through this with certain "TODO" locations where you can fill in the code. Hint: use `cudaq.control(U, a, ...U-args)` to apply a kernel `U` controlled by a qubit `a`.

Building the main kernel

The heart of the Krylov algorithm is a CUDA-Q kernel (qfd_kernel) that computes the matrix elements in the circuit shown above. There are a few sub-kernels that must be constructed first, and these are explained below. Note that we will prepare code where the basis vectors are single Slater determinants for simplicity. In principle, it could be modified to take any initial state.

  1. Kernel U_m first builds the occupation vector of the target Slater determinant $\ket{\Phi_I}$ by applying $X$ gates. Then, the command exp_pauli applies each Pauli word in the Hamiltonian as a matrix exponential to perform time evolution.

  2. apply_pauli is a helper kernel provided for you that translates a list of integers into the respective $X$, $Y$, and $Z$ gates, applied to measure a specific Pauli word of the Hamiltonian.

  3. Kernel U_n first builds the occupation vector of the target Slater determinant $\ket{\Phi_J}$ by applying $X$ gates. Then, the command exp_pauli applies each Pauli word in the Hamiltonian as a matrix exponential to perform time evolution, followed by controlled application of the $P_l$ using apply_pauli.

  4. Kernel qfd_kernel defines a qubit register and an ancilla and uses these three kernels to implement the above circuit. Note, this is general, as the code below building the $S$ matrix will input $P_l = II\cdots I$.

# TODO - write kernel U_m
@cudaq.kernel
def U_m(qubits: cudaq.qview, dt: float, coefficients: list[complex],
        words: list[cudaq.pauli_word], initial_state: list[int], trotter_order: int):
    """
    Function to prepare |Φ_I⟩ and time evolve

    Parameters
    ----------
    qubits : cudaq.qview
        Quantum register on which the kernel acts (data qubits only;
        ancillary qubits are handled outside this routine).
    dt : float
        Time step for the evolution operator.
    coefficients : list[complex]
        Scalar coefficients of the Pauli words in the Hamiltonian.
    words : list[cudaq.pauli_word]
        The corresponding Pauli words of H in CUDA-Q's native representation.
    initial_state : list[int]
        Indices of qubits that are occupied in the reference determinant |Φ_I⟩.
    trotter_order : int
        Order of the Trotter-Suzuki decomposition.
    """
    # TODO - Write the kernel


@cudaq.kernel
def apply_pauli(qubits: cudaq.qview, word: list[int]):
    """
    Takes a single Pauli word and applies the individual Pauli operators.

    Parameters
    ----------
    qubits : cudaq.qview
        Quantum register on which the kernel acts (data qubits only;
        ancillary qubits are handled outside this routine).
    word : list[int]
        List of integers of 1, 2, or 3, corresponding to application of X, Y, Z gates.
    """
    # Add H (Hamiltonian operator)
    for i in range(len(word)):
        if word[i] == 1:
            x(qubits[i])
        if word[i] == 2:
            y(qubits[i])
        if word[i] == 3:
            z(qubits[i])


# Applies the unitary operation corresponding to P_l U_n
@cudaq.kernel
def U_n(qubits: cudaq.qview, dt: float, coefficients: list[complex],
        words: list[cudaq.pauli_word], initial_state: list[int],
        word_list: list[int], trotter_order: int):
    """
    Function to prepare |Φ_J⟩, time evolve, and apply P_l

    Parameters
    ----------
    qubits : cudaq.qview
        Quantum register on which the kernel acts (data qubits only;
        ancillary qubits are handled outside this routine).
    dt : float
        Real time step for the evolution operator.
    coefficients : list[complex]
        Scalar coefficients of the Pauli words in the Hamiltonian.
    words : list[cudaq.pauli_word]
        The corresponding Pauli words of H in CUDA-Q's native representation.
    initial_state : list[int]
        Indices of qubits that are occupied in the reference determinant |Φ_J⟩.
    word_list : list[int]
        List of integers of 1, 2, or 3, corresponding to application of X, Y, Z gates.
    trotter_order : int
        Order of the Trotter-Suzuki decomposition.
    """
    # TODO - write the kernel


@cudaq.kernel
def qfd_kernel(dt_alpha: float, dt_beta: float, coefficients: list[complex],
               words: list[cudaq.pauli_word], word_list: list[int],
               state_alpha: list[int], state_beta: list[int],
               n_electrons: int, n_qubits: int, trotter_order: int):
    """
    Kernel to build the circuit which is used to compute matrix elements of H and S.

    Parameters
    ----------
    dt_alpha : float
        Time evolution step for U_m.
    dt_beta : float
        Time evolution step for U_n.
    coefficients : list[complex]
        Scalar coefficients of the Pauli words in the Hamiltonian.
    words : list[cudaq.pauli_word]
        The corresponding Pauli words of H in CUDA-Q's native representation.
    word_list : list[int]
        List of integers of 1, 2, or 3, corresponding to application of X, Y, Z gates.
    state_alpha : list[int]
        Indices of qubits that are occupied in the reference determinant |Φ_I⟩.
    state_beta : list[int]
        Indices of qubits that are occupied in the reference determinant |Φ_J⟩.
    n_electrons : int
        Number of electrons in the molecular system.
    n_qubits : int
        Number of qubits = 2 * n_orbitals.
    trotter_order : int
        Order of the Trotter-Suzuki decomposition.
    """
    # TODO - write the kernel

CUDA-Q kernels require specific data types. To apply the Hamiltonian operator within the kernels, it must be broken down into lists of coefficients and their respective Pauli words. Also, types for conditional logic within kernels are limited to integers. So, a function is also needed to convert each Pauli word into a list of integers corresponding to $I$, $X$, $Y$, and $Z$. The functions below accomplish this and are provided for you.

Note that coefficient = term_coefficients(hamiltonian) and pauli_string = term_words(hamiltonian) must be initialized before building the matrices for each molecular system you test.

# Collect coefficients from a spin operator
def term_coefficients(ham: cudaq.SpinOperator) -> list[complex]:
    """
    Function to extract Hamiltonian coefficients

    Parameters
    ----------
    ham : cudaq.SpinOperator
        Hamiltonian as a CUDA-Q spin operator

    Returns
    -------
    list(complex)
        List of Hamiltonian term coefficients
    """
    # TODO - write the function


# Collect Pauli words from a spin operator
def term_words(ham: cudaq.SpinOperator) -> list[str]:
    """
    Function to collect Pauli words from the Hamiltonian

    Parameters
    ----------
    ham : cudaq.SpinOperator
        Hamiltonian as a CUDA-Q spin operator

    Returns
    -------
    list(str)
        List of Hamiltonian terms
    """
    # TODO - write the function


def pauli_str(pauli_word, qubits_num):
    """
    Function to take a Pauli word and convert it to integer representation

    Parameters
    ----------
    pauli_word : cudaq.SpinOperator
        A single Pauli word
    qubits_num : int
        Number of qubits in the circuit

    Returns
    -------
    list(int)
        List of integers corresponding to Pauli operations
    """
    my_list = []
    for i in range(qubits_num):
        if str(pauli_word[i]) == 'I':
            my_list.append(0)
        if str(pauli_word[i]) == 'X':
            my_list.append(1)
        if str(pauli_word[i]) == 'Y':
            my_list.append(2)
        if str(pauli_word[i]) == 'Z':
            my_list.append(3)
    return my_list


# Build the lists of coefficients and Pauli words from the H4 Hamiltonian
coefficient = term_coefficients(hamiltonian)
pauli_string = term_words(hamiltonian)

print(coefficient[0:10])
print(pauli_string[0:10])
print(pauli_str(pauli_string[0], qubits_num))
[(-0.9209431016975833+0j), (0.11933984698934136+0j), (0.11933984698934136+0j), (0.07128066964853524+0j), (0.07128066964853522+0j), (-0.006895599406554084+0j), (-0.006895599406554112+0j), (-0.10062378739478249+0j), (-0.10062378739478246+0j), (0.10125845891307164+0j)]
['IIIIIIII', 'ZIIIIIII', 'IZIIIIII', 'IIZIIIII', 'IIIZIIII', 'IIIIZIII', 'IIIIIZII', 'IIIIIIZI', 'IIIIIIIZ', 'ZZIIIIII']
[0, 0, 0, 0, 0, 0, 0, 0]

Computing the matrix elements

You are now ready to start computing the overlap matrix $S$ and Hamiltonian matrix $H$ for the subspace. You will write a function to compute each matrix given a time step, number of evolution steps ($s$), and a list of reference states. It is assumed that each reference state will evolve the same number of time steps.

First, define the two observables necessary to compute the real and imaginary parts using the Hadamard test. That is, an $X$ and a $Y$ operator acting on the ancilla qubit for the real and imaginary parts, respectively. Note, these must be computed as two separate expectation values, not a single $XY$!

Next, generate an empty overlap matrix (consisting of $sd \times sd$ elements) and loop over the elements to compute them using cudaq.observe. The matrix is symmetric, so for performance reasons, only compute the upper triangle and copy the results to the corresponding transposed positions. For the pauli_string argument in qfd_kernel, use pauli_str to enter a list corresponding to the identity operator, as there is no observable to compute in the overlap matrix. (This is done for you.)

# TODO - Define the two observables which compute the real and imaginary part of the matrix element
x_0 = #FIXME
y_0 = #FIXME


def populate_s(dt, n_steps, ref_states, trotter_order=1):
    """
    Function to compute the overlap matrix S with the Hadamard test

    Parameters
    ----------
    dt : float
        Time evolution time step
    n_steps : int
        Number of time steps each reference vector is evolved
    ref_states : list[list[int]]
        List of lists of indices to prepare occupation vectors for reference Slater determinants
    trotter_order : int
        Order of the Trotter-Suzuki decomposition

    Returns
    -------
    S : np.array(complex)
        Returns the overlap matrix
    """
    identity_op = cudaq.SpinOperator.from_word('I' * qubits_num)
    identity_word = identity_op.get_pauli_word()
    pauli_list = pauli_str(identity_word, qubits_num)

    dt_s = [i * dt for i in range(n_steps)] * len(ref_states)
    states = ref_states * n_steps

    # Empty overlap matrix S
    wf_overlap = np.zeros((n_steps * len(ref_states), n_steps * len(ref_states)), dtype=complex)

    # TODO - write the loop to solve for S matrix elements

    return wf_overlap

Finally, write a similar loop for the Hamiltonian matrix. In this case, each matrix element needs to be computed by looping over all Pauli words ($P_l$), computing an expectation value for each word, multiplying this expectation value by that word's coefficient ($c_l$), and summing the results. In this case, pauli_str will need to be called for each word looped over to produce a list that can be applied as Pauli operators within the kernel.

def populate_h(dt, n_steps, ref_states, trotter_order=1):
    """
    Function to compute the Hamiltonian matrix H with the Hadamard test

    Parameters
    ----------
    dt : float
        Time evolution time step
    n_steps : int
        Number of time steps each reference vector is evolved
    ref_states : list[list[int]]
        List of lists of indices to prepare occupation vectors for reference Slater determinants
    trotter_order : int
        Order of the Trotter-Suzuki decomposition

    Returns
    -------
    H : np.array(complex)
        Returns the Hamiltonian matrix
    """
    ham_matrix = np.zeros((n_steps * len(ref_states), n_steps * len(ref_states)), dtype=complex)

    dt_s = [i * dt for i in range(n_steps)] * len(ref_states)
    states = ref_states * n_steps

    # TODO - write a loop to compute H

    return ham_matrix

Solving the eigenvalue problem

Let's recap. You started with a classical computation of the Hamiltonian and an initial reference state. You then cleverly defined a subspace to approximate the total Hamiltonian using a basis of vectors formed by evolving the initial state with the Hamiltonian. You then populated the $H$ and $S$ matrices corresponding to the subspace using quantum computations. Now, you will classically solve this eigenvalue problem to compute the ground state energy of the linear $H_4$ molecule.

The generalized eigenvalue problem is solved by first diagonalizing $S$:

$$S = U S' U^{\dagger}$$

The eigenvectors $v$ and eigenvalues $e$ of $S$ are then used to construct a new matrix $X'$:

$$X' = S^{-1/2} = \sum_k v_{ki} \frac{1}{\sqrt{e_k}} v_{kj}$$

This matrix then diagonalizes $H$ as

$$X'^{\dagger} H X' = E S^{1/2} C$$

Using the eigenvectors of $H$, which are $S^{1/2}C$, one can find the eigenvectors of the original problem by left multiplying by $S^{-1/2}$.

The function eigen, imported from an auxiliary script, solves the generalized eigenvalue problem for you. The function checks the spread of the eigenvalues of $S$, known as the condition number. If this number is large, numerical instabilities may corrupt the solution. This is fixed by projecting out tiny eigenvalues. You can set verbose=True to print information about the conditioning of the matrix if you would like.
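For reference, the $S^{-1/2}$ procedure described above can be sketched in a few lines of NumPy. The `tol` cutoff and the cross-check against `scipy.linalg.eigh` are illustrative assumptions on our part; the actual eigen helper in the auxiliary script may differ in its details.

```python
import numpy as np
import scipy.linalg

def solve_gevp(H, S, tol=1e-10):
    """Solve Hc = ScE via the S^{-1/2} transformation, projecting out
    eigenvalues of S below the (hypothetical) cutoff `tol`."""
    e, v = np.linalg.eigh(S)
    keep = e > tol                       # discard near-null directions of S
    X = v[:, keep] / np.sqrt(e[keep])    # X = S^{-1/2} restricted to the kept space
    Ht = X.conj().T @ H @ X              # ordinary eigenproblem in the new basis
    E, C = np.linalg.eigh(Ht)
    return E, X @ C                      # eigenvalues, back-transformed eigenvectors

# Sanity check on a random Hermitian pencil with S positive definite
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)); H = (A + A.T) / 2
B = rng.standard_normal((6, 6)); S = B @ B.T + 6 * np.eye(6)
E, C = solve_gevp(H, S)
E_ref = scipy.linalg.eigh(H, S, eigvals_only=True)
print(np.allclose(E, E_ref))  # True
```

When $S$ is nearly singular, the `keep` mask is what stabilizes the solve: directions with tiny overlap eigenvalues are simply dropped before inversion.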

Now, compute the ground state energy of $H_4$ using the Hartree-Fock state as the only reference, evolving 4 and then 8 time steps. (The first example is provided for running this workflow.) Then recompute using two reference states, HF and a determinant formed by promoting the highest energy electron to the next available orbital, and evolve each for 2 and then 4 time steps.

How do your results look? Is a larger $N = s \cdot d$ better? Are better results obtained from increasing $d$ or $s$? Are they accurate to within chemical accuracy (mHartrees)? Note, you must use fp64 precision with the nvidia simulator, otherwise your result will suffer from numerical inaccuracies.

timesteps = 4
ref_states = [[1, 2, 3, 4]]
dt = 1.0

H_final = populate_h(dt, timesteps, ref_states)
S_final = populate_s(dt, timesteps, ref_states)
eigen_value, eigen_vect = eigen(H_final, S_final)
print('Energy from QFD s = 4, d = 1:')
print(np.min(eigen_value))

# TODO - Use the Hartree Fock state evolved 8 time steps
timesteps = #FIXME
ref_states = #FIXME
dt = 1.0

H_final = populate_h(dt, timesteps, ref_states)
S_final = populate_s(dt, timesteps, ref_states)
eigen_value, eigen_vect = eigen(H_final, S_final)
print('Energy from QFD s = 8, d = 1:')
print(np.min(eigen_value))

# TODO - Use the Hartree Fock state and an excited state Slater determinant, both evolved 2 steps.
timesteps = #FIXME
ref_states = #FIXME
dt = 0.001

H_final = populate_h(dt, timesteps, ref_states)
S_final = populate_s(dt, timesteps, ref_states)
eigen_value, eigen_vect = eigen(H_final, S_final)
print('Energy from QFD s = 2, d = 2:')
print(np.min(eigen_value))

# TODO - Use the Hartree Fock state and an excited state Slater determinant, both evolved 4 steps.
timesteps = #FIXME
ref_states = #FIXME
dt = 0.001

H_final = populate_h(dt, timesteps, ref_states)
S_final = populate_s(dt, timesteps, ref_states)
eigen_value, eigen_vect = eigen(H_final, S_final)
print('Energy from QFD s = 4, d = 2:')
print(np.min(eigen_value))

print('Exact Ground State Energy:')
print(np.min(e))
Energy from QFD s = 4, d = 1:
-1.7924051988640382
Energy from QFD s = 8, d = 1:
-1.8193961793720168
Energy from QFD s = 2, d = 2:
-1.829137412352687
Energy from QFD s = 4, d = 2:
-1.9660837421691038
Exact Ground State Energy:
(-1.996150325518812+0j)

Exercise 3

You have successfully implemented a Krylov method and computed the ground state energy of $H_4$ using a subspace of the full Hamiltonian. To understand the resource savings here, write a function that computes the number of circuits that must be executed to compute the ground state energy of STO-3G LiH, which has 631 Hamiltonian terms and a Hilbert space of size $2^{12}$. (Assume the expectation values are computed from a single circuit and are not shot based.)

Would a subspace of $d=2, s=3$ require execution of more or fewer circuits than a VQE procedure with 1000 iterations? Also, how much smaller is the subspace than the full Hamiltonian if one were to compute the FCI energy of LiH?

def krylov_circuits(m, ham_terms):
    # TODO - write code to answer the questions
Circuits for Krylov Method: 26544.0
circuits for 1000 VQE steps 631000
Subspace is 64.0 times smaller than full Hilbert space
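The circuit count above can be reproduced under one particular accounting, which is an assumption on our part rather than the notebook's reference solution: the $H$ and $S$ matrices are Hermitian, so only $N(N+1)/2$ elements are needed, and each element requires a real and an imaginary Hadamard-test circuit per measured term (631 terms for $H$, the single identity word for $S$).

```python
def krylov_circuit_count(d, s, ham_terms):
    """One possible tally of Hadamard-test circuits for a d-reference,
    s-step Krylov run (hypothetical accounting, for illustration):
    N(N+1)/2 independent matrix elements; two circuits (real + imaginary)
    per Hamiltonian term for H, plus two per element for S."""
    N = d * s
    elements = N * (N + 1) // 2
    return 2 * elements * ham_terms + 2 * elements

print(krylov_circuit_count(2, 3, 631))  # 26544, matching the count above
print(1000 * 631)                       # 631000 circuits for 1000 VQE steps
```

With this bookkeeping, a $d=2, s=3$ Krylov run needs roughly 24x fewer circuit executions than the 1000-iteration VQE estimate.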

Notice from this that the $N=6$ subspace requires far fewer circuits. This is somewhat of a tough comparison, as it is highly variable how many VQE iterations are needed in each case, or whether convergence is even possible. Regardless, Krylov approaches have a much more efficient route to converging on the total energy, yet remain limited by the size of the Hamiltonian, which determines how costly each matrix element is to compute.

Selecting an Improved Krylov Basis

As you saw from your initial implementation, the quality of the Krylov QSD result can depend heavily on the reference states used and how many time evolution steps are selected. Examine the table below from the paper. Notice how larger subspaces improve (lower) the ground state energy prediction for the linear $H_6$ molecule (STO-6G).

The table tells a second story. The columns with $k$ denote the condition number of the overlap matrix $S$, that is, the ratio of the largest to the smallest eigenvalue. If this number is very large (over $10^{12}$), the solution is potentially numerically unstable and could produce inaccurate and even non-variational predictions. This tends to occur when vectors in the Krylov subspace are too similar.

The table has two sets of columns: one where a single-reference (SR) Hartree-Fock state is evolved for $N$ time steps, and another where multiple reference vectors $d = N/4$ are evolved for four time steps each. Notice how the MR condition numbers are small and remain $O(10^6)$ while the SR cases are huge. Though potentially OK for a small system, this is highly problematic for larger molecules and indicates the value of using multiple reference vectors over a single reference evolved for more time steps.

| Subspace Size | $E(\text{SR})$ | $k(\text{SR})$ | $E(\text{MR})$ | $k(\text{MR})$ |
|---|---|---|---|---|
| 4 | −3.015510 | $3.29\times10^{5}$ | −3.015510 | $3.29\times10^{5}$ |
| 8 | −3.019768 | $3.60\times10^{11}$ | −3.019301 | $4.86\times10^{5}$ |
| 12 | −3.020172 | $1.61\times10^{17}$ | −3.019696 | $9.39\times10^{5}$ |
| 16 | −3.020192 | $3.19\times10^{17}$ | −3.019835 | $5.68\times10^{6}$ |
| 20 | −3.020198 | $3.86\times10^{17}$ | −3.019929 | $6.23\times10^{6}$ |
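The link between similar Krylov vectors and large condition numbers is easy to demonstrate classically. In the NumPy sketch below (a toy example with an illustrative noise scale of our own choosing), the overlap (Gram) matrix of four nearly parallel vectors is badly conditioned, while four independent random vectors stay well conditioned:

```python
import numpy as np

def overlap_condition(vectors):
    """Condition number of the Gram (overlap) matrix S_ij = <v_i|v_j>."""
    V = np.column_stack([v / np.linalg.norm(v) for v in vectors])
    S = V.conj().T @ V
    e = np.linalg.eigvalsh(S)
    return e[-1] / e[0]

rng = np.random.default_rng(1)
base = rng.standard_normal(16)
# Nearly parallel vectors (mimicking many tiny time steps of one reference)
similar = [base + 1e-4 * rng.standard_normal(16) for _ in range(4)]
# Well-spread vectors (mimicking several distinct references)
spread = [rng.standard_normal(16) for _ in range(4)]

print("nearly parallel:", overlap_condition(similar))
print("well spread:   ", overlap_condition(spread))
```

This mirrors the table: a single reference evolved many steps tends to produce nearly linearly dependent Krylov vectors, while multiple distinct references keep $S$ well conditioned.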

A natural question is: what other ways are there to improve the construction of the Krylov subspace? The rest of this section will explore two more.

Exercise 4

One way to improve the subspace is selection of an appropriate time evolution step. Use your code to test the time steps $\{10, 1, \ldots, 0.00001\}$ with the 6-31G Hartree-Fock state of $H_2$ (so it runs faster) as a reference, time-evolved for three time steps. Comment on the ground state energy predictions. Is there a systematic trend? What might be an explanation for what makes a good vs. a poor $dt$ selection?

geometry = 'H 0.0 0.0 0.0; H 0.0 0.0 0.7474'

molecular_data = get_mol_hamiltonian(xyz=geometry, spin=0, charge=0, basis='631g', ccsd=True, verbose=False)

obi = molecular_data[0]
tbi = molecular_data[1]
e_nn = molecular_data[2]
nelectrons = molecular_data[3]
norbitals = molecular_data[4]
qubits_num = 2 * norbitals

hamiltonian = jordan_wigner_fermion(obi, tbi, e_nn, tolerance=1e-12)
spin_ham_matrix = hamiltonian.to_matrix()
e, c = np.linalg.eig(spin_ham_matrix)

# Build the lists of coefficients and Pauli words from the H2 Hamiltonian
coefficient = term_coefficients(hamiltonian)
pauli_string = term_words(hamiltonian)

timesteps = 3
ref_states = [[0, 1]]

for dt in [10, 1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001]:
    H_final = populate_h(dt, timesteps, ref_states)
    S_final = populate_s(dt, timesteps, ref_states)
    eigen_value, eigen_vect = eigen(H_final, S_final)
    print("The time step dt=", dt, "results in E=", np.min(eigen_value))

print('Exact Ground State Energy:')
print(np.min(e))
overwrite output file: H 0-pyscf.log
The time step dt= 10 results in E= -1.137175710240685
The time step dt= 1.0 results in E= -1.137175710240685
The time step dt= 0.1 results in E= -1.137175710240694
The time step dt= 0.01 results in E= -1.1371757102409101
The time step dt= 0.001 results in E= -1.1371757101747133
The time step dt= 0.0001 results in E= -1.1371757047893722
The time step dt= 1e-05 results in E= -1.1163255644809635
Exact Ground State Energy:
(-1.1371757102406845+0j)

Exercise 5

Another way to improve the basis selection has to do with how the time evolution operator is approximated. Consider the Hamiltonian $H = \sum_l c_lP_l$ where $c_l$ are the coefficients and $P_l$ are the Pauli words. We have used the $\texttt{exp\_pauli}$ function to apply each Pauli word individually and approximate $e^{-iHt}$. Because the terms do not in general commute, applying them one at a time only approximates $e^{-iHt}$ via the Suzuki-Trotter formula shown below.

$$ e^{-i(c_0P_0 + c_1P_1 + \cdots + c_lP_l)t} \approx \left(e^{-ic_0P_0t/k}\,e^{-ic_1P_1t/k}\cdots e^{-ic_lP_lt/k}\right)^k, $$

where the approximation converges to the exact operator as $k$ goes to infinity. So far we have used a first-order Trotter-Suzuki operator ($k=1$), but $k$ can be increased to provide an even more accurate time evolution operator. This is implemented by applying the time evolution operators within $U_n$ and $U_m$ $k$ times with the coefficients divided by a factor of $k$. Update your code so that $k$ can be specified, and run the procedure for $H_4$ using the HF reference and three total time steps with $k = \{1,2,3,4,5\}$. Also time each computation and print the runtime. Do the higher-order approximations produce better results? How much slower is each increase in approximation order?
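The convergence of the product formula can be seen without any quantum hardware. The NumPy sketch below (hypothetical coefficients, not the molecular Hamiltonian from this lab) compares $k$ repetitions of a first-order Trotter step against the exact exponential of a two-term noncommuting Hamiltonian; the error shrinks roughly as $1/k$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expm_h(M, t):
    # exp(-i M t) for a Hermitian matrix M, via eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

# Hypothetical two-term Hamiltonian H = c0*Z + c1*X (the terms do not commute)
c0, c1, t = 0.8, 0.5, 1.0
exact = expm_h(c0 * Z + c1 * X, t)

def trotter_error(k):
    # One first-order Trotter step of length t/k, repeated k times
    step = expm_h(Z, c0 * t / k) @ expm_h(X, c1 * t / k)
    return np.linalg.norm(np.linalg.matrix_power(step, k) - exact)

for k in [1, 2, 4, 8]:
    print(f"k={k}: Trotter error = {trotter_error(k):.2e}")
```

In the CUDA-Q kernels this corresponds to looping the sequence of $\texttt{exp\_pauli}$ applications $k$ times with each coefficient divided by $k$.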

import time

geometry = 'H 0.0 0.0 1.5; H 0.0 0.0 3.0; H 0.0 0.0 4.5; H 0.0 0.0 6.0'
molecular_data = get_mol_hamiltonian(xyz=geometry, spin=0, charge=0, basis='sto3g', ccsd=True, verbose=False)

obi = molecular_data[0]
tbi = molecular_data[1]
e_nn = molecular_data[2]
nelectrons = molecular_data[3]
norbitals = molecular_data[4]
qubits_num = 2 * norbitals

hamiltonian = jordan_wigner_fermion(obi, tbi, e_nn, tolerance=1e-12)
spin_ham_matrix = hamiltonian.to_matrix()
e, c = np.linalg.eig(spin_ham_matrix)

coefficient = term_coefficients(hamiltonian)
pauli_string = term_words(hamiltonian)

timesteps = 3
ref_states = [[0, 1, 2, 3]]
dt = 0.1

for k in [1, 2, 3, 4, 5]:
    start = time.time()
    H_final = populate_h(dt, timesteps, ref_states, k)
    S_final = populate_s(dt, timesteps, ref_states, k)
    eigen_value, eigen_vect = eigen(H_final, S_final)
    end = time.time()
    print("The Trotter order k=", k, "results in E=", np.min(eigen_value), "with time:", end - start)

print('Exact Ground State Energy:')
print(np.min(e))
overwrite output file: H 0-pyscf.log
The Trotter order k= 1 results in E= -1.9860932225754904 with time: 16.321036338806152
The Trotter order k= 2 results in E= -1.9913724123384422 with time: 29.278184175491333
The Trotter order k= 3 results in E= -1.992358540086284 with time: 42.292258739471436
The Trotter order k= 4 results in E= -1.9926827227889903 with time: 55.2967255115509
The Trotter order k= 5 results in E= -1.9928241489048735 with time: 68.2799425125122
Exact Ground State Energy:
(-1.9961503255188189+0j)

Clearly there is a benefit to using a higher Trotter order, but there is also a cost: each increase in $k$ adds another repetition of the time evolution circuit, so circuit depth grows linearly with $k$, resulting in many more gates to apply and longer circuit simulation times. In A Multireference Quantum Krylov Algorithm for Strongly Correlated Electrons, the authors present a table of data for $BeH_2$ at various Trotter orders (they use $m$ instead of $k$). Notice that for most bond distances, chemical accuracy is achieved only with a Trotter order of 8, demonstrating the importance of an accurate time evolution operator.

Parallel Krylov Method

In the previous section you calculated that, even for a modestly sized molecule like $LiH$, over 26,000 circuits need to be run to compute all of the expectation values. This does not account for sampling on a physical QPU, which would increase the count significantly. The good news is that the Krylov approach is highly parallelizable: in theory, all of these circuits could be run asynchronously on a sufficient number of physical QPUs. Accelerated quantum supercomputers will eventually include multiple QPUs in a single data center, allowing such computations to run alongside classical supercomputers.

CUDA-Q has a number of features so you can easily code parallel applications and simulate how they would run on multiple QPUs using the MQPU backend. In this section you will implement a parallel version of the Krylov approach and test it out on a larger molecule.

Exercise 6

Though there are multiple ways to parallelize the population of the $H$ and $S$ matrices, the limiting factor is the number of QPUs available (or GPUs, in our simulated case). Most users have a limited number of GPUs, so we will only consider parallelization across two GPUs.

First, consider the function $\texttt{populate\_h}$. Each matrix element requires computing an expectation value for every Pauli word in the Hamiltonian. We will focus on parallelizing the computation of the real and imaginary parts of each of these. Below is an updated version of the function, $\texttt{populate\_h\_async}$, which uses $\texttt{observe\_async}$ to run each $\texttt{observe}$ call on a different QPU. $\texttt{observe\_async}$ takes a final parameter, $\texttt{qpu\_id}$, which specifies the GPU on which to run that computation.

The results are stored in a list of futures, each of which can be converted to a float by calling $\texttt{.get().expectation()}$ on it. Run the code below to see how much faster the parallel version is. Notice that it is not twice as fast: there is overhead from the parallelization, and computing $H$ is only part of the workflow. As a challenge, try other parallelization schemes that take advantage of more than two QPUs.
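The submit-then-collect pattern behind $\texttt{observe\_async}$ can be sketched with Python's standard $\texttt{concurrent.futures}$ module. This is only a stand-in for the CUDA-Q API: the hypothetical $\texttt{fake\_expectation}$ function plays the role of an $\texttt{observe}$ call, and $\texttt{Future.result()}$ plays the role of $\texttt{.get().expectation()}$.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Stand-in for an expectation-value computation; in the lab this role is
# played by cudaq.observe_async, whose future is resolved via .get()
def fake_expectation(coeff, term_index):
    return coeff * math.cos(term_index)

coefficients = [0.5, -0.3, 0.1, 0.7]  # one entry per "Hamiltonian term"

with ThreadPoolExecutor(max_workers=2) as pool:  # two workers, like two QPUs
    # Submit all terms first (non-blocking), mirroring observe_async
    futures = [pool.submit(fake_expectation, c, i)
               for i, c in enumerate(coefficients)]
    # Collect results afterwards, like .get().expectation()
    total = sum(f.result() for f in futures)

print(f"total expectation = {total:.4f}")
```

The key point is that all work is submitted before any result is requested, so the two workers can process terms concurrently; blocking on each future immediately after submitting it would serialize the computation.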

num_gpus = 2  # parallelize over 2 GPUs
cudaq.set_target('nvidia', option='mqpu,fp64')

def populate_h_async(dt, n_steps, ref_states, trotter_order=1):
    ham_matrix = np.zeros((n_steps * len(ref_states), n_steps * len(ref_states)), dtype=complex)
    dt_s = [i * dt for i in range(n_steps)] * len(ref_states)
    states = ref_states * n_steps

    for m in range(n_steps * len(ref_states)):
        dt_m = dt_s[m]
        state_m = states[m]
        for n in range(m, n_steps * len(ref_states)):
            dt_n = dt_s[n]
            state_n = states[n]
            tot_e = np.zeros(2)

            # Prepare lists to save the asynchronously computed expectation values
            async_x_results = []
            async_y_results = []

            # Loop over the terms in the Hamiltonian, computing expectation values
            for i in range(len(pauli_string)):
                pauli_list = pauli_str(pauli_string[i], qubits_num)

                # Save the expectation values as a list of futures
                async_x_results.append(cudaq.observe_async(qfd_kernel, x_0, dt_m, dt_n, coefficient, pauli_string,
                                                           pauli_list, state_m, state_n, nelectrons, qubits_num,
                                                           trotter_order, qpu_id=i % num_gpus))
                async_y_results.append(cudaq.observe_async(qfd_kernel, y_0, dt_m, dt_n, coefficient, pauli_string,
                                                           pauli_list, state_m, state_n, nelectrons, qubits_num,
                                                           trotter_order, qpu_id=i % num_gpus))

                # Get the expectation values from the computed futures
                temp_real = async_x_results[i].get().expectation() * coefficient[i].real
                temp_imag = async_y_results[i].get().expectation() * coefficient[i].real
                tot_e[0] += temp_real
                tot_e[1] += temp_imag

            # Sum the real and imaginary totals to form the Hamiltonian entry
            ham_matrix[m, n] = tot_e[0] + tot_e[1] * 1j
            if n != m:
                ham_matrix[n, m] = np.conj(ham_matrix[m, n])

    return ham_matrix


timesteps = 3
ref_states = [[0, 1, 2, 3]]
dt = 0.1
k = 3

start = time.time()
H_final = populate_h(dt, timesteps, ref_states, k)
S_final = populate_s(dt, timesteps, ref_states, k)
eigen_value, eigen_vect = eigen(H_final, S_final)
end = time.time()
print("Serial Execution results in E=", np.min(eigen_value), "with time:", end - start)

start = time.time()
H_final = populate_h_async(dt, timesteps, ref_states, k)
S_final = populate_s(dt, timesteps, ref_states, k)
eigen_value, eigen_vect = eigen(H_final, S_final)
end = time.time()
print("Parallel (H) Execution results in E=", np.min(eigen_value), "with time:", end - start)
Serial Execution results in E= -1.992358540086284 with time: 42.55427098274231
Parallel (H) Execution results in E= -1.992358540086284 with time: 40.83650755882263

Now, try to parallelize $\texttt{populate\_s}$. You will use similar syntax, but you will need to adjust the loops to store the futures and access them later. There is more than one way to do this, so feel free to be creative. Test your function and confirm it is faster than the serial $\texttt{populate\_s}$.

It is expected that the benefit of parallelizing the computation of $S$ will be even smaller than for $H$, but the exercise is nevertheless instructive.

The benefit of parallelization is most noticeable for large problem sizes, where computing $S$ and $H$ is much more demanding.

def populate_s_async(dt, n_steps, ref_states, trotter_order=1):
    # TODO - Write a function that parallelizes population of the S matrix
    # Copy your existing function as a starting point
    pass


start = time.time()
H_final = populate_h_async(dt, timesteps, ref_states, k)
S_final = populate_s_async(dt, timesteps, ref_states, k)
eigen_value, eigen_vect = eigen(H_final, S_final)
end = time.time()
print("Parallel (H and S) Execution results in E=", np.min(eigen_value), "with time:", end - start)
Parallel (H and S) Execution results in E= -1.992358540086284 with time: 40.65394353866577

Conclusion and Extensions

The Krylov QSD approach is a powerful and versatile tool. This lab introduced the underlying theory, guided you through implementing your own version, and explored several ways to improve results and accelerate the algorithm with parallelization.

The hybrid nature of the Krylov approach balances the need for quantum computers to evaluate expectation values over large Hilbert spaces with the fact that the resulting subspace matrix can be constructed and solved classically. A very promising aspect of Krylov methods is that they can be combined with other quantum computing approaches. For example, it is perfectly valid to prepare a reference state with another algorithm and then run Krylov QSD to refine the prediction, producing a robust hybrid workflow that builds on other quantum techniques.

For a challenge, if you have completed the ADAPT-VQE CUDA-Q Academic lab, see if you can combine these two methods into a single workflow.