GitHub Repository: NVIDIA/cuda-q-academic
Path: blob/main/chemistry-simulations/krylov_subspace_diagonalization.ipynb
Kernel: Python 3 (ipykernel)
# SPDX-License-Identifier: Apache-2.0 AND CC-BY-NC-4.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

Krylov Quantum Subspace Diagonalization

The Krylov quantum subspace diagonalization method is a powerful hybrid quantum-classical method for estimating ground state energies, taking advantage of quantum computers to build a matrix and using a classical supercomputer to diagonalize that matrix. The method has advantages over variational approaches and is a promising technique for early fault-tolerant quantum computing. This lab will introduce the Krylov method, guide you through implementing it in CUDA-Q, teach you how to improve results, and show you how to parallelize computations across multiple QPUs.

What you will do:

  • Learn the theory behind Krylov quantum subspace diagonalization

  • Implement a Krylov solver in CUDA-Q

  • Explore methods to improve your solver such as higher order Trotter approximations and better reference states

  • Parallelize your implementation to run on multiple simulated QPUs

CUDA-Q Syntax you will use:

  • methods for CUDA-Q kernel construction

  • observe and get_state to evaluate kernels

  • The mqpu backend, the associated observe_async and get_state_async functions, and get to retrieve asynchronous results

  • exp_pauli to dynamically construct quantum circuits from CUDA-Q Spin Operators

Prerequisites: This lab assumes you have a basic understanding of quantum algorithms and understand the basic concepts of quantum computing. If you need a refresher, check out the "Quick Start to Quantum Computing with CUDA-Q" series to learn how to code a VQE implementation with CUDA-Q from scratch. This lab focuses on a chemistry application, so some knowledge of chemistry terminology is necessary to fully appreciate the exercises.

Run the following code to prepare the packages you will use in this lesson.

💻 Just a heads-up: This notebook is designed to be run on an environment with a GPU. If you don't have access to a GPU, feel free to read through the cells and explore the content without executing them. Enjoy learning! ⭐

import cudaq
import numpy as np
import scipy
import time

%pip install pyscf -q

from aux_files.krylov.qchem.classical_pyscf import get_mol_hamiltonian
from aux_files.krylov.qchem.hamiltonian import jordan_wigner_fermion
from aux_files.krylov.qchem.eigenvaluesolver import eigen

# Single-node, single GPU
cudaq.set_target("nvidia", option='fp64')

Quantum Subspace Diagonalization

Quantum computing methods for quantum chemistry lie on a spectrum from NISQ methods to methods suited for large-scale fault tolerant quantum computers. The problem that we'll be considering in this notebook is to find the ground state energy of a Hamiltonian. Mathematically, this is equivalent to finding the lowest eigenvalue of a given matrix.

The prototypical NISQ algorithm is the Variational Quantum Eigensolver (VQE), which computes ground state energies by classically optimizing a parameterized circuit and computing expectation values for the cost function. These circuits are shallow, which is a favorable trait, but the number of measurements needed each round scales as $O(n^4)$ with the size of the Hamiltonian. Most problematic is the fact that circuits corresponding to even modest qubit counts have extremely flat optimization surfaces and tend not to converge, the so-called "barren plateau" problem. Even when convergence is possible, the number of iterations required may become intractably large.

On the other side of the spectrum, quantum phase estimation (QPE) is a hallmark fault-tolerant algorithm for quantum chemistry. Given a good initial state, QPE can compute ground state energies with exponential precision, but requires extremely deep circuits due to the necessity of the quantum Fourier transform in most implementations. QPE is extremely sensitive to errors, requiring very low error rates to function. Consequently, it relies on quantum error correction protocols (learn more in our QEC 101 course), which are resource-intensive and demand thousands to millions of physical qubits.

Quantum Krylov methods are a bridge between the two, balancing these tradeoffs to potentially enable useful applications on nearer-term devices. The main idea is to project the large matrix onto a much smaller one and compute the eigenvalues of this smaller matrix. If the subspace is chosen well, the lowest eigenvalues of the smaller matrix and the larger one agree. The Krylov approach is a subset of the broader family of quantum subspace diagonalization (QSD) methods, which diagonalize the Hamiltonian in a general non-orthogonal basis. In other words, given an initial state that is close to the ground state, the Krylov method will help identify a good subspace of the Hamiltonian to diagonalize such that the best possible estimates of the extreme eigenvalues (i.e., the ground state) are produced.

Such an approach requires circuits of moderate depth and uses them to populate matrix elements of a Hamiltonian subspace, which is then classically diagonalized. Krylov methods still suffer from the $O(n^4)$ Hamiltonian scaling, as the full Hamiltonian must be measured to compute subspace matrix elements. However, increasing the subspace dimension leads to exponential convergence towards the ground state energy, which could outperform the many VQE iterations required for larger problems.

The widget below will provide some intuition for subspace diagonalization and why Krylov methods work. Given the 8x8 matrix below, move the slider to select a subspace dimension and look at the computed eigenvalues. Notice that the lowest eigenvalue converges to the exact ground state energy while the other eigenvalues remain rather inaccurate. Subspace methods are quite good at estimating the extreme eigenvalues (highest or lowest), which is favorable as the lowest eigenvalue (the ground state energy) is usually the quantity of interest.

Click this link to access the widget

To get a sense for why a good initial state is important (more on that later), choose which corner the subspace originates from in the widget. Notice that the approximate eigenvalues are much worse when the subspace is constructed from certain corners, which happen to capture less information about the eigenspectrum of the matrix.

Similarly, in the quantum Krylov methods, a good approximation of the ground state will help construct a better subspace for estimating the ground state energy.
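The behavior of the widget can also be reproduced classically. The sketch below is a toy NumPy example (a random symmetric matrix and starting vector of our own choosing, not part of the notebook's workflow): it projects an 8x8 matrix onto Krylov subspaces of growing dimension and shows the lowest eigenvalue of the projected matrix converging monotonically toward the exact one.

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((8, 8))
A = (A + A.T) / 2                      # random 8x8 symmetric "Hamiltonian"
b = rng.standard_normal(8)             # starting vector seeding the subspace

exact = np.min(np.linalg.eigvalsh(A))  # exact lowest eigenvalue
estimates = []
for dim in range(1, 9):
    # Krylov basis {b, Ab, A^2 b, ...}, orthonormalized with QR
    K = np.column_stack([np.linalg.matrix_power(A, n) @ b for n in range(dim)])
    Q, _ = np.linalg.qr(K)
    # Project A into the subspace and diagonalize the small matrix
    estimates.append(np.min(np.linalg.eigvalsh(Q.T @ A @ Q)))
    print(dim, estimates[-1], "exact:", exact)
```

Because the subspaces are nested, each estimate is an upper bound that can only improve as the dimension grows, which is the same trend the widget displays.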

The Quantum Krylov Method

This section will walk through an implementation of a quantum Krylov method based on A Multireference Quantum Krylov Algorithm for Strongly Correlated Electrons. The general workflow follows the figure below inspired by Quantum Krylov subspace algorithms for ground and excited state energy estimation. The first step is to build a subspace for which a generalized eigenvalue problem will be solved. Then, the matrices for the eigenvalue problem are populated with expectation values from quantum circuit evaluations using the Hadamard test, and finally, the eigenvalues of the subspace are determined classically. The subspace can then be gradually increased in size and the process repeated until the result is sufficiently accurate.

The first step is to select a set of reference states from which the subspace is constructed. A benefit of this approach is that it is multireference. A basis of $d$ states $\{\Phi_0 \cdots \Phi_d\}$ is defined, where each state is a linear combination of Slater determinants:

$$\ket{\Phi_I} = \sum_{\mu} d_{\mu I}\ket{\phi_{\mu}}.$$

From this, a non-orthogonal Krylov space $\mathcal{K} = \{\psi_{0} \cdots \psi_{N}\}$ is constructed by applying a family of $s$ unitary operators to each of the $d$ reference states, resulting in $d \cdot s = N$ elements in the Krylov space, where

$$\ket{\psi_{\alpha}} \equiv \ket{\psi_I^{(n)}} = \hat{U}_n\ket{\Phi_I}.$$

This constructs the state $\ket{\Psi}$, an approximation of the full configuration interaction (FCI) wavefunction:

$$\ket{\Psi} = \sum_{\alpha} c_{\alpha}\ket{\psi_{\alpha}} = \sum_{I=0}^d \sum_{n=0}^s c_I^{(n)}\hat{U}_n\ket{\Phi_I}.$$

The selection of the unitary operators is not arbitrary: they correspond to consecutive applications of the time evolution operator $U_n = e^{-iHt_n}$. This ensures that a basis is chosen which describes the eigenvalues of $H$ well.

The energy of this state can be obtained by solving the generalized eigenvalue problem, using the state defined above to compute the matrix elements of $H$ and $S$:

$$\boldsymbol{Hc}=\boldsymbol{Sc}E.$$

These elements are obtained using the Hadamard test, which uses phase kickback to estimate an expectation value of the time evolution operators. For a more detailed discussion of why this works, please see this CUDA-Q tutorial. If $H = \sum_l c_l P_l$, the elements of the overlap matrix are

$$S_{\alpha \beta} = \braket{\psi_{\alpha}|\psi_{\beta}} = \braket{\Phi_I|\hat{U}_m^{\dagger}\hat{U}_n|\Phi_J}$$

and the elements for the Hamiltonian matrix are

$$H_{\alpha \beta} = \braket{\psi_{\alpha}|\hat{H}|\psi_{\beta}} = \braket{\Phi_I|\hat{U}_m^{\dagger}\hat{H}\hat{U}_n|\Phi_J} = \sum_l c_l \braket{\Phi_I|\hat{U}_m^{\dagger}\hat{P_l}\hat{U}_n|\Phi_J}.$$

The matrix elements for $S$ and $H$ are computed with the Hadamard test using the circuit shown below. In the case of the overlap matrix $S$, the Pauli word is the identity, so the $P_l$ drops out.

The $2\sigma_+$ term refers to measurement of the expectation value of this circuit with the $X+iY$ operator.
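Before building the CUDA-Q kernels, it can help to verify this claim with plain linear algebra. The sketch below simulates the Hadamard test as matrices for a toy one-qubit Hamiltonian (an illustrative choice of coefficients, not the molecular Hamiltonian): measuring $X$ and $Y$ on the ancilla recovers the real and imaginary parts of $\braket{\psi|U|\psi}$.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2, dtype=complex)

# Toy one-qubit "Hamiltonian" and its time evolution operator
H_sys = 0.5 * X + 0.3 * Z
U = expm(-1j * H_sys * 0.7)

psi = np.array([1, 0], dtype=complex)      # reference state |0>
target = psi.conj() @ U @ psi              # <psi|U|psi>, the matrix element we want

# Hadamard test: put the ancilla in |+>, then apply controlled-U
Hgate = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
state = np.kron(np.array([1, 0], dtype=complex), psi)   # |0>_anc |psi>
state = np.kron(Hgate, I2) @ state                      # ancilla -> |+>
cU = np.block([[I2, np.zeros((2, 2))],
               [np.zeros((2, 2)), U]])                  # |0><0| (x) I + |1><1| (x) U
state = cU @ state

re = (state.conj() @ np.kron(X, I2) @ state).real  # <X> on the ancilla
im = (state.conj() @ np.kron(Y, I2) @ state).real  # <Y> on the ancilla
print(re + 1j * im, "vs", target)   # the two agree
```

The same algebra holds when the bra and ket states differ (the $U_m^{\dagger}$, $P_l$, $U_n$ case above); only controlled-$U$ becomes the controlled product of those operators.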

Once the $H$ and $S$ matrices are constructed, the diagonalization is performed classically to produce an estimate for the ground state in question.

Exercise 1

Before implementing the quantum Krylov method, find the ground state energy of the linear $H_4$ molecule and compute the FCI energy by direct matrix diagonalization. This will provide a reference to compare your Krylov energies against. Note that the `spin_ham_matrix = hamiltonian.to_matrix()` line saves the Hamiltonian as a NumPy array. How large is the full Hamiltonian matrix?

#geometry = 'Li 0.3925 0.0 0.0; H -1.1774 0.0 0.0'
#geometry = 'H 0.0 0.0 0.0; H 0.0 0.0 0.7474'
geometry = 'H 0.0 0.0 1.5; H 0.0 0.0 3.0; H 0.0 0.0 4.5; H 0.0 0.0 6.0'

molecular_data = get_mol_hamiltonian(xyz=geometry, spin=0, charge=0, basis='sto3g', ccsd=True, verbose=True)

obi = molecular_data[0]
tbi = molecular_data[1]
e_nn = molecular_data[2]
nelectrons = molecular_data[3]
norbitals = molecular_data[4]
qubits_num = 2 * norbitals

hamiltonian = jordan_wigner_fermion(obi, tbi, e_nn, tolerance=1e-12)
spin_ham_matrix = hamiltonian.to_matrix()
print("H is a", len(spin_ham_matrix), "by", len(spin_ham_matrix), "matrix")

# TODO - Compute FCI energy with matrix diagonalization
e, c = #FIXME

# Find the ground state energy and the corresponding eigenvector
print('Ground state energy (classical simulation)= ', np.min(e), ', index= ', np.argmin(e))
min_indices = np.argsort(e)[:5]

# Eigenvector can be used to initialize the qubits
vec = c[:, min_indices[0]]
output file: H 0-pyscf.log
[pyscf] Total number of orbitals = 4
[pyscf] Total number of electrons = 4
[pyscf] HF energy = -1.829137412352686
[pyscf] Total R-CCSD energy = -1.9976240454057859
H is a 256 by 256 matrix
Ground state energy (classical simulation)=  (-1.996150325518812+0j) , index=  13

Implementing a Quantum Krylov Method

Exercise 2

Now you are ready to code up an implementation of the full Krylov method. The following cells will guide you through this with certain "TODO" locations where you can fill in the code. Hint: use `cudaq.control(U, a, ...U-args)` to apply a kernel `U` controlled by a qubit `a`.

Building the main kernel

The heart of the Krylov algorithm is a CUDA-Q kernel (qfd_kernel) that computes the matrix elements in the circuit shown above. There are a few sub-kernels that must be constructed first, and these are explained below. Note that we will prepare code where the basis vectors are single Slater determinants for simplicity. In principle, it could be modified to take any initial state.

  1. Kernel U_m first builds the occupation vector of the target Slater determinant $\ket{\Phi_I}$ by applying $X$ gates. Then, the command exp_pauli applies each Pauli word in the Hamiltonian as a matrix exponential to perform time evolution.

  2. apply_pauli is a helper kernel provided for you that translates a list of integers into the respective $X$, $Y$, and $Z$ gates, applied to measure a specific Pauli word of the Hamiltonian.

  3. Kernel U_n first builds the occupation vector of the target Slater determinant $\ket{\Phi_J}$ by applying $X$ gates. Then, the command exp_pauli applies each Pauli word in the Hamiltonian as a matrix exponential to perform time evolution, followed by controlled application of the $P_l$ using apply_pauli.

  4. Kernel qfd_kernel defines a qubit register and an ancilla and uses these three kernels to implement the above circuit. Note, this is general, as the code below building the $S$ matrix will input $P_l = II\cdots I$.

# TODO - write kernel U_m
@cudaq.kernel
def U_m(qubits: cudaq.qview, dt: float, coefficients: list[complex],
        words: list[cudaq.pauli_word], initial_state: list[int], trotter_order: int):
    """
    Function to prepare |Φ_I⟩ and time evolve

    Parameters
    ----------
    qubits : cudaq.qview
        Quantum register on which the kernel acts (data qubits only;
        ancillary qubits are handled outside this routine).
    dt : float
        Time step for the evolution operator.
    coefficients : list[complex]
        Scalar coefficients of the Pauli words in the Hamiltonian.
    words : list[cudaq.pauli_word]
        The corresponding Pauli words of H in CUDA-Q's native representation.
    initial_state : list[int]
        Indices of qubits that are occupied in the reference determinant |Φ_I⟩.
    trotter_order : int
        Order of the Trotter-Suzuki decomposition.
    """
    # TODO - Write the kernel


@cudaq.kernel
def apply_pauli(qubits: cudaq.qview, word: list[int]):
    """
    Takes a single Pauli word and applies the individual Pauli operators.

    Parameters
    ----------
    qubits : cudaq.qview
        Quantum register on which the kernel acts (data qubits only;
        ancillary qubits are handled outside this routine).
    word : list[int]
        List of integers of 1, 2, or 3, corresponding to application of X, Y, Z gates.
    """
    # Add H (Hamiltonian operator)
    for i in range(len(word)):
        if word[i] == 1:
            x(qubits[i])
        if word[i] == 2:
            y(qubits[i])
        if word[i] == 3:
            z(qubits[i])


# Applies the unitary operation corresponding to P_l U_n
@cudaq.kernel
def U_n(qubits: cudaq.qview, dt: float, coefficients: list[complex],
        words: list[cudaq.pauli_word], initial_state: list[int],
        word_list: list[int], trotter_order: int):
    """
    Function to prepare |Φ_J⟩, time evolve, and apply P_l

    Parameters
    ----------
    qubits : cudaq.qview
        Quantum register on which the kernel acts (data qubits only;
        ancillary qubits are handled outside this routine).
    dt : float
        Real time step for the evolution operator.
    coefficients : list[complex]
        Scalar coefficients of the Pauli words in the Hamiltonian.
    words : list[cudaq.pauli_word]
        The corresponding Pauli words of H in CUDA-Q's native representation.
    initial_state : list[int]
        Indices of qubits that are occupied in the reference determinant |Φ_J⟩.
    word_list : list[int]
        List of integers of 1, 2, or 3, corresponding to application of X, Y, Z gates.
    trotter_order : int
        Order of the Trotter-Suzuki decomposition.
    """
    # TODO - write the kernel


@cudaq.kernel
def qfd_kernel(dt_alpha: float, dt_beta: float, coefficients: list[complex],
               words: list[cudaq.pauli_word], word_list: list[int],
               state_alpha: list[int], state_beta: list[int],
               n_electrons: int, n_qubits: int, trotter_order: int):
    """
    Kernel to build the circuit which is used to compute matrix elements of H and S.

    Parameters
    ----------
    dt_alpha : float
        Time evolution step for U_m.
    dt_beta : float
        Time evolution step for U_n.
    coefficients : list[complex]
        Scalar coefficients of the Pauli words in the Hamiltonian.
    words : list[cudaq.pauli_word]
        The corresponding Pauli words of H in CUDA-Q's native representation.
    word_list : list[int]
        List of integers of 1, 2, or 3, corresponding to application of X, Y, Z gates.
    state_alpha : list[int]
        Indices of qubits that are occupied in the reference determinant |Φ_I⟩.
    state_beta : list[int]
        Indices of qubits that are occupied in the reference determinant |Φ_J⟩.
    n_electrons : int
        Number of electrons in the molecular system.
    n_qubits : int
        Number of qubits = 2 * n_orbitals.
    trotter_order : int
        Order of the Trotter-Suzuki decomposition.
    """
    # TODO - write the kernel

CUDA-Q kernels require specific data types. To apply the Hamiltonian operator within the kernels, it must be broken down into lists of coefficients and their respective Pauli words. Also, types for conditional logic within kernels are limited to integers. So, a function is also needed to convert each Pauli word into a list of integers corresponding to $I$, $X$, $Y$, and $Z$. The functions below accomplish this and are provided for you.

Note that coefficient = term_coefficients(hamiltonian) and pauli_string = term_words(hamiltonian) must be initialized before building the matrices for each molecular system you test.

# Collect coefficients from a spin operator
def term_coefficients(ham: cudaq.SpinOperator) -> list[complex]:
    """
    Function to extract Hamiltonian coefficients

    Parameters
    ----------
    ham : cudaq.SpinOperator
        Hamiltonian as a CUDA-Q spin operator

    Returns
    -------
    list(complex)
        List of Hamiltonian term coefficients
    """
    # TODO - write the function


# Collect Pauli words from a spin operator
def term_words(ham: cudaq.SpinOperator) -> list[str]:
    """
    Function to collect Pauli words from the Hamiltonian

    Parameters
    ----------
    ham : cudaq.SpinOperator
        Hamiltonian as a CUDA-Q spin operator

    Returns
    -------
    list(str)
        List of Hamiltonian terms
    """
    # TODO - write the function


def pauli_str(pauli_word, qubits_num):
    """
    Function to take a Pauli word and convert it to integer representation

    Parameters
    ----------
    pauli_word : cudaq.SpinOperator
        A single Pauli word
    qubits_num : int
        Number of qubits in the circuit

    Returns
    -------
    list(int)
        List of integers corresponding to Pauli operations
    """
    my_list = []
    for i in range(qubits_num):
        if str(pauli_word[i]) == 'I':
            my_list.append(0)
        if str(pauli_word[i]) == 'X':
            my_list.append(1)
        if str(pauli_word[i]) == 'Y':
            my_list.append(2)
        if str(pauli_word[i]) == 'Z':
            my_list.append(3)
    return my_list


# Build the lists of coefficients and Pauli words from the H4 Hamiltonian
coefficient = term_coefficients(hamiltonian)
pauli_string = term_words(hamiltonian)

print(coefficient[0:10])
print(pauli_string[0:10])
print(pauli_str(pauli_string[0], qubits_num))
[(-0.9209431016975833+0j), (0.11933984698934136+0j), (0.11933984698934136+0j), (0.07128066964853524+0j), (0.07128066964853522+0j), (-0.006895599406554084+0j), (-0.006895599406554112+0j), (-0.10062378739478249+0j), (-0.10062378739478246+0j), (0.10125845891307164+0j)]
['IIIIIIII', 'ZIIIIIII', 'IZIIIIII', 'IIZIIIII', 'IIIZIIII', 'IIIIZIII', 'IIIIIZII', 'IIIIIIZI', 'IIIIIIIZ', 'ZZIIIIII']
[0, 0, 0, 0, 0, 0, 0, 0]

Computing the matrix elements

You are now ready to start computing the overlap matrix $S$ and Hamiltonian matrix $H$ for the subspace. You will write a function to compute each matrix given a time step, number of evolution steps ($s$), and a list of reference states. It is assumed that each reference state will evolve the same number of time steps.

First, define the two observables necessary to compute the real and imaginary parts using the Hadamard test. That is, an $X$ and a $Y$ operator acting on the ancilla qubit for the real and imaginary parts, respectively. Note, these must be computed as two separate expectation values, not a single $XY$!

Next, generate an empty overlap matrix (consisting of $sd \times sd$ elements) and loop over the elements to compute them using cudaq.observe. The matrix is symmetric, so for performance reasons, only compute the upper triangle and copy the results to the corresponding transposed positions. For the pauli_string argument in qfd_kernel, use pauli_str to enter a list corresponding to the identity operator, as there is no observable to compute in the overlap matrix. (This is done for you.)

# TODO - Define the two observables which compute the real and imaginary part of the matrix element
x_0 = #FIXME
y_0 = #FIXME


def populate_s(dt, n_steps, ref_states, trotter_order=1):
    """
    Function to compute the overlap matrix S with the Hadamard test

    Parameters
    ----------
    dt : float
        Time evolution time step
    n_steps : int
        Number of time steps each reference vector is evolved
    ref_states : list[list[int]]
        List of lists of indices to prepare occupation vectors for reference Slater determinants
    trotter_order : int
        Order of the Trotter-Suzuki decomposition

    Returns
    -------
    S : np.array(complex)
        Returns the overlap matrix
    """
    identity_op = cudaq.SpinOperator.from_word('I' * qubits_num)
    identity_word = identity_op.get_pauli_word()
    pauli_list = pauli_str(identity_word, qubits_num)

    dt_s = [i * dt for i in range(n_steps)] * len(ref_states)
    states = ref_states * n_steps

    # Empty overlap matrix S
    wf_overlap = np.zeros((n_steps * len(ref_states), n_steps * len(ref_states)), dtype=complex)

    # TODO - write the loop to solve for S matrix elements

    return wf_overlap

Finally, write a similar loop for the Hamiltonian matrix. In this case, each matrix element needs to be computed by looping over all Pauli words ($P_l$), computing an expectation value for each word, multiplying this expectation value by that word's coefficient ($c_l$), and summing the results. In this case, pauli_str will need to be called for each word looped over to produce a list that can be applied as Pauli operators within the kernel.

def populate_h(dt, n_steps, ref_states, trotter_order=1):
    """
    Function to compute the Hamiltonian matrix H with the Hadamard test

    Parameters
    ----------
    dt : float
        Time evolution time step
    n_steps : int
        Number of time steps each reference vector is evolved
    ref_states : list[list[int]]
        List of lists of indices to prepare occupation vectors for reference Slater determinants
    trotter_order : int
        Order of the Trotter-Suzuki decomposition

    Returns
    -------
    H : np.array(complex)
        Returns the Hamiltonian matrix
    """
    ham_matrix = np.zeros((n_steps * len(ref_states), n_steps * len(ref_states)), dtype=complex)

    dt_s = [i * dt for i in range(n_steps)] * len(ref_states)
    states = ref_states * n_steps

    # TODO - write a loop to compute H

    return ham_matrix

Solving the eigenvalue problem

Let's recap. You started with a classical computation of the Hamiltonian and an initial reference state. You then cleverly defined a subspace to approximate the total Hamiltonian using a basis of vectors formed by evolving the initial state with the Hamiltonian. You then populated the $H$ and $S$ matrices corresponding to the subspace using quantum computations. Now, you will classically solve this eigenvalue problem to compute the ground state energy of the linear $H_4$ molecule.

The generalized eigenvalue problem is solved by first diagonalizing $S$:

$$S = U S' U^{\dagger}$$

The eigenvectors $v$ and eigenvalues $e$ of $S$ are then used to construct a new matrix $X'$:

$$X' = S^{-1/2} = \sum_k v_{ki} \frac{1}{\sqrt{e_k}} v_{kj}$$

This matrix then diagonalizes $H$ as

$$X'^{\dagger} H X' = E S^{1/2} C$$

Using the eigenvectors of $H$, which are $S^{1/2}C$, one can find the eigenvectors of the original problem by left multiplying by $S^{-1/2}$.

The function eigen, imported from an auxiliary script, solves the generalized eigenvalue problem for you. The function checks the spread of the eigenvalues of $S$, known as the condition number. If this number is large, numerical instabilities may corrupt the solution. This is fixed by projecting out tiny eigenvalues. You can set verbose=True to print information about the conditioning of the matrix if you would like.
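For reference, the $S^{-1/2}$ procedure described above can be sketched in a few lines of NumPy. The `tol` cutoff and the cross-check against `scipy.linalg.eigh` are illustrative assumptions on our part; the actual eigen helper in the auxiliary script may differ in its details.

```python
import numpy as np
import scipy.linalg

def solve_gevp(H, S, tol=1e-10):
    """Solve Hc = ScE via the S^{-1/2} transformation, projecting out
    eigenvalues of S below the (hypothetical) cutoff `tol`."""
    e, v = np.linalg.eigh(S)
    keep = e > tol                       # discard near-null directions of S
    X = v[:, keep] / np.sqrt(e[keep])    # X = S^{-1/2} restricted to the kept space
    Ht = X.conj().T @ H @ X              # ordinary eigenproblem in the new basis
    E, C = np.linalg.eigh(Ht)
    return E, X @ C                      # eigenvalues, back-transformed eigenvectors

# Sanity check on a random Hermitian pencil with S positive definite
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6)); H = (A + A.T) / 2
B = rng.standard_normal((6, 6)); S = B @ B.T + 6 * np.eye(6)
E, C = solve_gevp(H, S)
E_ref = scipy.linalg.eigh(H, S, eigvals_only=True)
print(np.allclose(E, E_ref))  # True
```

When $S$ is nearly singular, the `keep` mask is what stabilizes the solve: directions with tiny overlap eigenvalues are simply dropped before inversion.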

Now, compute the ground state energy of $H_4$ using the Hartree-Fock state as the only reference, evolving 4 and then 8 time steps. (The first example is provided for running this workflow.) Then recompute using two reference states, HF and a determinant formed by promoting the highest energy electron to the next available orbital, and evolve each for 2 and then 4 time steps.

How do your results look? Is a larger $N = s \cdot d$ better? Are better results obtained from increasing $d$ or $s$? Are they accurate to within chemical accuracy (mHartrees)? Note, you must use fp64 precision with the nvidia simulator, otherwise your result will suffer from numerical inaccuracies.

timesteps = 4
ref_states = [[1, 2, 3, 4]]
dt = 1.0

H_final = populate_h(dt, timesteps, ref_states)
S_final = populate_s(dt, timesteps, ref_states)
eigen_value, eigen_vect = eigen(H_final, S_final)
print('Energy from QFD s = 4, d = 1:')
print(np.min(eigen_value))

# TODO - Use the Hartree Fock state evolved 8 time steps
timesteps = #FIXME
ref_states = #FIXME
dt = 1.0

H_final = populate_h(dt, timesteps, ref_states)
S_final = populate_s(dt, timesteps, ref_states)
eigen_value, eigen_vect = eigen(H_final, S_final)
print('Energy from QFD s = 8, d = 1:')
print(np.min(eigen_value))

# TODO - Use the Hartree Fock state and an excited state Slater determinant, both evolved 2 steps.
timesteps = #FIXME
ref_states = #FIXME
dt = 0.001

H_final = populate_h(dt, timesteps, ref_states)
S_final = populate_s(dt, timesteps, ref_states)
eigen_value, eigen_vect = eigen(H_final, S_final)
print('Energy from QFD s = 2, d = 2:')
print(np.min(eigen_value))

# TODO - Use the Hartree Fock state and an excited state Slater determinant, both evolved 4 steps.
timesteps = #FIXME
ref_states = #FIXME
dt = 0.001

H_final = populate_h(dt, timesteps, ref_states)
S_final = populate_s(dt, timesteps, ref_states)
eigen_value, eigen_vect = eigen(H_final, S_final)
print('Energy from QFD s = 4, d = 2:')
print(np.min(eigen_value))

print('Exact Ground State Energy:')
print(np.min(e))
Energy from QFD s = 4, d = 1:
-1.7924051988640382
Energy from QFD s = 8, d = 1:
-1.8193961793720168
Energy from QFD s = 2, d = 2:
-1.829137412352687
Energy from QFD s = 4, d = 2:
-1.9660837421691038
Exact Ground State Energy:
(-1.996150325518812+0j)

Exercise 3

You have successfully implemented a Krylov method and computed the ground state energy of $H_4$ using a subspace of the full Hamiltonian. To understand the resource savings here, write a function that computes the number of circuits that must be executed to compute the ground state energy of STO-3G LiH, which has 631 Hamiltonian terms and a Hilbert space of size $2^{12}$. (Assume the expectation values are computed from a single circuit and are not shot based.)

Would a subspace of $d=2, s=3$ require execution of more or fewer circuits than a VQE procedure with 1000 iterations? Also, how much smaller is the subspace than the full Hamiltonian if one were to compute the FCI energy of LiH?

def krylov_circuits(m, ham_terms):
    # TODO - write code to answer the questions
Circuits for Krylov Method: 26544.0
circuits for 1000 VQE steps 631000
Subspace is 64.0 times smaller than full Hilbert space
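The circuit count above can be reproduced under one particular accounting, which is an assumption on our part rather than the notebook's reference solution: the $H$ and $S$ matrices are Hermitian, so only $N(N+1)/2$ elements are needed, and each element requires a real and an imaginary Hadamard-test circuit per measured term (631 terms for $H$, the single identity word for $S$).

```python
def krylov_circuit_count(d, s, ham_terms):
    """One possible tally of Hadamard-test circuits for a d-reference,
    s-step Krylov run (hypothetical accounting, for illustration):
    N(N+1)/2 independent matrix elements; two circuits (real + imaginary)
    per Hamiltonian term for H, plus two per element for S."""
    N = d * s
    elements = N * (N + 1) // 2
    return 2 * elements * ham_terms + 2 * elements

print(krylov_circuit_count(2, 3, 631))  # 26544, matching the count above
print(1000 * 631)                       # 631000 circuits for 1000 VQE steps
```

With this bookkeeping, a $d=2, s=3$ Krylov run needs roughly 24x fewer circuit executions than the 1000-iteration VQE estimate.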

Notice from this that the $N=6$ subspace requires far fewer circuits. This is somewhat of a tough comparison, as it is highly variable how many VQE iterations are needed in each case, or whether convergence is even possible. Regardless, Krylov approaches have a much more efficient route to converging on the total energy, yet remain limited by the size of the Hamiltonian, which determines how costly each matrix element is to compute.

Selecting an Improved Krylov Basis

As you saw from your initial implementation, the quality of the Krylov QSD result can depend heavily on the reference states used and how many time evolution steps are selected. Examine the table below from the paper. Notice how larger subspaces improve (lower) the ground state energy prediction for the linear $H_6$ molecule (STO-6G).

The table tells a second story. The columns with $k$ denote the condition number of the overlap matrix $S$, that is, the ratio of the largest to the smallest eigenvalue. If this number is very large (over $10^{12}$), the solution is potentially numerically unstable and could produce inaccurate and even non-variational predictions. This tends to occur when vectors in the Krylov subspace are too similar.

The table has two sets of columns: one where a single-reference (SR) Hartree-Fock state is evolved for $N$ time steps, and another where multiple reference vectors $d = N/4$ are evolved for four time steps each. Notice how the MR condition numbers are small and remain $O(10^6)$ while the SR cases are huge. Though potentially OK for a small system, this is highly problematic for larger molecules and indicates the value of using multiple reference vectors over a single reference evolved for more time steps.

| Subspace Size | $E(\text{SR})$ | $k(\text{SR})$ | $E(\text{MR})$ | $k(\text{MR})$ |
|---|---|---|---|---|
| 4 | −3.015510 | $3.29\times10^{5}$ | −3.015510 | $3.29\times10^{5}$ |
| 8 | −3.019768 | $3.60\times10^{11}$ | −3.019301 | $4.86\times10^{5}$ |
| 12 | −3.020172 | $1.61\times10^{17}$ | −3.019696 | $9.39\times10^{5}$ |
| 16 | −3.020192 | $3.19\times10^{17}$ | −3.019835 | $5.68\times10^{6}$ |
| 20 | −3.020198 | $3.86\times10^{17}$ | −3.019929 | $6.23\times10^{6}$ |
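The link between similar Krylov vectors and large condition numbers is easy to demonstrate classically. In the NumPy sketch below (a toy example with an illustrative noise scale of our own choosing), the overlap (Gram) matrix of four nearly parallel vectors is badly conditioned, while four independent random vectors stay well conditioned:

```python
import numpy as np

def overlap_condition(vectors):
    """Condition number of the Gram (overlap) matrix S_ij = <v_i|v_j>."""
    V = np.column_stack([v / np.linalg.norm(v) for v in vectors])
    S = V.conj().T @ V
    e = np.linalg.eigvalsh(S)
    return e[-1] / e[0]

rng = np.random.default_rng(1)
base = rng.standard_normal(16)
# Nearly parallel vectors (mimicking many tiny time steps of one reference)
similar = [base + 1e-4 * rng.standard_normal(16) for _ in range(4)]
# Well-spread vectors (mimicking several distinct references)
spread = [rng.standard_normal(16) for _ in range(4)]

print("nearly parallel:", overlap_condition(similar))
print("well spread:   ", overlap_condition(spread))
```

This mirrors the table: a single reference evolved many steps tends to produce nearly linearly dependent Krylov vectors, while multiple distinct references keep $S$ well conditioned.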

A natural question is: what other ways are there to improve the construction of the Krylov subspace? The rest of this section will explore two more.

Exercise 4

One way to improve the subspace is selection of an appropriate time evolution step. Use your code to test the time steps $\{10, 1, \ldots, 0.00001\}$ with the 6-31G Hartree-Fock state of $H_2$ (so it runs faster) as a reference, time-evolved for three time steps. Comment on the ground state energy predictions. Is there a systematic trend? What might be an explanation for what makes a good vs. a poor $dt$ selection?

geometry = 'H 0.0 0.0 0.0; H 0.0 0.0 0.7474'

molecular_data = get_mol_hamiltonian(xyz=geometry, spin=0, charge=0, basis='631g', ccsd=True, verbose=False)

obi = molecular_data[0]
tbi = molecular_data[1]
e_nn = molecular_data[2]
nelectrons = molecular_data[3]
norbitals = molecular_data[4]
qubits_num = 2 * norbitals

hamiltonian = jordan_wigner_fermion(obi, tbi, e_nn, tolerance=1e-12)
spin_ham_matrix = hamiltonian.to_matrix()
e, c = np.linalg.eig(spin_ham_matrix)

# Build the lists of coefficients and Pauli words from the H2 Hamiltonian
coefficient = term_coefficients(hamiltonian)
pauli_string = term_words(hamiltonian)

timesteps = 3
ref_states = [[0, 1]]

for dt in [10, 1.0, 0.1, 0.01, 0.001, 0.0001, 0.00001]:
    H_final = populate_h(dt, timesteps, ref_states)
    S_final = populate_s(dt, timesteps, ref_states)
    eigen_value, eigen_vect = eigen(H_final, S_final)
    print("The time step dt=", dt, "results in E=", np.min(eigen_value))

print('Exact Ground State Energy:')
print(np.min(e))
overwrite output file: H 0-pyscf.log
The time step dt= 10 results in E= -1.137175710240685
The time step dt= 1.0 results in E= -1.137175710240685
The time step dt= 0.1 results in E= -1.137175710240694
The time step dt= 0.01 results in E= -1.1371757102409101
The time step dt= 0.001 results in E= -1.1371757101747133
The time step dt= 0.0001 results in E= -1.1371757047893722
The time step dt= 1e-05 results in E= -1.1163255644809635
Exact Ground State Energy:
(-1.1371757102406845+0j)

Exercise 5

Another way to improve the basis selection has to do with how the time evolution operator is approximated. Consider the Hamiltonian $H = \sum_l c_lP_l$ where $c_l$ are the coefficients and $P_l$ are the Pauli words. We have used the $\texttt{exp\_pauli}$ function to apply each Pauli word individually and approximate $e^{-iHt}$. Because the terms do not in general commute, applying them one at a time only approximates $e^{-iHt}$ via the Suzuki-Trotter formula shown below.

$$ e^{-i(c_0P_0 + c_1P_1 + \cdots + c_lP_l)t} \approx \left(e^{-ic_0P_0t/k}\,e^{-ic_1P_1t/k}\cdots e^{-ic_lP_lt/k}\right)^k, $$

where the approximation converges to the exact operator as $k$ goes to infinity. So far we have used a first-order Trotter-Suzuki operator ($k=1$), but $k$ can be increased to provide an even more accurate time evolution operator. This is implemented by applying the time evolution operators within $U_n$ and $U_m$ $k$ times with the coefficients divided by a factor of $k$. Update your code so that $k$ can be specified, and run the procedure for $H_4$ using the HF reference and three total time steps with $k = \{1,2,3,4,5\}$. Also time each computation and print the runtime. Do the higher-order approximations produce better results? How much slower is each increase in approximation order?
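The convergence of the product formula can be seen without any quantum hardware. The NumPy sketch below (hypothetical coefficients, not the molecular Hamiltonian from this lab) compares $k$ repetitions of a first-order Trotter step against the exact exponential of a two-term noncommuting Hamiltonian; the error shrinks roughly as $1/k$.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def expm_h(M, t):
    # exp(-i M t) for a Hermitian matrix M, via eigendecomposition
    w, V = np.linalg.eigh(M)
    return V @ np.diag(np.exp(-1j * w * t)) @ V.conj().T

# Hypothetical two-term Hamiltonian H = c0*Z + c1*X (the terms do not commute)
c0, c1, t = 0.8, 0.5, 1.0
exact = expm_h(c0 * Z + c1 * X, t)

def trotter_error(k):
    # One first-order Trotter step of length t/k, repeated k times
    step = expm_h(Z, c0 * t / k) @ expm_h(X, c1 * t / k)
    return np.linalg.norm(np.linalg.matrix_power(step, k) - exact)

for k in [1, 2, 4, 8]:
    print(f"k={k}: Trotter error = {trotter_error(k):.2e}")
```

In the CUDA-Q kernels this corresponds to looping the sequence of $\texttt{exp\_pauli}$ applications $k$ times with each coefficient divided by $k$.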

import time

geometry = 'H 0.0 0.0 1.5; H 0.0 0.0 3.0; H 0.0 0.0 4.5; H 0.0 0.0 6.0'
molecular_data = get_mol_hamiltonian(xyz=geometry, spin=0, charge=0, basis='sto3g', ccsd=True, verbose=False)

obi = molecular_data[0]
tbi = molecular_data[1]
e_nn = molecular_data[2]
nelectrons = molecular_data[3]
norbitals = molecular_data[4]
qubits_num = 2 * norbitals

hamiltonian = jordan_wigner_fermion(obi, tbi, e_nn, tolerance=1e-12)
spin_ham_matrix = hamiltonian.to_matrix()
e, c = np.linalg.eig(spin_ham_matrix)

coefficient = term_coefficients(hamiltonian)
pauli_string = term_words(hamiltonian)

timesteps = 3
ref_states = [[0, 1, 2, 3]]
dt = 0.1

for k in [1, 2, 3, 4, 5]:
    start = time.time()
    H_final = populate_h(dt, timesteps, ref_states, k)
    S_final = populate_s(dt, timesteps, ref_states, k)
    eigen_value, eigen_vect = eigen(H_final, S_final)
    end = time.time()
    print("The Trotter order k=", k, "results in E=", np.min(eigen_value), "with time:", end - start)

print('Exact Ground State Energy:')
print(np.min(e))
overwrite output file: H 0-pyscf.log
The Trotter order k= 1 results in E= -1.9860932225754904 with time: 16.321036338806152
The Trotter order k= 2 results in E= -1.9913724123384422 with time: 29.278184175491333
The Trotter order k= 3 results in E= -1.992358540086284 with time: 42.292258739471436
The Trotter order k= 4 results in E= -1.9926827227889903 with time: 55.2967255115509
The Trotter order k= 5 results in E= -1.9928241489048735 with time: 68.2799425125122
Exact Ground State Energy:
(-1.9961503255188189+0j)

Clearly there is a benefit to using a higher Trotter order, but there is also a cost: each increase in $k$ adds another repetition of the time evolution circuit, so circuit depth grows linearly with $k$, resulting in many more gates to apply and longer circuit simulation times. In A Multireference Quantum Krylov Algorithm for Strongly Correlated Electrons, the authors present a table of data for $BeH_2$ at various Trotter orders (they use $m$ instead of $k$). Notice that for most bond distances, chemical accuracy is achieved only with a Trotter order of 8, demonstrating the importance of an accurate time evolution operator.

Parallel Krylov Method

In the previous section you calculated that, even for a modestly sized molecule like $LiH$, over 26,000 circuits need to be run to compute all of the expectation values. This does not account for sampling on a physical QPU, which would increase the count significantly. The good news is that the Krylov approach is highly parallelizable: in theory, all of these circuits could be run asynchronously on a sufficient number of physical QPUs. Accelerated quantum supercomputers will eventually include multiple QPUs in a single data center, allowing such computations to run alongside classical supercomputers.

CUDA-Q has a number of features so you can easily code parallel applications and simulate how they would run on multiple QPUs using the MQPU backend. In this section you will implement a parallel version of the Krylov approach and test it out on a larger molecule.

Exercise 6

Though there are multiple ways to parallelize the population of the $H$ and $S$ matrices, the limiting factor is the number of QPUs available (or GPUs, in our simulated case). Most users have a limited number of GPUs, so we will only consider parallelization across two GPUs.

First, consider the function $\texttt{populate\_h}$. Each matrix element requires computing an expectation value for every Pauli word in the Hamiltonian. We will focus on parallelizing the computation of the real and imaginary parts of each of these. Below is an updated version of the function, $\texttt{populate\_h\_async}$, which uses $\texttt{observe\_async}$ to run each $\texttt{observe}$ call on a different QPU. $\texttt{observe\_async}$ takes a final parameter, $\texttt{qpu\_id}$, which specifies the GPU on which to run that computation.

The results are stored in a list of futures, each of which can be converted to a float by calling $\texttt{.get().expectation()}$ on it. Run the code below to see how much faster the parallel version is. Notice that it is not twice as fast: there is overhead from the parallelization, and computing $H$ is only part of the workflow. As a challenge, try other parallelization schemes that take advantage of more than two QPUs.
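The submit-then-collect pattern behind $\texttt{observe\_async}$ can be sketched with Python's standard $\texttt{concurrent.futures}$ module. This is only a stand-in for the CUDA-Q API: the hypothetical $\texttt{fake\_expectation}$ function plays the role of an $\texttt{observe}$ call, and $\texttt{Future.result()}$ plays the role of $\texttt{.get().expectation()}$.

```python
from concurrent.futures import ThreadPoolExecutor
import math

# Stand-in for an expectation-value computation; in the lab this role is
# played by cudaq.observe_async, whose future is resolved via .get()
def fake_expectation(coeff, term_index):
    return coeff * math.cos(term_index)

coefficients = [0.5, -0.3, 0.1, 0.7]  # one entry per "Hamiltonian term"

with ThreadPoolExecutor(max_workers=2) as pool:  # two workers, like two QPUs
    # Submit all terms first (non-blocking), mirroring observe_async
    futures = [pool.submit(fake_expectation, c, i)
               for i, c in enumerate(coefficients)]
    # Collect results afterwards, like .get().expectation()
    total = sum(f.result() for f in futures)

print(f"total expectation = {total:.4f}")
```

The key point is that all work is submitted before any result is requested, so the two workers can process terms concurrently; blocking on each future immediately after submitting it would serialize the computation.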

num_gpus = 2  # parallelize over 2 GPUs
cudaq.set_target('nvidia', option='mqpu,fp64')

def populate_h_async(dt, n_steps, ref_states, trotter_order=1):
    ham_matrix = np.zeros((n_steps * len(ref_states), n_steps * len(ref_states)), dtype=complex)
    dt_s = [i * dt for i in range(n_steps)] * len(ref_states)
    states = ref_states * n_steps

    for m in range(n_steps * len(ref_states)):
        dt_m = dt_s[m]
        state_m = states[m]
        for n in range(m, n_steps * len(ref_states)):
            dt_n = dt_s[n]
            state_n = states[n]
            tot_e = np.zeros(2)

            # Prepare lists to save the asynchronously computed expectation values
            async_x_results = []
            async_y_results = []

            # Loop over the terms in the Hamiltonian, computing expectation values
            for i in range(len(pauli_string)):
                pauli_list = pauli_str(pauli_string[i], qubits_num)

                # Save the expectation values as a list of futures
                async_x_results.append(cudaq.observe_async(qfd_kernel, x_0, dt_m, dt_n, coefficient, pauli_string,
                                                           pauli_list, state_m, state_n, nelectrons, qubits_num,
                                                           trotter_order, qpu_id=i % num_gpus))
                async_y_results.append(cudaq.observe_async(qfd_kernel, y_0, dt_m, dt_n, coefficient, pauli_string,
                                                           pauli_list, state_m, state_n, nelectrons, qubits_num,
                                                           trotter_order, qpu_id=i % num_gpus))

                # Get the expectation values from the computed futures
                temp_real = async_x_results[i].get().expectation() * coefficient[i].real
                temp_imag = async_y_results[i].get().expectation() * coefficient[i].real
                tot_e[0] += temp_real
                tot_e[1] += temp_imag

            # Sum the real and imaginary totals to form the Hamiltonian entry
            ham_matrix[m, n] = tot_e[0] + tot_e[1] * 1j
            if n != m:
                ham_matrix[n, m] = np.conj(ham_matrix[m, n])

    return ham_matrix


timesteps = 3
ref_states = [[0, 1, 2, 3]]
dt = 0.1
k = 3

start = time.time()
H_final = populate_h(dt, timesteps, ref_states, k)
S_final = populate_s(dt, timesteps, ref_states, k)
eigen_value, eigen_vect = eigen(H_final, S_final)
end = time.time()
print("Serial Execution results in E=", np.min(eigen_value), "with time:", end - start)

start = time.time()
H_final = populate_h_async(dt, timesteps, ref_states, k)
S_final = populate_s(dt, timesteps, ref_states, k)
eigen_value, eigen_vect = eigen(H_final, S_final)
end = time.time()
print("Parallel (H) Execution results in E=", np.min(eigen_value), "with time:", end - start)
Serial Execution results in E= -1.992358540086284 with time: 42.55427098274231
Parallel (H) Execution results in E= -1.992358540086284 with time: 40.83650755882263

Now, try to parallelize $\texttt{populate\_s}$. You will use similar syntax, but you will need to adjust the loops to store the futures and access them later. There is more than one way to do this, so feel free to be creative. Test your function and confirm it is faster than the serial $\texttt{populate\_s}$.

It is expected that the benefit of parallelizing the computation of $S$ will be even smaller than for $H$, but the exercise is nevertheless instructive.

The benefit of parallelization is most noticeable for large problem sizes, where computing $S$ and $H$ is much more demanding.

def populate_s_async(dt, n_steps, ref_states, trotter_order=1):
    # TODO - Write a function that parallelizes population of the S matrix
    # Copy your existing function as a starting point
    pass


start = time.time()
H_final = populate_h_async(dt, timesteps, ref_states, k)
S_final = populate_s_async(dt, timesteps, ref_states, k)
eigen_value, eigen_vect = eigen(H_final, S_final)
end = time.time()
print("Parallel (H and S) Execution results in E=", np.min(eigen_value), "with time:", end - start)
Parallel (H and S) Execution results in E= -1.992358540086284 with time: 40.65394353866577

Conclusion and Extensions

The Krylov QSD approach is a powerful and versatile tool. This lab introduced the underlying theory, guided you through implementing your own version, and explored several ways to improve results and accelerate the algorithm with parallelization.

The hybrid nature of the Krylov approach balances the need for quantum computers to evaluate expectation values over large Hilbert spaces with the fact that the resulting subspace matrix can be constructed and solved classically. A very promising aspect of Krylov methods is that they can be combined with other quantum computing approaches. For example, it is perfectly valid to prepare a reference state with another algorithm and then run Krylov QSD to refine the prediction, producing a robust hybrid workflow that builds on other quantum techniques.

For a challenge, if you have completed the ADAPT-VQE CUDA-Q Academic lab, see if you can combine these two methods into a single workflow.