
Training Parameterized Quantum Circuits

In this lab session we will take a closer look at how to train circuit-based models. At the end of this session you should know

  • how to build variational quantum classifiers

  • how to use different training techniques, especially gradient-based ones

  • what limitations variational models have and how we might overcome them

Where do we need to train parameterized circuits?

Our task


Gradients

Qiskit provides different methods to compute gradients of expectation values, let's explore them!

The parameterized ansatz state is $|\Psi(\vec\theta)\rangle = U(\vec\theta)|0\rangle$, where $U(\vec\theta)$ is given by the following circuit:

from qiskit.circuit.library import RealAmplitudes

ansatz = RealAmplitudes(num_qubits=2, reps=1, entanglement='linear').decompose()
ansatz.draw('mpl', style='iqx')

and our Hamiltonian in this example is $\hat H = \hat Z \otimes \hat Z$

from qiskit.opflow import Z, I

hamiltonian = Z ^ Z

Plugging these together gives $\langle\Psi(\vec\theta) | \hat H | \Psi(\vec\theta)\rangle$:

from qiskit.opflow import StateFn

expectation = StateFn(hamiltonian, is_measurement=True) @ StateFn(ansatz)

To make things concrete, let's fix a point $\vec p$ and an index $i$ and ask: What's the derivative of the expectation value with respect to parameter $\theta_i$ at point $\vec p$?

$$\frac{\partial}{\partial \theta_i} \langle \Psi(\vec\theta) | \hat H | \Psi(\vec\theta) \rangle \bigg\vert_{\vec\theta = \vec p}$$

We'll choose a random point $\vec p$ and index $i=2$ (remember we start counting from 0).

import numpy as np

point = np.random.random(ansatz.num_parameters)
index = 2

Throughout this session we'll use a shot-based simulator with 8192 shots.

from qiskit import Aer
from qiskit.utils import QuantumInstance

backend = Aer.get_backend('qasm_simulator')
q_instance = QuantumInstance(backend, shots=8192, seed_simulator=2718, seed_transpiler=2718)

Computing expectation values

We'll be using expectation values a lot, so let's recap how that worked in Qiskit.

With the qiskit.opflow module we can write and evaluate expectation values. The general structure for an expectation value where the state is prepared by a circuit is

expectation = StateFn(hamiltonian, is_measurement=True) @ StateFn(circuit)
result = expectation.eval()

The above code uses plain matrix multiplication to evaluate the expectation value, which is inefficient for large numbers of qubits. Instead, we can use a simulator (or a real quantum device) to evaluate the circuits by using a CircuitSampler in conjunction with an expectation converter like PauliExpectation:

from qiskit.opflow import CircuitSampler, PauliExpectation, StateFn

sampler = CircuitSampler(q_instance)  # q_instance is the QuantumInstance from the beginning of the notebook

expectation = StateFn(hamiltonian, is_measurement=True) @ StateFn(circuit)
in_pauli_basis = PauliExpectation().convert(expectation)
result = sampler.convert(in_pauli_basis).eval()

Exercise 1: Calculate the expectation value of the following Hamiltonian H and the state prepared by the circuit U with plain matrix multiplication.

from qiskit.circuit import QuantumCircuit
from qiskit.opflow import Z, X

H = X ^ X

U = QuantumCircuit(2)
U.h(0)
U.cx(0, 1)

# YOUR CODE HERE
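One possible solution (a minimal sketch following the structure shown above):

from qiskit.opflow import StateFn

# compose the measurement operator with the state and evaluate exactly (matrix multiplication)
exp = StateFn(H, is_measurement=True) @ StateFn(U)
print(exp.eval())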

Exercise 2: Evaluate the expectation value with the QASM simulator by using the QuantumInstance object q_instance defined at the beginning of the notebook.

from qiskit.opflow import CircuitSampler, PauliExpectation

sampler = CircuitSampler(q_instance)

# YOUR CODE HERE
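One possible solution (a minimal sketch mirroring the CircuitSampler recipe above):

# convert to the Pauli basis and evaluate with the shot-based sampler
exp = StateFn(H, is_measurement=True) @ StateFn(U)
in_pauli_basis = PauliExpectation().convert(exp)
print(sampler.convert(in_pauli_basis).eval())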

Finite difference gradients

Arguably the simplest way to approximate gradients is with a finite difference scheme. This works independently of the function's inner, possibly very complex, structure.
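Concretely, we use the central difference approximation

$$\frac{\partial f}{\partial \theta_i}(\vec p) \approx \frac{f(\vec p + \varepsilon \vec e_i) - f(\vec p - \varepsilon \vec e_i)}{2\varepsilon},$$

where $f(\vec\theta) = \langle\Psi(\vec\theta)|\hat H|\Psi(\vec\theta)\rangle$, $\vec e_i$ is the $i$-th unit vector and $\varepsilon$ a small step size.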

from qiskit.opflow import PauliExpectation, CircuitSampler

expectation = StateFn(hamiltonian, is_measurement=True) @ StateFn(ansatz)
in_pauli_basis = PauliExpectation().convert(expectation)
sampler = CircuitSampler(q_instance)

def evaluate_expectation(x):
    value_dict = dict(zip(ansatz.parameters, x))
    result = sampler.convert(in_pauli_basis, params=value_dict).eval()
    return np.real(result)

eps = 0.2
e_i = np.identity(point.size)[:, index]  # unit vector with a 1 at index ``index``, otherwise 0

plus = point + eps * e_i
minus = point - eps * e_i

finite_difference = (evaluate_expectation(plus) - evaluate_expectation(minus)) / (2 * eps)
print(finite_difference)

Instead of doing this manually, you can use Qiskit's Gradient class for this.

from qiskit.opflow import Gradient

expectation = StateFn(hamiltonian, is_measurement=True) @ StateFn(ansatz)
shifter = Gradient('fin_diff', analytic=False, epsilon=eps)
grad = shifter.convert(expectation, params=ansatz.parameters[index])
print(grad)

value_dict = dict(zip(ansatz.parameters, point))
sampler.convert(grad, value_dict).eval().real

Finite difference gradients can be volatile on noisy functions, and using the exact formula for the gradient can be more stable.

Analytic gradients

Luckily there's the parameter-shift rule, a simple formula for circuit-based gradients. (Note: here only stated for Pauli rotations without coefficients, see Evaluating analytic gradients on quantum hardware.)

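For an ansatz built from Pauli rotations $e^{-i \theta_i P / 2}$ (as RealAmplitudes is), the parameter-shift rule reads

$$\frac{\partial f}{\partial \theta_i}(\vec p) = \frac{f\big(\vec p + \tfrac{\pi}{2} \vec e_i\big) - f\big(\vec p - \tfrac{\pi}{2} \vec e_i\big)}{2},$$

which looks like a central difference with a shift of $\pi/2$ and no division by the shift size, but is exact rather than an approximation.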

eps = np.pi / 2
e_i = np.identity(point.size)[:, index]  # unit vector with a 1 at index ``index``, otherwise 0

plus = point + eps * e_i
minus = point - eps * e_i

finite_difference = (evaluate_expectation(plus) - evaluate_expectation(minus)) / 2
print(finite_difference)

expectation = StateFn(hamiltonian, is_measurement=True) @ StateFn(ansatz)

shifter = Gradient()  # parameter-shift rule is the default
grad = shifter.convert(expectation, params=ansatz.parameters[index])

sampler.convert(grad, value_dict).eval().real

Based on the same principle, there's the linear combination of unitaries approach, which uses only one circuit but requires an additional auxiliary qubit.

expectation = StateFn(hamiltonian, is_measurement=True) @ StateFn(ansatz)

shifter = Gradient('lin_comb')  # linear combination of unitaries
grad = shifter.convert(expectation, params=ansatz.parameters[index])

sampler.convert(grad, value_dict).eval().real

Let's try optimizing!

We fixed an initial point for reproducibility.

# initial_point = np.random.random(ansatz.num_parameters)
initial_point = np.array([0.43253681, 0.09507794, 0.42805949, 0.34210341])

Similar to how we have a function to evaluate the expectation value, we'll need a function to evaluate the gradient.

expectation = StateFn(hamiltonian, is_measurement=True).compose(StateFn(ansatz))

gradient = Gradient().convert(expectation)
gradient_in_pauli_basis = PauliExpectation().convert(gradient)
sampler = CircuitSampler(q_instance)

def evaluate_gradient(x):
    value_dict = dict(zip(ansatz.parameters, x))
    result = sampler.convert(gradient_in_pauli_basis, params=value_dict).eval()
    return np.real(result)

To compare the convergence of the optimizers, we can keep track of the loss at each step by using a callback function.


from qiskit.algorithms.optimizers import GradientDescent

gd_loss = []
def gd_callback(nfevs, x, fx, stepsize):
    gd_loss.append(fx)

gd = GradientDescent(maxiter=300, learning_rate=0.01, callback=gd_callback)

And now we start the optimization and plot the loss!

x_opt, fx_opt, nfevs = gd.optimize(initial_point.size,                       # number of parameters
                                   evaluate_expectation,                     # function to minimize
                                   gradient_function=evaluate_gradient,      # function to evaluate the gradient
                                   initial_point=initial_point)              # initial point

import matplotlib
import matplotlib.pyplot as plt

matplotlib.rcParams['font.size'] = 14

plt.figure(figsize=(12, 6))
plt.plot(gd_loss, label='vanilla gradient descent')
plt.axhline(-1, ls='--', c='tab:red', label='target')
plt.ylabel('loss')
plt.xlabel('iterations')
plt.legend()

Does this always work?


Natural gradients


With Qiskit, we can evaluate the natural gradient by using the NaturalGradient class instead of the Gradient! The natural gradient rescales the ordinary gradient by the (pseudo-)inverse of the Quantum Fisher Information, which accounts for the geometry of the parameterized state.
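One common way to write the resulting update rule (up to conventions for normalizing the metric) is

$$\vec\theta_{k+1} = \vec\theta_k - \eta\, g^{+}(\vec\theta_k)\, \nabla f(\vec\theta_k),$$

where $f$ is the loss, $\eta$ the learning rate, $g(\vec\theta)$ the Quantum Fisher Information matrix and $g^{+}$ its regularized (pseudo-)inverse.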

from qiskit.opflow import NaturalGradient

Analogously to the function to compute gradients, we can now write a function to evaluate the natural gradients.

expectation = StateFn(hamiltonian, is_measurement=True).compose(StateFn(ansatz))

natural_gradient = NaturalGradient(regularization='ridge').convert(expectation)
natural_gradient_in_pauli_basis = PauliExpectation().convert(natural_gradient)
sampler = CircuitSampler(q_instance, caching="all")

def evaluate_natural_gradient(x):
    value_dict = dict(zip(ansatz.parameters, x))
    # evaluate the Pauli-basis converted operator with the shot-based sampler
    result = sampler.convert(natural_gradient_in_pauli_basis, params=value_dict).eval()
    return np.real(result)

And as you can see they do indeed differ!

print('Vanilla gradient:', evaluate_gradient(initial_point))
print('Natural gradient:', evaluate_natural_gradient(initial_point))

Let's look at how this influences the convergence.

qng_loss = []
def qng_callback(nfevs, x, fx, stepsize):
    qng_loss.append(fx)

qng = GradientDescent(maxiter=300, learning_rate=0.01, callback=qng_callback)

x_opt, fx_opt, nfevs = qng.optimize(initial_point.size,
                                    evaluate_expectation,
                                    gradient_function=evaluate_natural_gradient,
                                    initial_point=initial_point)

def plot_loss():
    plt.figure(figsize=(12, 6))
    plt.plot(gd_loss, 'tab:blue', label='vanilla gradient descent')
    plt.plot(qng_loss, 'tab:green', label='quantum natural gradient')
    plt.axhline(-1, c='tab:red', ls='--', label='target')
    plt.ylabel('loss')
    plt.xlabel('iterations')
    plt.legend()

plot_loss()

This sounds great! But if we think about how many circuits we need to evaluate these gradients, they can become costly!


Simultaneous Perturbation Stochastic Approximation

Idea: avoid the expensive gradient evaluation by sampling from the gradients. Since we don't care about the exact values but only about convergence, an unbiased sample should, on average, work equally well!
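As a sketch of the idea: instead of shifting every parameter separately, SPSA perturbs all parameters at once along a random direction $\vec\Delta$ with entries $\pm 1$ and estimates

$$\hat g(\vec\theta) = \frac{f(\vec\theta + c\,\vec\Delta) - f(\vec\theta - c\,\vec\Delta)}{2c}\,\vec\Delta,$$

so a single iteration needs only two evaluations of $f$, independent of the number of parameters.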


This optimization technique is known as SPSA. How does it perform?

from qiskit.algorithms.optimizers import SPSA

spsa_loss = []
def spsa_callback(nfev, x, fx, stepsize, accepted):
    spsa_loss.append(fx)

spsa = SPSA(maxiter=300, learning_rate=0.01, perturbation=0.01, callback=spsa_callback)

x_opt, fx_opt, nfevs = spsa.optimize(initial_point.size, evaluate_expectation, initial_point=initial_point)

def plot_loss():
    plt.figure(figsize=(12, 6))
    plt.plot(gd_loss, 'tab:blue', label='vanilla gradient descent')
    plt.plot(qng_loss, 'tab:green', label='quantum natural gradient')
    plt.plot(spsa_loss, 'tab:blue', ls='--', label='SPSA')
    plt.axhline(-1, c='tab:red', ls='--', label='target')
    plt.ylabel('loss')
    plt.xlabel('iterations')
    plt.legend()

plot_loss()

And at just a fraction of the cost!

Can we do the same for natural gradients?


It turns out -- yes, we can!

We'll skip the details here, but the idea is to sample not only from the gradient but to extend this to the quantum Fisher information and thus to the natural gradient.

from qiskit.algorithms.optimizers import QNSPSA

qnspsa_loss = []
def qnspsa_callback(nfev, x, fx, stepsize, accepted):
    qnspsa_loss.append(fx)

fidelity = QNSPSA.get_fidelity(ansatz, q_instance, expectation=PauliExpectation())
qnspsa = QNSPSA(fidelity, maxiter=300, learning_rate=0.01, perturbation=0.01, callback=qnspsa_callback)

x_opt, fx_opt, nfevs = qnspsa.optimize(initial_point.size, evaluate_expectation, initial_point=initial_point)

def plot_loss():
    plt.figure(figsize=(12, 6))
    plt.plot(gd_loss, 'tab:blue', label='vanilla gradient descent')
    plt.plot(qng_loss, 'tab:green', label='quantum natural gradient')
    plt.plot(spsa_loss, 'tab:blue', ls='--', label='SPSA')
    plt.plot(qnspsa_loss, 'tab:green', ls='--', label='QN-SPSA')
    plt.axhline(-1, c='tab:red', ls='--', label='target')
    plt.ylabel('loss')
    plt.xlabel('iterations')
    plt.legend()

plot_loss()

What are the costs?

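As a rough guide for an ansatz with $d$ parameters, per optimization step: a parameter-shift gradient needs about $2d$ circuit evaluations, the Quantum Fisher Information for the natural gradient on the order of $d^2$ additional evaluations, SPSA only 2, and QN-SPSA a small constant number (roughly half a dozen), independent of $d$.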

Training in practice

Today, evaluating gradients is really expensive, and the improved accuracy is not that valuable since we have noisy readout anyway. Therefore, in practice, people often resort to SPSA. To improve convergence, however, we do not use a constant learning rate but a decreasing one (in Qiskit, a power-law series).

If you don't specify the learning rate and perturbation, Qiskit will try to automatically calibrate them to the model.
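A typical choice, roughly what the automatic calibration targets (following Spall's recommendations; the exact constants depend on the calibration), is a power-law series for both gains,

$$a_k = \frac{a}{(k + 1 + A)^{\alpha}}, \qquad c_k = \frac{c}{(k + 1)^{\gamma}},$$

with exponents around $\alpha \approx 0.602$ and $\gamma \approx 0.101$.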

autospsa_loss = []
def autospsa_callback(nfev, x, fx, stepsize, accepted):
    autospsa_loss.append(fx)

autospsa = SPSA(maxiter=300, learning_rate=None, perturbation=None, callback=autospsa_callback)

x_opt, fx_opt, nfevs = autospsa.optimize(initial_point.size, evaluate_expectation, initial_point=initial_point)

def plot_loss():
    plt.figure(figsize=(12, 6))
    plt.plot(gd_loss, 'tab:blue', label='vanilla gradient descent')
    plt.plot(qng_loss, 'tab:green', label='quantum natural gradient')
    plt.plot(spsa_loss, 'tab:blue', ls='--', label='SPSA')
    plt.plot(qnspsa_loss, 'tab:green', ls='--', label='QN-SPSA')
    plt.plot(autospsa_loss, 'tab:red', label='Powerlaw SPSA')
    plt.axhline(-1, c='tab:red', ls='--', label='target')
    plt.ylabel('loss')
    plt.xlabel('iterations')
    plt.legend()

plot_loss()

Training a new loss function.

In this exercise we'll train a different loss function than before: a transverse-field Ising Hamiltonian on a linear chain with 3 spins,

$$\hat H = -Z_0 Z_1 - Z_1 Z_2 - X_0 - X_1 - X_2$$

or, spelling out all operations,

$$\hat H = -Z \otimes Z \otimes I - I \otimes Z \otimes Z - X \otimes I \otimes I - I \otimes X \otimes I - I \otimes I \otimes X.$$

Exercise 3: Define the Hamiltonian with Opflow's operators. Make sure to add brackets around the tensors, e.g. (X ^ X) + (Z ^ I) for $X \otimes X + Z \otimes I$.

# YOUR CODE HERE
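One possible solution, directly following the spelled-out form of the Hamiltonian above:

from qiskit.opflow import I, X, Z

H_tfi = -(Z ^ Z ^ I) - (I ^ Z ^ Z) - (X ^ I ^ I) - (I ^ X ^ I) - (I ^ I ^ X)
print(H_tfi)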

Exercise 4: We'll use the EfficientSU2 variational form as ansatz, since now we care about complex amplitudes. Use the evaluate_tfi function, which evaluates the energy of the transverse-field Ising Hamiltonian, to find the optimal parameters with SPSA.

from qiskit.circuit.library import EfficientSU2

efficient_su2 = EfficientSU2(3, entanglement="linear", reps=2)
tfi_sampler = CircuitSampler(q_instance)

def evaluate_tfi(parameters):
    exp = StateFn(H_tfi, is_measurement=True).compose(StateFn(efficient_su2))
    value_dict = dict(zip(efficient_su2.parameters, parameters))
    result = tfi_sampler.convert(PauliExpectation().convert(exp), params=value_dict).eval()
    return np.real(result)

In this cell, use the SPSA optimizer to find the minimum.

Hint: Use the autocalibration (by not specifying the learning rate and perturbation) and 300 iterations to get close enough to the minimum.

# target energy
tfi_target = -3.4939592074349326

# initial point for reproducibility
tfi_init = np.array([0.95667807, 0.06192812, 0.47615196, 0.83809827, 0.89022282, 0.27140831,
                     0.9540853 , 0.41374024, 0.92595507, 0.76150126, 0.8701938 , 0.05096063,
                     0.25476016, 0.71807858, 0.85661325, 0.48311132, 0.43623886, 0.6371297 ])
# YOUR CODE HERE
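One possible solution (a sketch using autocalibrated SPSA as suggested in the hint; the callback and variable names are our own choice):

tfi_loss = []
def tfi_callback(nfev, x, fx, stepsize, accepted):
    tfi_loss.append(fx)

# leaving learning_rate and perturbation unset triggers the automatic calibration
tfi_spsa = SPSA(maxiter=300, callback=tfi_callback)
x_opt, fx_opt, nfevs = tfi_spsa.optimize(tfi_init.size, evaluate_tfi, initial_point=tfi_init)

print('Reached energy:', fx_opt, ' target:', tfi_target)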

How can you use it in VQC/QNNs?

Generating data

We're using an artificial dataset provided by Qiskit's ML package. There are other datasets available, like Iris or Wine, but we'll keep it simple and 2D so we can plot the data.

from qiskit_machine_learning.datasets import ad_hoc_data

training_features, training_labels, test_features, test_labels = ad_hoc_data(
    training_size=20, test_size=10, n=2, one_hot=False, gap=0.5
)

# the training labels are in {0, 1} but we'll use {-1, 1} as class labels!
training_labels = 2 * training_labels - 1
test_labels = 2 * test_labels - 1

def plot_sampled_data():
    from matplotlib.patches import Patch
    from matplotlib.lines import Line2D
    import matplotlib.pyplot as plt

    plt.figure(figsize=(12, 6))
    for feature, label in zip(training_features, training_labels):
        marker = 'o'
        color = 'tab:green' if label == -1 else 'tab:blue'
        plt.scatter(feature[0], feature[1], marker=marker, s=100, color=color)

    for feature, label in zip(test_features, test_labels):
        marker = 's'
        plt.scatter(feature[0], feature[1], marker=marker, s=100, facecolor='none', edgecolor='k')

    legend_elements = [
        Line2D([0], [0], marker='o', c='w', mfc='tab:green', label='A', ms=15),
        Line2D([0], [0], marker='o', c='w', mfc='tab:blue', label='B', ms=15),
        Line2D([0], [0], marker='s', c='w', mfc='none', mec='k', label='test features', ms=10)
    ]
    plt.legend(handles=legend_elements, bbox_to_anchor=(1, 0.6))

    plt.title('Training & test data')
    plt.xlabel('$x$')
    plt.ylabel('$y$')

plot_sampled_data()

Building a variational quantum classifier

Let's get the constituents we need:

  • a Feature Map to encode the data

  • an Ansatz to train

  • an Observable to evaluate

We'll use a standard feature map from the circuit library.

from qiskit.circuit.library import ZZFeatureMap

dim = 2
feature_map = ZZFeatureMap(dim, reps=1).decompose()  # let's keep it simple!
feature_map.draw('mpl', style='iqx')

We don't care about complex amplitudes so let's use an ansatz with only real amplitudes.

ansatz = RealAmplitudes(num_qubits=dim, entanglement='linear', reps=1).decompose()  # also simple here!
ansatz.draw('mpl', style='iqx')

Put together, our circuit is

circuit = feature_map.compose(ansatz)
circuit.draw('mpl', style='iqx')

And we'll use the global $\bigotimes_{i} \hat Z_i$ operator for the expectation values.

hamiltonian = Z ^ Z # global Z operators

Classifying our data

Now that we understand all the parts, it's time to classify the data. We're using Qiskit's ML package for that, with its OpflowQNN class to describe the circuit and expectation value, along with a NeuralNetworkClassifier for the training.

First, we'll do vanilla gradient descent.

gd_qnn_loss = []
def gd_qnn_callback(*args):
    gd_qnn_loss.append(args[2])

gd = GradientDescent(maxiter=100, learning_rate=0.01, callback=gd_qnn_callback)

from qiskit_machine_learning.neural_networks import OpflowQNN

qnn_expectation = StateFn(hamiltonian, is_measurement=True) @ StateFn(circuit)

qnn = OpflowQNN(qnn_expectation,
                input_params=list(feature_map.parameters),
                weight_params=list(ansatz.parameters),
                exp_val=PauliExpectation(),
                gradient=Gradient(),  # <-- parameter-shift gradients
                quantum_instance=q_instance)

Now we can define the classifier, which takes the QNN, the loss and the optimizer.

from qiskit_machine_learning.algorithms import NeuralNetworkClassifier

# initial_point = np.array([0.2, 0.1, 0.3, 0.4])
classifier = NeuralNetworkClassifier(qnn, optimizer=gd)

... and train!

classifier.fit(training_features, training_labels)

To predict the new labels for the test features we can use the predict method.

predicted = classifier.predict(test_features)

def plot_predicted():
    from matplotlib.lines import Line2D

    plt.figure(figsize=(12, 6))
    for feature, label in zip(training_features, training_labels):
        marker = 'o'
        color = 'tab:green' if label == -1 else 'tab:blue'
        plt.scatter(feature[0], feature[1], marker=marker, s=100, color=color)

    for feature, label, pred in zip(test_features, test_labels, predicted):
        marker = 's'
        color = 'tab:green' if pred == -1 else 'tab:blue'
        if label != pred:  # mark wrongly classified
            plt.scatter(feature[0], feature[1], marker='o', s=500, linewidths=2.5,
                        facecolor='none', edgecolor='tab:red')
        plt.scatter(feature[0], feature[1], marker=marker, s=100, color=color)

    legend_elements = [
        Line2D([0], [0], marker='o', c='w', mfc='tab:green', label='A', ms=15),
        Line2D([0], [0], marker='o', c='w', mfc='tab:blue', label='B', ms=15),
        Line2D([0], [0], marker='s', c='w', mfc='tab:green', label='predict A', ms=10),
        Line2D([0], [0], marker='s', c='w', mfc='tab:blue', label='predict B', ms=10),
        Line2D([0], [0], marker='o', c='w', mfc='none', mec='tab:red', label='wrongly classified', mew=2, ms=15)
    ]
    plt.legend(handles=legend_elements, bbox_to_anchor=(1, 0.7))

    plt.title('Training & test data')
    plt.xlabel('$x$')
    plt.ylabel('$y$')

plot_predicted()

And since we know how to use the natural gradients, we can now train with the QNN with natural gradient descent by replacing Gradient with NaturalGradient.

qng_qnn_loss = []
def qng_qnn_callback(*args):
    qng_qnn_loss.append(args[2])

gd = GradientDescent(maxiter=100, learning_rate=0.01, callback=qng_qnn_callback)

qnn = OpflowQNN(qnn_expectation,
                input_params=list(feature_map.parameters),
                weight_params=list(ansatz.parameters),
                gradient=NaturalGradient(regularization='ridge'),  # <-- using natural gradients!
                quantum_instance=q_instance)

classifier = NeuralNetworkClassifier(qnn, optimizer=gd)  # , initial_point=initial_point)
classifier.fit(training_features, training_labels)
def plot_losses():
    plt.figure(figsize=(12, 6))
    plt.plot(gd_qnn_loss, 'tab:blue', marker='o', label='vanilla gradients')
    plt.plot(qng_qnn_loss, 'tab:green', marker='o', label='natural gradients')
    plt.xlabel('iterations')
    plt.ylabel('loss')
    plt.legend(loc='best')

plot_losses()

Limits in training circuits

Nothing in life is for free... that includes sufficiently large gradients for all models!

We've seen that training with gradients works well on the small models we tested. But can we expect the same if we increase the number of qubits? To investigate that we measure the variance of the gradients for different model sizes. The idea is simple: if the variance is really small, we don't have enough information to update our parameters.

Exponentially vanishing gradients (Barren plateaus)

Let's pick our favorite example from the gradients and see what happens if we increase the number of qubits and layers.


def sample_gradients(num_qubits, reps, local=False):
    """Sample the gradient of our model for ``num_qubits`` qubits and ``reps`` repetitions.

    We sample 100 times for random parameters and compute the gradient of the first RY rotation gate.
    """
    index = num_qubits - 1

    # local or global operator
    if local:
        operator = Z ^ Z ^ (I ^ (num_qubits - 2))
    else:
        operator = Z ^ num_qubits

    # real amplitudes ansatz
    ansatz = RealAmplitudes(num_qubits, entanglement='linear', reps=reps)

    # construct the Gradient we want to evaluate for different values
    expectation = StateFn(operator, is_measurement=True).compose(StateFn(ansatz))
    grad = Gradient().convert(expectation, params=ansatz.parameters[index])

    # evaluate for 100 different, random parameter values
    num_points = 100
    grads = []
    for _ in range(num_points):
        # points are uniformly chosen from [0, pi]
        point = np.random.uniform(0, np.pi, ansatz.num_parameters)
        value_dict = dict(zip(ansatz.parameters, point))
        grads.append(sampler.convert(grad, value_dict).eval())

    return grads

Let's plot from 2 to 12 qubits.

num_qubits = list(range(2, 13))
reps = num_qubits  # number of layers = number of qubits

gradients = [sample_gradients(n, r) for n, r in zip(num_qubits, reps)]

fit = np.polyfit(num_qubits, np.log(np.var(gradients, axis=1)), deg=1)
x = np.linspace(num_qubits[0], num_qubits[-1], 200)

plt.figure(figsize=(12, 6))
plt.semilogy(num_qubits, np.var(gradients, axis=1), 'o-', label='measured variance')
plt.semilogy(x, np.exp(fit[0] * x + fit[1]), 'r--', label=f'exponential fit w/ {fit[0]:.2f}')
plt.xlabel('number of qubits')
plt.ylabel(r'$\mathrm{Var}[\partial_{\theta 1} \langle E(\theta) \rangle]$')
plt.legend(loc='best')

Oh no! The variance decreases exponentially! This means our gradients contain less and less information and we'll have a hard time training the model. This is known as the Barren plateau problem or exponentially vanishing gradients.
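More precisely, for sufficiently deep random parameterized circuits of this type the variance of any partial derivative is known to decay exponentially in the number of qubits $n$,

$$\mathrm{Var}\big[\partial_{\theta_i} \langle E(\vec\theta) \rangle\big] \in O(b^{-n}) \quad \text{for some } b > 1,$$

which matches the exponential fit in the plot above.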

Exploratory Exercise: Do Natural Gradients suffer from vanishing gradients?

Repeat the Barren plateau plot for natural gradients instead of standard gradients. For this, write a new function sample_natural_gradients that computes the natural gradient instead of the gradient.

from qiskit.opflow import NaturalGradient

def sample_natural_gradients(num_qubits, reps):
    index = num_qubits - 1
    operator = Z ^ num_qubits
    ansatz = RealAmplitudes(num_qubits, entanglement='linear', reps=reps)
    expectation = StateFn(operator, is_measurement=True).compose(StateFn(ansatz))

    # grad = TODO: ``grad`` should be the natural gradient for the parameter at index ``index``.
    # Hint: Check the ``sample_gradients`` function, this one is almost the same.
    grad = NaturalGradient().convert(expectation, params=ansatz.parameters[index])

    num_points = 100
    grads = []
    for _ in range(num_points):
        point = np.random.uniform(0, np.pi, ansatz.num_parameters)
        value_dict = dict(zip(ansatz.parameters, point))
        grads.append(sampler.convert(grad, value_dict).eval())

    return grads

Now we'll repeat the experiment to check the variance of the gradients (this cell takes some time to run!)

num_qubits = list(range(2, 13))
reps = num_qubits  # number of layers = number of qubits

natural_gradients = [sample_natural_gradients(n, r) for n, r in zip(num_qubits, reps)]

and plot the results. What do you observe?

fit = np.polyfit(num_qubits, np.log(np.var(natural_gradients, axis=1)), deg=1)
x = np.linspace(num_qubits[0], num_qubits[-1], 200)

plt.figure(figsize=(12, 6))
plt.semilogy(num_qubits, np.var(gradients, axis=1), 'o-', label='vanilla gradients')
plt.semilogy(num_qubits, np.var(natural_gradients, axis=1), 's-', label='natural gradients')
plt.semilogy(x, np.exp(fit[0] * x + fit[1]), 'r--', label=f'exponential fit w/ {float(fit[0]):.2f}')
plt.xlabel('number of qubits')
plt.ylabel(r'$\mathrm{Var}[\partial_{\theta 1} \langle E(\theta) \rangle]$')
plt.legend(loc='best')

What about shorter circuits?

Is there something we can do about these Barren plateaus? It's a hot topic in current research and there are some proposals to mitigate Barren plateaus.

Let's have a look at how global and local cost functions and the depth of the ansatz influence the Barren plateaus.

num_qubits = list(range(2, 13))
fixed_depth_global_gradients = [sample_gradients(n, 1) for n in num_qubits]

fit = np.polyfit(num_qubits, np.log(np.var(fixed_depth_global_gradients, axis=1)), deg=1)
x = np.linspace(num_qubits[0], num_qubits[-1], 200)

plt.figure(figsize=(12, 6))
plt.semilogy(num_qubits, np.var(gradients, axis=1), 'o-', label='global cost, linear depth')
plt.semilogy(num_qubits, np.var(fixed_depth_global_gradients, axis=1), 'o-', label='global cost, constant depth')
plt.semilogy(x, np.exp(fit[0] * x + fit[1]), 'r--', label=f'exponential fit w/ {fit[0]:.2f}')
plt.xlabel('number of qubits')
plt.ylabel(r'$\mathrm{Var}[\partial_{\theta 1} \langle E(\theta) \rangle]$')
plt.legend(loc='best')

And what if we use local operators?

num_qubits = list(range(2, 13))
linear_depth_local_gradients = [sample_gradients(n, n, local=True) for n in num_qubits]

fit = np.polyfit(num_qubits, np.log(np.var(linear_depth_local_gradients, axis=1)), deg=1)
x = np.linspace(num_qubits[0], num_qubits[-1], 200)

plt.figure(figsize=(12, 6))
plt.semilogy(num_qubits, np.var(gradients, axis=1), 'o-', label='global cost, linear depth')
plt.semilogy(num_qubits, np.var(fixed_depth_global_gradients, axis=1), 'o-', label='global cost, constant depth')
plt.semilogy(num_qubits, np.var(linear_depth_local_gradients, axis=1), 'o-', label='local cost, linear depth')
plt.semilogy(x, np.exp(fit[0] * x + fit[1]), 'r--', label=f'exponential fit w/ {fit[0]:.2f}')
plt.xlabel('number of qubits')
plt.ylabel(r'$\mathrm{Var}[\partial_{\theta 1} \langle E(\theta) \rangle]$')
plt.legend(loc='best')

num_qubits = list(range(2, 13))
fixed_depth_local_gradients = [sample_gradients(n, 1, local=True) for n in num_qubits]

fit = np.polyfit(num_qubits, np.log(np.var(fixed_depth_local_gradients, axis=1)), deg=1)
x = np.linspace(num_qubits[0], num_qubits[-1], 200)

plt.figure(figsize=(12, 6))
plt.semilogy(num_qubits, np.var(gradients, axis=1), 'o-', label='global cost, linear depth')
plt.semilogy(num_qubits, np.var(fixed_depth_global_gradients, axis=1), 'o-', label='global cost, constant depth')
plt.semilogy(num_qubits, np.var(linear_depth_local_gradients, axis=1), 'o-', label='local cost, linear depth')
plt.semilogy(num_qubits, np.var(fixed_depth_local_gradients, axis=1), 'o-', label='local cost, constant depth')
plt.semilogy(x, np.exp(fit[0] * x + fit[1]), 'r--', label=f'exponential fit w/ {fit[0]:.2f}')
plt.xlabel('number of qubits')
plt.ylabel(r'$\mathrm{Var}[\partial_{\theta 1} \langle E(\theta) \rangle]$')
plt.legend(loc='best')

Layerwise training


num_qubits = 6
operator = Z ^ Z ^ (I ^ (num_qubits - 4))

def minimize(circuit, optimizer):
    initial_point = np.random.random(circuit.num_parameters)

    exp = StateFn(operator, is_measurement=True) @ StateFn(circuit)
    grad = Gradient().convert(exp)

    # pauli basis
    exp = PauliExpectation().convert(exp)
    grad = PauliExpectation().convert(grad)

    sampler = CircuitSampler(q_instance, caching="all")

    def loss(x):
        values_dict = dict(zip(circuit.parameters, x))
        return np.real(sampler.convert(exp, values_dict).eval())

    def gradient(x):
        values_dict = dict(zip(circuit.parameters, x))
        return np.real(sampler.convert(grad, values_dict).eval())

    return optimizer.optimize(circuit.num_parameters, loss, gradient, initial_point=initial_point)

Useful feature: Qiskit's ansatz circuits are mutable!

circuit = RealAmplitudes(4, reps=1, entanglement='linear')
circuit.decompose().draw('mpl', style='iqx')

circuit.reps = 5
circuit.decompose().draw('mpl', style='iqx')
def layerwise_training(ansatz, max_num_layers, optimizer):
    optimal_parameters = []
    fopt = None

    for reps in range(1, max_num_layers):
        ansatz.reps = reps

        # bind already optimized parameters
        values_dict = dict(zip(ansatz.parameters, optimal_parameters))
        partially_bound = ansatz.bind_parameters(values_dict)

        xopt, fopt, _ = minimize(partially_bound, optimizer)
        print('Circuit depth:', ansatz.depth(), 'best value:', fopt)

        optimal_parameters += list(xopt)

    return fopt, optimal_parameters

ansatz = RealAmplitudes(4, entanglement='linear')
optimizer = GradientDescent(maxiter=50)

np.random.seed(12)
fopt, optimal_parameters = layerwise_training(ansatz, 4, optimizer)

Summary
