
Introduction to the Quantum Approximate Optimization Algorithm and Applications

The second lecture by Johannes was on the Quantum Approximate Optimization Algorithm (QAOA) and its applications. After a recap of the previous lecture, this lecture introduces the Variational Quantum Eigensolver (VQE), starting with variational quantum circuits and how they can be used effectively together with a classical optimizer. The second part is on Quadratic Unconstrained Binary Optimization (QUBO) and starts with optimization problems involving quadratic objective functions with linear and quadratic constraints. Johannes introduces the MaxCut problem and shows how to reframe it as a QUBO.

Johannes introduces QAOA by providing a brief history and shows how to formulate it as a layered circuit in variational form. He explains the cost and mixer layers, followed by a QAOA example and a visualization of the energy landscape to give a better understanding of the optimization problem. The lecture covers adiabatic quantum computing concepts, followed by some recent results for QAOA, such as the use of Conditional Value-at-Risk to speed up the optimization process. There is an extensive set of references for those interested.

Other resources

FAQ

Please explain: what is an approximation ratio? The approximation ratio is the ratio of the value an algorithm achieves to the optimal value. So imagine that the true optimum of a maximization problem is 10 and an algorithm returns a solution with value 5; this corresponds to an approximation ratio of 0.5.
How can we build an efficient ansatz in VQE? This is an active area of research. This paper has an overview of common ansätze: https://arxiv.org/pdf/2101.08448.pdf
Is there a particular technique to translate a problem into a corresponding Hamiltonian modelling it? Yes, in one of the previous slides we derived how to turn a general quadratic optimization problem into a Hamiltonian.
Why do we need to solve the MaxCut problem? A1: Many QUBO-type problems can be rephrased as MaxCut problems, so if we solve MaxCut faster/better, the corresponding problems also gain that advantage.

A2: MaxCut is NP-complete. Therefore any other NP-complete (NP-hard) problem can be translated into MaxCut via a Karp reduction. So if you can solve MaxCut, then you can solve any other NP-complete problem.
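As a small illustration of the two answers above, the sketch below writes a toy MaxCut instance as an unconstrained quadratic (QUBO) objective over binary labels and checks it by brute force; the graph and its weights are made up purely for illustration.

```python
import itertools
import networkx as nx

# Toy weighted graph for a small MaxCut instance (nodes 0..3, weights made up).
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 2.0), (2, 3, 1.0), (3, 0, 3.0), (0, 2, 1.5)])
n = G.number_of_nodes()

# MaxCut as a QUBO: with binary labels x_i in {0, 1} marking the partition,
# the cut value is sum over edges (i, j) of w_ij * (x_i + x_j - 2 * x_i * x_j),
# a quadratic function of binary variables with no constraints.
def cut_value(x):
    return sum(w * (x[i] + x[j] - 2 * x[i] * x[j])
               for i, j, w in G.edges.data("weight"))

# Brute force over all 2^n assignments; only feasible for tiny graphs, which is
# exactly why heuristics such as QAOA are interesting for larger instances.
best = max(itertools.product([0, 1], repeat=n), key=cut_value)
print("best partition:", best, "cut value:", cut_value(best))
```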

For VQE, if the gates are noisy, can we guarantee that after we restart the quantum part with the data from the classical part, the system's state trajectory moves along the previous trajectory? After all, considering quantum stochastic differential equations, there might be many trajectories depending on the noise. Noise can indeed affect the minimization. This paper https://arxiv.org/pdf/1704.05018.pdf has various numerical results that discuss this. For instance, using lots of entanglers in the parameterized circuit can act as a buffer against noise, as long as the strength of the noise isn't too large.
Is parallelism the only reason, or are there other reasons/benefits to choose a variational quantum approach? It can reduce the error.
Where will the barrier in the circuit be used? The barriers are only for visualisation purposes.
Is there a possibility in adiabatic computing that the Hamiltonian is time dependent, there are noncommuting operators, and the Dyson expansion comes into play? If yes, how is the simulation modified? Should we use the Suzuki-Trotter method? This paper actually talks about using a quantum computer to assist in the simulation of time-dependent Hamiltonians: https://arxiv.org/pdf/2101.07677.pdf
What's the difference between VQE and QAOA? You will see during the lecture that QAOA can be regarded as a special case of VQE where we choose the variational circuit according to the problem that we are looking at. So in a sense, VQE is more general in that we can choose an arbitrary variational form.
Can we use quadratic programs for maximization problems? Yes, QPs can be used for both maximisation and minimisation problems.
Since we have a hybrid classical quantum model, does the classical optimizer use the derivative to find the minimum? If it does, how do we find the derivative of the "cost" function? This entirely depends on the classical optimizer used. Some optimisers work without gradients and some optimisers estimate gradients by evaluating the cost function at different points.
What is the Hamiltonian in your VQE example? The Hamiltonian depends on the optimization problem that we are looking at.
Why do we need the mixer Hamiltonian? Where does it come from? Without the mixer Hamiltonian, we could not use multiple layers of QAOA, since every cost layer has the same effect. So by alternating the cost and mixer layers we enlarge the space of quantum states that we can generate with our circuit. In the next section, you will also see that there is a connection between QAOA and adiabatic quantum computing which explains why we choose the mixer Hamiltonian the way we do.
What features determine the number of mixer layers? In theory, the more layers the better, since we are able to generate a larger set of quantum states with more layers. However, with more layers noise becomes a problem and the optimization becomes harder, since we have more parameters to optimise.
Sorry, what is the purpose of the mixer Hamiltonian? See the Q&A for a detailed answer, but basically it perturbs our quantum state after applying the cost layer, allowing us to add more layers and explore a larger space of quantum states with our variational form.
Does QAOA perform better on a noisy quantum computer than other algorithms (like Shor, Grover, Deutsch-Jozsa)? If yes, how? Please check the Q&A session for an answer to this question. But basically, the classical optimizer can correct for some systematic noise.
You talk about limited device connectivity, can you please mention an example? In the typical QAOA circuit we have interactions between arbitrary qubits, meaning there are two-qubit gates applied to any possible qubit pair. However, real hardware devices are often not fully connected. That means qubits are usually positioned in a planar graph, like a grid, and two-qubit gates can only be applied to qubits that are physically next to each other. We can convert arbitrary circuits into circuits that can be executed on hardware by introducing SWAP gates (gates that consist of three CNOT gates and swap the quantum states of two qubits), but that significantly increases the depth and number of gates in our circuit, making it more difficult to execute.
What was the Hamiltonian again in simple language? The Hamiltonian is an operator that describes the energy of a system. In our case you can think of it as a matrix.

The Hamiltonian contains all the information of the physical system, and its eigenvalues (when represented as a matrix) are the energies.
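To make the "Hamiltonian as a matrix" picture concrete, here is a minimal NumPy sketch with a made-up two-qubit diagonal cost Hamiltonian; its eigenvalues are the possible energies, and the smallest one is the ground-state energy.

```python
import numpy as np

# A made-up two-qubit cost Hamiltonian H = Z (x) Z, diagonal in the
# computational basis, as MaxCut-style cost Hamiltonians are.
Z = np.diag([1.0, -1.0])
H = np.kron(Z, Z)

# The eigenvalues of H are the energies; for a diagonal matrix they are just
# the diagonal entries, one per basis state |00>, |01>, |10>, |11>.
energies = np.linalg.eigvalsh(H)
print("energies:", energies)                  # [-1. -1.  1.  1.]
print("ground-state energy:", energies.min())
```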

Timestamps for live Q&A session

How does the classical optimizer work in QAOA, since it cannot compute the derivative of the cost function with respect to the theta parameters? Answer was provided at timestamp 4m 54s in the Lecture 5.2 Live Q&A session

Additional written answers provided:

Darsh Kaushik Optimizers like SPSA do not need a gradient. OlivierLC So two options: 1) derivative-free optimization (https://en.wikipedia.org/wiki/Derivative-free_optimization) or 2) an "analog" estimate of the derivative.
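As a rough illustration of option 2 above, the gradient can be estimated purely from evaluations of the cost function; in the sketch below the cost function is just a made-up smooth stand-in for whatever the circuit measurements would return.

```python
import numpy as np

# Made-up stand-in for a two-parameter cost landscape (in QAOA this value would
# come from measuring the circuit for a given set of angles).
def cost(theta):
    return np.cos(theta[0]) * np.sin(theta[1]) + 0.1 * theta[0] ** 2

# Central finite differences: a gradient estimate obtained only from cost
# evaluations, with no analytic derivative required.
def estimated_gradient(theta, eps=1e-3):
    grad = np.zeros_like(theta)
    for k in range(len(theta)):
        shift = np.zeros_like(theta)
        shift[k] = eps
        grad[k] = (cost(theta + shift) - cost(theta - shift)) / (2 * eps)
    return grad

# A few steps of plain gradient descent driven by the estimated gradient.
theta = np.array([1.0, 2.0])
for _ in range(50):
    theta -= 0.2 * estimated_gradient(theta)
print("theta after optimization:", theta, "cost:", cost(theta))
```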

What's the role of the mixing Hamiltonian? Is it used to construct the initial uniform superposition state? Answer was provided at timestamp 6m 27s in the Lecture 5.2 Live Q&A session
Has the performance of QAOA been re-evaluated while using some of these warm-start (parameter concentration) strategies? Answer was provided at timestamp 9m 32s in the Lecture 5.2 Live Q&A session
Can you explain the advantage of the adiabatic implementation of QAOA? Answer was provided at timestamp 11m 3s in the Lecture 5.2 Live Q&A session
Could you please give an intuitive idea behind the difference between the cost circuit and the mixer circuit? Answer was provided at timestamp 13m 12s in the Lecture 5.2 Live Q&A session
Can you please compare the performance of QAOA in gate model and adiabatic model Quantum Computer? Answer was provided at timestamp 14m 33s in the Lecture 5.2 Live Q&A session
What are the inputs for a QUBO problem? How do we formulate any given real life problem into a QUBO problem? Answer was provided at timestamp 15m 42s in the Lecture 5.2 Live Q&A session
I've studied and learned a lot about the "simplex algorithm" and I see a lot of similarities. I've also come to know about a quantum implementation of it, is it useful in some way? Answer was provided at timestamp 18m 1s in the Lecture 5.2 Live Q&A session

Additional written answer provided:

Marc Rakoto Simplex is a linear programming optimization algorithm. Here we are doing quadratic (i.e., nonlinear) optimization. No relationship in classical optimization, and no relationship in quantum optimization either, I'd say.

Which books would you recommend for studies that have applications? Answer was provided at timestamp 18m 37s in the Lecture 5.2 Live Q&A session
Suppose you have a graph with positive and negative weights… Is there a way to use the same QAOA (MaxCut) algorithm for this problem? Or is the method only valid for positive weights? Answer was provided at timestamp 20m 16s in the Lecture 5.2 Live Q&A session
QAOA as a special instance of VQE: in VQE for chemistry the variational form is there to guess the ground state and the Hamiltonian is only entering via post rotations. Is the relation of QAOA and VQE that the UCCSD is somehow resembling the Hamiltonian evolution? Can you elaborate on the relation of both? Answer was provided at timestamp 21m 29s in the Lecture 5.2 Live Q&A session
Does QAOA perform better on a noisy quantum computer than other algorithms (like Shor, Grover, Deutsch-Jozsa)? If yes, why? Answer was provided at timestamp 24m 17s in the Lecture 5.2 Live Q&A session
Any guarantee on the closeness of the approx. ground state found adiabatically and the true ground state? Answer was provided at timestamp 25m 51s in the Lecture 5.2 Live Q&A session
Could you give examples of where each algorithm should be applied? Answer was provided at timestamp 27m 51s in the Lecture 5.2 Live Q&A session
Are we doing it adiabatically to keep the process reversible? Or is there any other reason? Written answer that was provided in the Ask a Question window:

Florian Preis It guarantees that the system stays in the ground state. Any quantum computation before measurement is reversible.

Is DWave an adiabatic computing platform? Would you please compare pros and cons against superconducting computing platforms? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller Yes, DWave uses quantum annealing. This means that the hardware is specialised for certain problems, where it is possible to perform better than, for instance, IBM's superconducting quantum computers. However, they do not build universal quantum computers, meaning we cannot run arbitrary quantum circuits on a DWave machine, and this is ultimately what we are most interested in.

Suppose you have a graph with positive and negative weights… Is there a way to use the same QAOA (MaxCut) algorithm for this problem? Or is the method only valid for positive weights? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller You can use QAOA with arbitrary weights, so they can be negative and positive. In the upcoming lab you can explore a little how different types of weights change the optimization landscape

How does QAOA compare with quantum annealing? Written answer that was provided in the Ask a Question window:

Givanildo Gramacho What does the annealing do? Marc Rakoto QAOA uses gradient descent. If quantum annealing is the quantum version of classical simulated annealing, there is no gradient descent but a stochastic move (Boltzmann temperature, etc.).

How do we come up with the ansatz circuit in VQE? For example, how do we know which gates to use in the ansatz? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller There are a number of different variational circuits that can be used for VQE, but in general you could choose anything as a variational form. In QAOA we choose a specific variational form that is tailored to the problem instance we are looking at.

Is the QAOA algorithm beneficial for near-term quantum computers, and what are the present advantages of using QAOA for optimization problems? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller One of the benefits is that QAOA might be less susceptible to noise because the classical optimizer can correct for some systematic errors. Compared to classical algorithms there is no benefit yet of using QAOA, but hopefully there will be as the hardware evolves in the coming years.

What did the x-axis denote in the plots of the Hamiltonian in AQC (slide 30)? Written answer that was provided in the Ask a Question window:

Darsh Kaushik Is it the parameter theta? Johannes Weidenfeller Yes, this is just a visualisation where you can think of the x-axis as representing a parameter that defines our quantum state.

How do you derive the matrix W from the MaxCut graph? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller This is roughly explained in slides 16-18 of the presentation, but also explained in more detail in the upcoming Lab.
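As a rough sketch (not necessarily the exact convention used on the slides), W is commonly just the symmetric matrix of edge weights of the problem graph; the toy graph below is made up for illustration.

```python
import numpy as np
import networkx as nx

# Toy weighted MaxCut graph (weights made up for illustration).
G = nx.Graph()
G.add_weighted_edges_from([(0, 1, 1.0), (1, 2, 2.0), (2, 0, 0.5)])

# One common convention: W[i, j] = w_ij if (i, j) is an edge, 0 otherwise,
# and W is symmetric because the graph is undirected.
n = G.number_of_nodes()
W = np.zeros((n, n))
for i, j, w in G.edges.data("weight"):
    W[i, j] = W[j, i] = w
print(W)
```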

I have heard that expectations for QAOA have become a bit more modest recently. If true, is it due to the publication from Hastings '19 (I understood that local classical algorithms outperform QAOA), or is it due to qubit number limitations? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller The Hastings paper only makes statements about QAOA at low depth (p=1), and proponents of QAOA would argue that we only expect QAOA to work well for large p anyway. But there are a lot of factors that still limit the performance of QAOA in the near future, simply because we cannot run very deep algorithms on current machines. In a sense, the question of QAOA performance can be split into two parts. First, if we could execute algorithms noise-free (or in the fault-tolerant regime), what is the theoretical advantage of QAOA over classical algorithms? And second, taking hardware limitations into account, what part of this possible advantage remains when we look at executing QAOA on actual devices? A lot is still unknown about the first question (also because we cannot empirically test it yet for large problems), and the answer to the second question can be a bit discouraging when taking all factors into account. However, in the end there is still a very good chance that QAOA scales more efficiently than classical algorithms for some problems, and so it remains to be seen if this will translate into an actual advantage in the future.

What kind of strategy could be used to explore the landscape of the parameterized circuit in VQA? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller This is completely up to the classical optimiser, and we can use any strategy that is used for optimization of a continuous energy landscape. Some general ideas include multi-start (i.e., performing optimization with different initial values) and gradient-based approaches (by estimating the gradient through evaluations of the cost function).

Any reason for choosing the mixer as Rx instead of Ry or other kinds of rotations in the space? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller The mixer is chosen this way because we start in the equal superposition state, which is the corresponding ground state of the X operator, and exponentiating that operator yields RX gates. However, in the warm-start adaptation the mixer and corresponding initial state are changed based on some previously generated solution of the problem, so the mixer and initial state are not fixed and are sometimes chosen differently.
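As a small numerical check of the answer above (plain NumPy/SciPy, nothing Qiskit-specific): exponentiating the X operator gives exactly an RX rotation up to the angle convention, and |+> is an eigenstate of X, which becomes the ground state once the usual minus sign of the mixer Hamiltonian is included.

```python
import numpy as np
from scipy.linalg import expm

X = np.array([[0, 1], [1, 0]], dtype=complex)

# Exponentiating the mixer term: exp(-i * beta * X) is an RX rotation, since
# RX(theta) = exp(-i * theta/2 * X), so exp(-i * beta * X) = RX(2 * beta).
beta = 0.7
mixer = expm(-1j * beta * X)

def rx(theta):
    return np.array([[np.cos(theta / 2), -1j * np.sin(theta / 2)],
                     [-1j * np.sin(theta / 2), np.cos(theta / 2)]])

print(np.allclose(mixer, rx(2 * beta)))   # True

# |+> is an eigenstate of X with eigenvalue +1, i.e. the ground state of -X,
# which is why QAOA with this mixer starts in the equal superposition state.
plus = np.array([1, 1]) / np.sqrt(2)
print(np.allclose(X @ plus, plus))        # True
```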

How can we encode a constrained binary optimization problem into Hamiltonian? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller If you are asking about how to convert the constraints, have a look at the Qiskit tutorial for converters of Quadratic Programs, where this is explained in a bit more detail: https://qiskit.org/documentation/tutorials/optimization/2_converters_for_quadratic_programs.html Note that the tutorial might not be up to date with the newest Qiskit versions, but the ideas are explained there. If you are asking about how to get the Hamiltonian from a QUBO, this is explained on slide 18 and in more detail in the upcoming Lab notebook 😃
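For reference, here is a rough sketch of those two steps, assuming the qiskit-optimization package is installed (the toy problem is made up, and the exact API may differ between versions): first the linear constraint is folded into the objective as a quadratic penalty, then the resulting QUBO is mapped to an Ising Hamiltonian.

```python
# Rough sketch; assumes the qiskit-optimization package. The problem itself is
# made up just to illustrate the two conversion steps.
from qiskit_optimization import QuadraticProgram
from qiskit_optimization.converters import QuadraticProgramToQubo

qp = QuadraticProgram("toy_problem")
qp.binary_var("x")
qp.binary_var("y")
qp.binary_var("z")
# Quadratic objective plus one linear equality constraint.
qp.minimize(linear={"x": 1, "y": -2}, quadratic={("x", "z"): 3})
qp.linear_constraint(linear={"x": 1, "y": 1, "z": 1}, sense="==", rhs=2, name="pick_two")

# Step 1: constraints become quadratic penalty terms -> unconstrained QUBO.
qubo = QuadraticProgramToQubo().convert(qp)

# Step 2: binary variables in {0, 1} become spins in {-1, +1} -> Ising Hamiltonian.
hamiltonian, offset = qubo.to_ising()
print(hamiltonian)
print("constant offset:", offset)
```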

What was the Q value in the cost function? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller Q was the name chosen for the cost matrix, so this is the matrix representing our optimisation problem.
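Concretely, for a candidate binary assignment x the QUBO objective is evaluated as x^T Q x; a minimal NumPy sketch with a made-up cost matrix Q:

```python
import numpy as np

# Made-up 3x3 cost matrix Q for a QUBO objective f(x) = x^T Q x.
Q = np.array([[ 2, -1,  0],
              [-1,  3, -2],
              [ 0, -2,  1]])

x = np.array([1, 0, 1])      # one candidate binary assignment
print("cost:", x @ Q @ x)    # value of the quadratic objective for this x
```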

What is the difference between Max Cut and QAOA? Written answer that was provided in the Ask a Question window:

PRAJJWAL VIJAYWARGIYA MaxCut is a problem which QAOA is applied to solve.

What is the Mixer layer on the gate diagram? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller The mixer layer in the original QAOA formulation corresponds to RX gates applied to each qubit. It can however be adapted to contain other gates (which is for example done in the warm starting QAOA).

Applications in cost minimization/profit maximization in economics? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller Maybe have a look at the portfolio optimization problem (https://en.wikipedia.org/wiki/Portfolio_optimization). This can be formulated as a quadratic program and converted to a QUBO.

Keeping the AQC point of view aside, why do we need the mixer? Why is the cost Hamiltonian not enough? Also, if we change the initial state, do we need to change only the mixer? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller Typically, when changing the initial state we will change the mixer and keep the cost Hamiltonian because we would like to keep the connection to adiabatic quantum computing. Viewing QAOA as a variational algorithm, the mixer is needed to perturb the quantum state between applying the different cost layers. Since all cost layers have the same structure, there is no advantage in using multiple cost layers, unless we put a mixer layer in between.
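To make the alternating-layer structure concrete, here is a minimal sketch of a p = 2 QAOA circuit for a made-up MaxCut instance, assuming Qiskit is installed; the parameter values are arbitrary placeholders that the classical optimizer would normally tune.

```python
from qiskit import QuantumCircuit

# Made-up problem graph (edges of a toy MaxCut instance).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
n, p = 4, 2                                  # 4 qubits, p = 2 QAOA layers
gammas, betas = [0.4, 0.8], [0.7, 0.3]       # arbitrary example parameters

qc = QuantumCircuit(n)
qc.h(range(n))                               # start in the equal superposition

for layer in range(p):
    # Cost layer: one ZZ interaction per edge of the problem graph.
    for i, j in edges:
        qc.rzz(2 * gammas[layer], i, j)
    # Mixer layer: RX on every qubit. Without it, consecutive cost layers would
    # commute and simply merge into one layer with added angles.
    qc.rx(2 * betas[layer], range(n))

qc.measure_all()
print(qc.draw())
```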

Can you please review the conversion of the cost function to the Hamiltonian operator form? Written answer that was provided in the Ask a Question window:

Johannes Weidenfeller This is also explained in the upcoming Lab notebook, so try to review the explanation there and ask in the Discord channel if you still have questions.

Suggested reading

A Quantum Approximate Optimization Algorithm by E. Farhi, J. Goldstone, and S. Gutmann

Improving Variational Quantum Optimization Using CVaR by P. K. Barkoutsos, G. Nannicini, A. Robert, I. Tavernelli, and S. Woerner

Warm-Starting Quantum Optimization by D. J. Egger, J. Marecek, and S. Woerner