Path: blob/main/qaoa-for-max-cut/03_Recursive-divide-and-conquer.ipynb
579 views
Approximate Max-Cut using a Divide-and-Conquer Implementation of QAOA
Lab 3 Recursive Divide-and-Conquer QAOA
3.1 Overview
In the previous lab, our example graph was small enough that when divided into subgraphs, there were few subgraphs. Furthermore, each subgraph was sufficiently small that we could apply QAOA to approximate the max cuts of the subgraphs. In this lab, we'll consider much bigger graphs which have large subgraph partitions, both in terms the size of the subgraphs and the number of subgraphs. When the subgraphs are prohibitively large to find a max cut approximation with QAOA, we'll recursively apply the divide-and-conquer algorithm to those subgraphs. Also, if the number of subgraphs in a partition is so large that the QAOA application during the merger stage of the algorithm cannot be carried out, we can recursively apply a divide-and-conquer QAOA to the merger graph as well. Finally, we'll demonstrate a parallel implementation of the algorithm and discuss several options for improving its accuracy.
3.1.1 Outline of the lab
This lab is structured similarly to Labs 1 and 2. We will be translating the code developed in Labs 1 and 2 into functions for recursive implementation.
3.2 Divide
Problem definition and classical approximation
Define a function to carry out the divide stage of the algorithm recursively
3.3 Conquer
Define a function to carry out the conquer stage of the algorithm recursively
3.4 Merge
Define a function to carry out the merger stage of the algorithm recursively
Combine all the parts together to find the max cut of
sampleGraph3
3.5 Enabling Distributed Quantum Computing
Run the recursive algorithm in parallel using MPI
3.6 Further directions for experimentation
Summarize several features of the divide-and-conquer QAOA that could be modified for improved performance
3.1.2 Learning objectives
The learning objectives of this tutorial are:
Implement a recursive divide-and-conquer QAOA for Max Cut using CUDA Quantum
Select from different target devices to execute quantum circuits
Adapt the CUDA Quantum code to run in parallel through MPI
Experiment with different layer counts and seeds to recognize the effect of increased layers and initial parameter choices in QAOA
Identify several of the design choices available for implementing QAOA
3.2 Divide
3.2.1 Defining a larger example graph
In Lab 2 we experimented with a graph of 30 vertices. Let's ramp this up and start with a graph of 100 vertices!
We'll consider a randomly generated graph with a little more structure than the previous lab. The graph type is called the Newman Watts Strogatz network model. After you've completed the lab with this example, feel free to edit the parameters and uncomment/comment portions of the code below to test out other graphs.
Let's approximate the max cut for this graph classically. This will take a little longer than the classical approximation of max cut for the smaller graph in Lab 2, and as you'll see there is a much larger variation in approximation results.
The code for the subgraph partitioning from Lab 2 is copied below. We'll need to adapt this code for recursion. But first, to appreciate the need for a recursive algorithm, let's use this code to partition the sampleGraph3
graph into smaller subgraphs to get a sense of the scale of the graph.
The graph G0
may be too large for a standard implementation of QAOA, especially if many layers of the QAOA kernel are used or if the algorithm is executed on a CPU. Even larger and denser graphs will lead to partitions that are just unmanageable for even the most powerful machines. Therefore, we will recursively call the divide-and-conquer QAOA algorithm to find the max cut solution. In fact, in the interest of time, we will apply the divide stage of the algorithm to any subgraph that has more than 14 vertices. To keep track of the original graph, its subgraphs, their subgraphs, etc. we will need to edit the subgraph partitioning function from Lab 2 and adapt a new subgraph naming scheme. In the new naming scheme, Global:0
will be the first subgraph in the partitioning of sampleGraph3
. If Global:0
was large and was further subdivided, the the name Global:0:1
will represent the second subgraph of the subgraph partitioning of Global:0
. The naming scheme continues to follow this pattern. This new naming scheme is added to the subgraph_partition
function in the code block below.
Edit the three FIX_ME
lines in the code block below to call subgraphpartition
recursively. We'll assume that we can handle QAOA instances of 14 qubits. Therefore any subgraph of more than 14 vertices will get subdivided.
3.3 Conquer
As you experiment with subdividing other graphs, you might notice that the last few subgraphs may only contain 1 vertex and/or no edges. The max cut for these subgraphs is trivial. It would be wasteful to run the QAOA subroutine on these subgraphs, and a Hamiltonian = 0 will cause problems with the optimizer loop. Let's edit the QAOA code below to return a random assignment of 0s and 1s for graphs that contain only one vertex and/or no edges.
Exercise: Edit the code below to return a random string of 0s and 1s, with the same length as the number of vertices of the graph.
3.4 Merge
The functions below are based on code from Lab 2 that carry out the merger step of the divide-and-conquer QAOA. As you execute the code blocks, familiarize yourself with the function definitions.
In this lab, we have capped our subgraph partitions to only have 11 subgraphs. This means that the merger graphs that we generate are all limited in size. Therefore, few qubits are needed for the QAOA circuits that are created during the merging
phase of the algorithm. For much larger and denser graphs, we may find partitions that result in dozens of subgraphs whose solutions need to be merged together. For these cases, we can apply another divide-and-conquer protocol at the merger stage of the algorithm. We leave this as an optional bonus exercise.
3.4 Putting it all together
We now have all of the components necessary to define a function subgraph_solution
that will carry out the divide-and-conquer QAOA recursively.
Exercise: Fill in the FIX_ME
s with the functions from the list:
border
createMergerGraph
merging
new_colors
qaoa_for_graph
subgraphpartition
subgraph_solution
unaltered_colors
Now let's run the recursive algorithm on a sampleGraph3
. Execute the cell below to run the algorithm sequentially. This might be a good time to stand up and stretch or grab a cup of coffee. It may take a few minutes for the program to run on one GPU sequentially. We'll explore how to parallelize this routine in the next section.
This solution isn't as good as the classical approximations. In the remaining sections of this notebook, we'll suggest a few ways to improve the results.
3.5 Enabling Distributed Quantum Computing
There are many opportunities for parallelization of quantum circuits. In this section we'll run the QAOA kernels for each subgraph simultaneously on multiple GPU processes, and this can easily be adapted to run on multiple QPUs in the future.
Important Before proceeding, you will need to switch to a runtime with access to a GPU. If you do restart your kernel, make sure to reload the packages below. If you are running on Google Colab and switch to a GPU runtime, you'll need to reinstall CUDA-Q by commenting out the indicated code.
The max cut problem for the sampleGraph3
took some time to solve sequentially. Like we did in Lab 2, we can create a python script to run the recursive algorithm in parallel. One option is to distribute the top-level of the recursion (e.g. solutions to Global:0
, Global:1
, ...Global:n
) to the GPU processes, and then merge those results back together on GPU process 0. We've created the script for this and saved it as Example-03.py. If you have not yet already done so, download Example-03.py and save it in your working directory. Execute the cell below to find a max cut approximation of sampleGraph3
using 4 GPU processes.
Notice how much more quickly the solution was found compared to the sequential implementation. With this speed up, let's try to find even better approximations. Try editing the layer_count
for the QAOA max cut solution of the small graphs. You can find this on line 502: layer_count =1 # Layer count for the QAOA max cut
of the Example-03.py file. Also, try out different seed values (line 647: seed = 13 # Edit this seed for new initial parameters among other seed-dependent elements
). See how much our results change!
Feel free to experiment with different sample graphs (lines 706-730
) by commenting the code and changing the number of vertices and other graph variables.
Can you think of alternative methods of parallelizing the tasks in the divide-and-conquer QAOA?
3.6 Further Directions for Experimentation
There were many aspects of the QAOA algorithm that we held constant throughout this series of labs. These include the choice of the graph partitioning routine, the initial superposition state in the QAOA kernel, the number of layers in the QAOA kernels, the initial parameters used in the optimization loop, and the optimizer selected for the optimization loop. In addition to making changes to these features, the QAOA kernel could be swapped out for an alternative variational ansatz, and the parallelization protocol could be optimized for the given problem.
Graph partitioning: The graph partitioning can drastically affect the results of the algorithm. There are many ways to subdivide a graph, and we encourage you to explore other possibilities. For example, the Louvain algorithm was used by Yue et al. and by Guerreschi for quantum divide-and-conquer approaches to the Max Cut problem. Angel Rodriquez-Fernandez et al. analyzed five other clustering algorithms to improve the Goemans-Williamson classical approximation for Max-Cut, and these clustering algorithms could be experimented with here. Looking at this from another point of view, the graph partitioning corresponds to circuit cutting. There have been advances in identifying efficient cuts of a quantum circuit that result in faster circuit knitting (or as we referred to in these notebooks, the merger stage of the algorithm). One example of this is Adaptive Circuit Knitting. Read more about Adaptive Circuit Knitting in section VIB of this survey paper.
Initial state: In Lab 1, we initiated the QAOA kernel in an equal superposition state to represent all possible colorings of the vertices. However, if we had some information about the structure of the graph, for instance an approximate solution from a classical relaxation of the problems, we could "warm start" the circuit with this information as in Egger et al. and Tate et al.. For this to work, the mixer Hamiltonian would also need to be adjusted.
Layer count: We've already begun to experiment with the layer count. It is known that as the number of layers increase to infinity, the QAOA max cut approximations approach the max cut value of the graph (Farhi and Goldstone).
Initial parameters: As we saw in this lab, by varying the seed for generating the random initial parameters, we obtained quite different results. Recent work points to several protocols for selecting better performing initial parameters (see for instance the work of Sureshbabu et al.).
Optimizer: Throughout this series of labs we defaulted to the COBYLA optimizer for the optimizer loop in the algorithm. There are several other non-gradient and gradient-based optimizers available in CUDA-Q for you to try, and you can use your favorite sci.py optimizer as well. See the full list of built-in CUDA-Q optimizers here.
QAOA kernel: We used a standard QAOA kernel throughout this tutorial. There are several adaptations that could be made to the QAOA kernel. For example, one could replace the QAOA kernel with a similarly structured multi-angle QAOA kernel. The difference between the QAOA kernel and the multi-angle QAOA kernel is that instead of each layer having distinct parameters for the problem and the mixer kernels, each parameterized rotational gate would have its own distinct parameter. Another variation of the QAOA kernel is the ADAPT-QAOA algorithm which iteratively constructs the QAOA circuit, tailoring it for each problem (Zhu et al.). You can learn more about ADAPT-QAOA and how to implement it in CUDA-Q here. Taking this one step further, researchers have used ADAPT-QAOA circuits to train a QAOA-GPT model that generates compact quantum circuits for the max cut problem (Tyagin et al.).
Alternatives to QAOA: Other completely different variational circuit structures could replace the QAOA kernel in our algorithm. Lee et al. compared several options and concluded that noncommunativity and entanglement in the variational kernel were important for improving algorithm performance. There are also different options of encoding the graph structure into the quantum circuit. For instance, instead of each vertex being represented by a single qubit, it is possible to encode a graph with half the number of qubits using techniques like multi-basis encoding.
Parallelization: Finally, our choice of how to distribute the subgraph problems onto the GPU was quite arbitrary, and some GPU processes sat idle while others were finishing their tasks. By first analyzing the subgraph partition, we might find a better distribution of tasks to the GPU to avoid down time. Also, our parallelization protocol used one GPU to simulate multiple QPUs. It is possible to assign multiple GPUs to simulate one QPU, which would be necessary for larger circuits. To learn more about this option, check out the documentation on the
remote-mqpu
platform. Additionally, CUDA Quantum offers anexecution()
option forobserve
and asynchronous primitivessample_async
andobserve_async
. You can read more about these functions here.
Next
Congratulations on solving max cut problems with the divide-and-conquer QAOA! It's now time to see if you can apply what you have learned to a weighted graph in the next notebook. Good luck!