Path: blob/main/qaoa-for-max-cut/02_One-level-divide-and-conquer-QAOA.ipynb
Approximate Max-Cut using a Divide-and-Conquer Implementation of QAOA
Lab 2: Illustrating one level of the Divide-and-Conquer QAOA
2.1 Introduction
Having seen a preview of the divide-and-conquer QAOA in Lab 1, we are ready to dive into the details of the algorithm here. In this lab, we will introduce the main steps of the divide-and-conquer QAOA algorithm through an example graph that requires only one divide-and-conquer iteration. This will prepare us for Lab 3 in which we will iterate these steps recursively and in parallel to handle much larger problem instances.
Relationship to circuit cutting: From one point of view, the divide-and-conquer QAOA divides a graph into smaller graphs, and the max cut problems on those subgraphs are solved. Later these subgraph solutions are merged together for a max cut approximation of the parent graph. We see this viewpoint in work by Esposito and Danzig, Guerreschi, Li et al., Liers et al., Ponce et al., Yue et al., and Zhao et al. Another way of positioning the divide-and-conquer QAOA is as a circuit cutting protocol. Circuit cutting is a technique of dividing a quantum circuit into subcircuits with fewer gates and qubits. After running these circuits, post-processing stitches the results together to get an approximation of the execution of the original circuit. This is the viewpoint presented in the work of Cattelan and Yarkoni and of Bechtold et al. As you complete the lab, we encourage you to think about how these two viewpoints are related.
In this lab, we implement the divide-and-conquer QAOA in three steps:
Divide: The divide-and-conquer algorithm divides the initial graph into small enough subgraphs, recursively if necessary.
Conquer: QAOA is executed to find max-cut approximations of the smaller subgraphs, in parallel when possible.
Merge: To stitch together the subgraph solutions into a solution of their parent graph, we consider the border subgraph generated by the edges between the subgraphs. We define a new graph based on a different vertex set, which we will call the merger graph. This graph captures the interactions between border vertices. Finally, to merge the subgraph solutions together for an approximately optimal solution to the larger graph, we employ QAOA to the merger graph invoking a new cost function.
2.1.1 Overview of the notebook
This notebook is structured as follows:
2.2 Divide
Estimate the max cut value of a moderately sized example graph (called sampleGraph2 in this notebook) using classical heuristics
Partition sampleGraph2 into smaller subgraphs
2.3 Conquer
Estimate the max cut value of each of the subgraphs in the partition using a routine QAOA implementation with CUDA-Q's vqe function
2.4 Merge
Apply a brute-force computation to stitch together the subgraph solutions into an approximate max cut of sampleGraph2
Define a new graph, mergerGraph, whose vertices represent each of the subgraphs in the partition and define a new cost function that encodes the optimization problem for optimally stitching together subgraph solutions into a cut of sampleGraph2
Apply QAOA to the mergerGraph optimization problem and use the result to merge the subgraph solutions into an approximate max cut solution of sampleGraph2
2.5 Enable Distributed Quantum Computing
Write a Python script to execute the conquer stage of one level of the divide-and-conquer algorithm in parallel
Learning objectives:
Implement a one-level divide-and-conquer QAOA for Max Cut using CUDA-Q
Apply QAOA to find an approximate solution to Quadratic Unconstrained Binary Optimization problems
Select from different target devices to execute quantum circuits
Adapt the CUDA Quantum code to run in parallel on multiple GPUs or multiple QPUs
2.1.2 Reusable code from Lab 1
Before we begin, let's grab the relevant code from Lab 1 that we will repurpose in this lab. As you execute these cells, take a moment to read through them to recognize their purpose and logic.
2.2 Defining a graph
The goal of this lab is to use divide-and-conquer QAOA to find an approximate max cut of the graph defined and plotted below. We chose a moderately sized random graph for this demonstration. In Lab 3, we'll look at an even larger graph.
We will call this graph sampleGraph2 for the purposes of this exposition. This graph has so many vertices and edges that the circuit depth and width of the QAOA max cut circuit would likely be too large to run and/or simulate on most systems. In particular, we would need 30 qubits for the max cut QAOA circuit of this graph. Therefore, sampleGraph2 is a good candidate to experiment with the divide-and-conquer QAOA. We'll see that by dividing this graph into smaller graphs, we will be able to replace a QAOA circuit requiring 30 qubits with a handful of circuits of 8 qubits or fewer!
For our initial experiment of the divide-and-conquer QAOA, it would be good to have a sense of the actual max cut solution of sampleGraph2. Therefore, we chose sampleGraph2 to be small enough that we can use a classical greedy algorithm (NetworkX one_exchange) to approximate the max cut solution. To read more about the one_exchange algorithm, see the NetworkX documentation. Note that there are other classical heuristics that can handle much larger graphs and that there are exact polynomial-time algorithms for restricted graph classes (for a survey and comparison of these, check out this paper). However, the greedy algorithm is sufficient for our demonstration.
The one_exchange algorithm is a greedy heuristic that relies on a random initial assignment of the vertices to the two sides of the cut. By changing the random seed, we may get better (or worse) initial conditions that lead to better (or worse) approximate cuts. Let's consider the average max cut value produced by this algorithm as we vary the seed.
Exercise: Choose 20 different seeds and estimate the average approximate Max Cut value that the one_exchange algorithm produces for sampleGraph2. This will give us a baseline for comparison with the results that we'll generate with the divide-and-conquer QAOA in this lab. We aim to find a solution close to the classical approximation.
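One possible approach is sketched below (the seed values are arbitrary, and sampleGraph2 is the graph defined above):

```python
from networkx.algorithms.approximation.maxcut import one_exchange

# Average the greedy one_exchange cut value over 20 arbitrary seeds.
cut_values = []
for seed in range(20):
    cut_value, _partition = one_exchange(sampleGraph2, seed=seed)
    cut_values.append(cut_value)

print("Average approximate max cut value:", sum(cut_values) / len(cut_values))
print("Best cut value found:", max(cut_values))
```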
2.3 Divide
The first step of divide-and-conquer QAOA is to divide the graph into smaller, disjoint subgraphs until the subgraphs are small enough that their max cuts can be estimated through simulation or physical hardware executions of QAOA. The example in this lab is small enough that we only have to subdivide the initial graph once to generate a few subgraphs, each of which is sufficiently small to quickly estimate their max cut. Later on in Lab 3, we'll consider a larger graph that will require multiple subdivisions.
How should we divide our graph into smaller subgraphs? As we saw in Lab 1, the edges between the subgraphs can introduce some complications when merging together subgraph max cut solutions into an approximate solution of the parent graph. Therefore, the algorithm may run better if the number of edges between subgraphs is limited. In general, finding an optimal partitioning of this kind is thought to be NP-complete (Newman and Girvan, 2004). However, there are many heuristic algorithms that produce decent partitions for our purposes. For example, the Louvain algorithm was used in this paper and this one for quantum divide-and-conquer approaches to the Max Cut problem, and Angel Rodriguez-Fernandez and co-authors analyzed five other clustering algorithms to improve the Goemans-Williamson classical approximation for Max Cut. For this tutorial, we will use a community-detection-based partition that was favorably compared to a random partition in the QAOA-in-QAOA paper. This community-detection-based partition was proposed by Clauset, Newman, and Moore and implemented in NetworkX with the function greedy_modularity_communities. To read more about this classical algorithm, refer to the NetworkX documentation.
In the code block below we call the subgraphPartition function to divide our sampleGraph2 into smaller subgraphs. We plot the subgraphs below in different colors to visualize the subdivision.
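As a rough sketch of the idea behind such a partition (the notebook's own subgraphPartition function may differ in details, for example by also enforcing a maximum subgraph size):

```python
from networkx.algorithms.community import greedy_modularity_communities

# Sketch only: one subgraph per community detected by greedy modularity maximization.
def subgraph_partition_sketch(graph):
    communities = greedy_modularity_communities(graph)
    return {f"G{i}": graph.subgraph(nodes).copy()
            for i, nodes in enumerate(communities)}

print({name: sg.number_of_nodes()
       for name, sg in subgraph_partition_sketch(sampleGraph2).items()})
```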
Notice that each vertex of our original graph lies in exactly one subgraph of the partition defined by subgraph_dictionary. However, edges that connect vertices in distinct subgraphs in subgraph_dictionary are not drawn above. For instance, sampleGraph2 contains an edge whose two endpoints lie in separate subgraphs (G0 and G1, respectively), so this edge does not appear in any of the subgraphs. We'll consider edges like this later in this notebook, during the merger stage.
Throughout this notebook, it will be helpful to identify the subgraph that contains a given vertex. Let's define a function to do this.
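A minimal sketch of such a lookup (the helper defined in the notebook may use a different name or signature):

```python
# Return the name of the subgraph in subgraph_dictionary containing `vertex`.
# Hypothetical helper name; the notebook's own function may differ.
def subgraph_of(vertex, subgraph_dictionary):
    for name, subgraph in subgraph_dictionary.items():
        if vertex in subgraph.nodes():
            return name
    return None

# Example (assuming integer node labels): which subgraph contains vertex 0?
print(subgraph_of(0, subgraph_dictionary))
```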
2.4 Conquer stage of the algorithm sequentially executed
In this section, we solve the max cut problem for each of the five subgraphs using QAOA. We first create a loop calling the vqe function to compute the optimal parameters for the QAOA algorithm and the sample function to find the optimal cut. Here, the vqe and sample functions are applied to each subgraph sequentially. In section 2.6, we adapt the code to execute some of the computation in parallel.
Recall that in Lab 1, we learned how the vqe function and the sample primitive are used to implement QAOA and approximate the max cut of a graph. Here, we iterate this on each of the subgraphs in the subgraph_dictionary to find the approximate max cut of each subgraph.
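Schematically, the sequential conquer loop has this shape (a sketch; solve_subgraph is a placeholder for the appropriate Lab 1 helper, which the exercise below asks you to identify):

```python
# Sequential conquer stage (sketch): solve max cut on each subgraph in turn.
# solve_subgraph is a placeholder name; substitute the appropriate helper
# from section 2.1.2 (identifying it is part of the exercise below).
results = {}
for name, subgraph in subgraph_dictionary.items():
    results[name] = solve_subgraph(subgraph)   # placeholder call
    print(f"Approximate max cut coloring of {name}: {results[name]}")
```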
Before we run this computation, let's first examine some of our options for circuit simulation. The code block below shows the backends available to us. We'll use the qpp-cpu target, which runs on a CPU. If you have access to a GPU, you can switch to the nvidia target, which provides a GPU-accelerated statevector simulator. By default the simulator uses FP32 floating point types. To switch to FP64, you can reset the target to nvidia-fp64. Later in the lab, we'll experiment with some of the other targets, which can leverage multi-node, multi-GPU simulators.
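For example, selecting a simulation target might look like this (comment and uncomment lines to match your hardware):

```python
import cudaq

# CPU statevector simulator (works everywhere).
cudaq.set_target("qpp-cpu")

# GPU-accelerated statevector simulator (uncomment if a GPU is available):
# cudaq.set_target("nvidia")        # FP32 by default
# cudaq.set_target("nvidia-fp64")   # FP64 precision

print("Current target:", cudaq.get_target().name)
```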
Now that we've set the backend, we can find the approximate max cuts of the subgraphs using the functions that we defined in Lab 1.
Exercise: Edit the code block below, replacing FIX_ME with the function from Lab 1 that solves the max cut problem using the vqe and sample commands. For your reference, the functions from Lab 1 have been copied to the top of this notebook and reside in section 2.1.2. The names of these functions are:
hamiltonian_max_cut
kernel_qaoa
qaoa_for_graph
If you haven't yet done so, scroll back up to section 2.1.2 and execute the cells in that section so that, when you have correctly completed the cell block below, it will run without error. It may take a minute or two for the computation to complete.
Notice that in the previous exercise, the results for G1 could not be computed until we finished finding the approximate max cut of G0. We could speed up the process by carrying out the QAOA computations for each of the disjoint subgraphs in parallel. We'll explore options for parallelization later in this lab and in Lab 3.
2.5 Merge
One way to merge together the graph colorings from the subgraph max cut solutions is to take their union (since the subgraphs are all disjoint, this will result in a legal 2-coloring of the original graph). However, this might not be optimal since there may be many edges between vertices in different subgraphs. Nonetheless it is a good starting point, and we can visualize this solution below.
2.5.1 First attempt at merging the subgraph solutions together
Next let's compute the cut value of sampleGraph2 given this coloring of the vertices. The cut will contain edges whose vertices have distinct colors. We need to count all of the edges in the cut.
Exercise: Edit the code block below to compute the cut value if we used the colors (0,1) of the nodes of sampleGraph2 to define the cut.
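One way to carry out this count is sketched below, assuming a dictionary node_colors (a hypothetical name) that maps each node of sampleGraph2 to 0 or 1:

```python
# Count the edges whose endpoints received different colors.
cut_value = sum(1 for u, v in sampleGraph2.edges()
                if node_colors[u] != node_colors[v])
print("Cut value of the unaltered cut:", cut_value)
```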
The approximation of the max cut value of sampleGraph2 that we just found isn't that close to the max cut value found through classical methods at the start of this lab. Since we'll be referring to this cut later, we'll give it a name: the unaltered cut. We can improve the unaltered cut!
In the next section, we will attempt to alter the unaltered cut of sampleGraph2 to find a cut with a higher cut value.
Before we move to the next section, feel free to go back to the code block at the end of section 2.4 to experiment with different initial parameters by changing the random seeds. Did the max cut approximations improve? You can also explore what will happen if you increase the layer_count to 2, but only do this if you've got several minutes to spare for the calculation to complete. If you're short on time, you can skip that experiment for now. There will be many more opportunities to experiment with layer_count throughout this series of labs.
2.5.2 Constructing a better global solution from the subgraph solutions through a brute-force method
Recall that in Lab 1 we saw that by changing the colorings (i.e., changing the 0s to 1s and vice versa) of the nodes of one of the subgraphs, we might improve the merged solution. In the Lab 1 example, we only had 2 subgraphs. Here we have many more subgraphs and can choose to flip the colors of all the vertices in some collection of the subgraphs. Let's see if changing one or more of the subgraph colorings will improve our max cut estimate for sampleGraph2.
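A brute-force search over all subsets of subgraphs might be sketched as follows (assuming node_colors holds the unaltered coloring and subgraph_dictionary the five subgraphs):

```python
import itertools

def cut_value(graph, coloring):
    """Number of edges whose endpoints have different colors."""
    return sum(1 for u, v in graph.edges() if coloring[u] != coloring[v])

best_value = cut_value(sampleGraph2, node_colors)
best_coloring = dict(node_colors)
names = sorted(subgraph_dictionary)

# Try all 2^n ways of flipping whole subgraphs (n = number of subgraphs).
for flips in itertools.product([0, 1], repeat=len(names)):
    trial = dict(node_colors)
    for name, flip in zip(names, flips):
        if flip:
            for v in subgraph_dictionary[name].nodes():
                trial[v] = 1 - trial[v]
    value = cut_value(sampleGraph2, trial)
    if value > best_value:
        best_value, best_coloring = value, trial

print("Best cut value found by brute force:", best_value)
```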
Yay! We found a cut of sampleGraph2 whose cut value is closer to the classical approximation that we found at the beginning of this lab. The downside is that the brute-force procedure above scales terribly, requiring $2^n$ cycles through the loop in the code block above, where $n$ is the number of subgraphs in our partition. Fortunately, we can do better!
In the next section, we will consider the task above (of merging together subgraph solutions to find a good cut of the parent graph) as an optimization problem. We can use QAOA to solve this optimization problem.
2.5.3 Constructing a better global solution from the subgraph solutions using QAOA
Recall that at the end of Lab 1, we took the subgraph solutions and merged them together by flipping the colors of one of the subgraphs. The graph on the left in the image below shows the unaltered cut from the subgraph solutions. The graph on the right depicts the new cut obtained by flipping the colors of one of the two subgraphs. By flipping these colors, we were able to add a border edge to the cut, while still maintaining the same number of cut edges in each of the two subgraphs.
The example from Lab 1 indicates that the edges connecting subgraphs influence the decision of which subgraph colorings should be flipped to find a coloring better than our unaltered cut. In sampleGraph2 we have multiple edges between the five different subgraphs. Changing the colors of one of the subgraphs to gain one edge may remove other edges from the cut. We need a way to keep track of this.
To better understand this, let's isolate the important features in the analysis from Lab 1. We'll call the edges of sampleGraph2 that don't appear in any subgraph (i.e., the edges that connect distinct subgraphs) border edges, and we'll look at the subgraph of sampleGraph2 generated by the border edges. We'll call this subgraph borderGraph.
We can also visualize the edges in the borderGraph that would make the cut if we merged together the colorings from the subgraph solutions without making any changes. We will highlight the edges from borderGraph that are in the unaltered cut with the color green.
Our goal is to change the colors of the vertices in borderGraph by selecting a few of the subgraphs and changing all the colors of all nodes in those selected subgraphs from gray to green and green to gray. We want to swap colors in such a way that we can add as many of the edges of borderGraph into a cut of sampleGraph2 as possible without sacrificing too many edges from the unaltered cut. Notice that if we swap all the colors of a subgraph, the total number of edges in the unaltered cut within that subgraph remains unchanged. Therefore, when changing the colors of a subgraph we only have to balance the gain of adding some of the (unhighlighted in the graph above) edges from the borderGraph into the cut against the cost of losing (highlighted green in the graph above) borderGraph edges from the unaltered cut.
This new optimization problem involves deciding whether or not to flip the colors of each subgraph. For each possible color change of subgraphs, we'll compute a "gain" from changing the colors. This gain is the number of border edges added to the cut from the alteration minus the number of border edges originally in the unaltered cut that are no longer in the cut after the color change. We want to maximize this gain. In the following paragraph, we'll translate this to an optimization problem on a new graph: mergerGraph.
Since we have to decide whether or not to change the colors of a subgraph, we will have five variables, $x_0, x_1, x_2, x_3, x_4$, one for each subgraph. We'll interpret $x_i = 1$ as the decision not to flip the colors of subgraph Gi, and $x_i = -1$ to indicate that we will flip the colors of subgraph Gi when we merge the subgraph solutions together. We can refer to an assignment of values to the variables as a swap schedule since they determine which subgraphs will have their vertex colorings fixed and which subgraphs will have their vertex colorings swapped.
Finding an optimal swap schedule depends on the edges in the borderGraph. We can visualize this as a new graph, which we'll call the mergerGraph. The vertices in this graph represent the subgraphs. We'll label the nodes G0, G1, G2, G3, and G4 to indicate their corresponding subgraph. Two nodes Gi and Gj in mergerGraph will be connected if there is at least one edge in borderGraph between a node in Gi and a node in Gj. We're losing a bit of information (e.g., the number of edges between two subgraphs) through this definition, but we'll recover that information shortly, when we add a weight to the edges of this graph. There is a lot going on here! So, let's first take a look at the mergerGraph without weights.
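A sketch of how such an unweighted merger graph could be assembled (using the subgraph_of helper sketched earlier; the notebook constructs its own version in the cell below):

```python
import networkx as nx

# One node per subgraph; an edge whenever at least one border edge connects
# the two subgraphs.
mergerGraph = nx.Graph()
mergerGraph.add_nodes_from(subgraph_dictionary.keys())
for u, v in borderGraph.edges():
    Gi = subgraph_of(u, subgraph_dictionary)
    Gj = subgraph_of(v, subgraph_dictionary)
    mergerGraph.add_edge(Gi, Gj)

print(sorted(mergerGraph.edges()))
```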
Notice that there is no edge between node G0 and node G2 in the mergerGraph because in sampleGraph2, no vertex in G0 shares an edge with any vertex in G2.
Since we're collapsing all the vertices in a subgraph down to just one vertex in mergerGraph, we lose information about the number of edges between subgraphs. More importantly, we lose information about the number of edges between subgraphs that are in the unaltered cut and the number of edges that are not in that cut. This information is important in deciding whether or not to change the colors of the vertices in one of the subgraphs. We'll record this information as penalties on the edges between the nodes in mergerGraph. For every edge in borderGraph connecting subgraph Gi to Gj, we'll add 1 to the penalty of the edge (Gi,Gj) in mergerGraph if the edge was in the unaltered cut, and we'll subtract 1 from the penalty if the edge was not in the unaltered cut. The idea is that if we flip the colors of subgraph Gi, we may lose some edges and gain others in the new cut. A positive penalty indicates that more edges between subgraphs Gi and Gj would be removed from the cut than gained when we swap the colors of one of the subgraphs. So a negative penalty between vertices can be thought of as a gain of edges in the cut from flipping one of the subgraph colorings.
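In code, this bookkeeping might be sketched as follows (assuming node_colors holds the unaltered coloring, borderGraph contains the border edges, and subgraph_of is the lookup helper sketched earlier):

```python
# penalty[(Gi, Gj)] = (# border edges between Gi and Gj in the unaltered cut)
#                   - (# border edges between Gi and Gj not in the cut)
penalty = {}
for u, v in borderGraph.edges():
    Gi = subgraph_of(u, subgraph_dictionary)
    Gj = subgraph_of(v, subgraph_dictionary)
    key = tuple(sorted((Gi, Gj)))
    if node_colors[u] != node_colors[v]:      # edge is in the unaltered cut
        penalty[key] = penalty.get(key, 0) + 1
    else:                                     # edge is not in the unaltered cut
        penalty[key] = penalty.get(key, 0) - 1

print(penalty)
```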
Before we compute the penalties for each edge in mergerGraph, let's just consider the nodes G1 and G2. We've plotted both of these subgraphs below, with G1 in yellow and G2 in blue, and with the edges between the subgraphs highlighted in gray and green. The edges colored green are in the unaltered cut. There are five edges connecting a vertex in G1 with one in G2. One of two things has occurred:
Either one of these edges is colored green (and hence in the unaltered cut) and the other 4 are not. This means that if we were to alter the coloring that determined the unaltered cut by flipping the colors of either the nodes in G1 or the nodes in G2, we would lose one edge (the green one) but gain four edges in the new modified cut. The penalty for changing the color of one of these two subgraphs would be $1 - 4 = -3$ (i.e., we would have a net gain of 3 edges).
Or, one of the edges is colored gray and the other 4 are green. This means that if we were to alter the coloring that determined the unaltered cut by flipping the colors of either the nodes in G1 or the nodes in G2, we would lose 4 edges (the green ones) but gain one edge in the new modified cut. The penalty for changing the color of one of these two subgraphs would be $4 - 1 = 3$ (i.e., we would have a net loss of 3 edges).
Continuing to focus on subgraphs G1 and G2, let's combine the penalty (which we'll denote $p_{1,2}$) together with the variables $x_1$ and $x_2$ by considering the product $p_{1,2}\, x_1 x_2$. Suppose we are in the first scenario above, so that $p_{1,2} = -3$. If we assigned $x_1 = -1$ and $x_2 = 1$, then this product, which we can think of as the gain, would be $(-3)(-1)(1) = 3$. We can interpret this as a gain of 3 edges if we swap the colors of G1 but keep the colors of the nodes in G2 fixed. On the other hand, if we didn't change the colors of either subgraph, or if we changed the colors of both subgraphs, then the product would be $-3$, indicating we'd be worse off.
This allows us to frame the merger process as a minimization problem, and we'll be able to use QAOA to solve it. Thinking of just the subgraphs G1 and G2, we want to maximize the gain $p_{1,2}\, x_1 x_2$; in other words, we need to minimize the product $-p_{1,2}\, x_1 x_2$. Extending this to every pair of adjacent subgraphs, our optimization problem is to minimize the sum of these products over all edges in mergerGraph. This means we will want to minimize the function $$-\sum_{(i,j)\in E} p_{i,j}\, x_i x_j,$$ where $E$ is the set of edges in the mergerGraph.
Now that we know what we need to minimize, let's compute the penalties for all edges in mergerGraph.
Exercise: Edit the code block below in the two FIX_ME locations. Hint: you'll either be adding 1 or subtracting 1 from the running computation of the penalty (penalty_ij) between nodes Gi and Gj.
Now that we have penalties assigned, our aim is to find an assignment of the variables $x_0, \dots, x_4$ so that $-\sum_{(i,j)\in E} p_{i,j}\, x_i x_j$ is minimized. Very similarly to how we derived the Max Cut Hamiltonian in section 1.5 of Lab 1, we can define a Hamiltonian for this function, which we'll refer to as the mergerHamiltonian. This is done by replacing every instance of $x_i$ in the function with a Pauli-$Z$ operator applied to the qubit associated with $x_i$. We'll be reusing this in Lab 3, so let's write a function to create the mergerHamiltonian.
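A sketch of such a function, assuming the penalty dictionary from the earlier sketch and a hypothetical mapping qubit_of from subgraph names (e.g. 'G2') to qubit indices (e.g. 2); the notebook's own implementation may organize this differently:

```python
from cudaq import spin

def merger_hamiltonian_sketch(penalty, qubit_of):
    """Build -sum_{(i,j)} p_ij * Z_i * Z_j from the edge penalties (sketch)."""
    hamiltonian = 0
    for (Gi, Gj), p in penalty.items():
        # replace x_i -> Pauli-Z on the qubit associated with subgraph Gi
        hamiltonian += float(-p) * spin.z(qubit_of[Gi]) * spin.z(qubit_of[Gj])
    return hamiltonian

# Example qubit assignment: 'G0' -> qubit 0, 'G1' -> qubit 1, ...
qubit_of = {name: i for i, name in enumerate(sorted(subgraph_dictionary))}
mergerHamiltonian = merger_hamiltonian_sketch(penalty, qubit_of)
```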
We're now ready to apply QAOA to find a solution to the merger problem (i.e., find values for $x_0, \dots, x_4$ so that the merger cost function is minimized). This is very similar to what we did in Lab 1 to find the minimum of the Max Cut Hamiltonian.
Exercise: Edit the code block below, changing the FIX_ME functions to a selection from the following. Two of these options will not be used.
cudaq.sample
cudaq.observe
cudaq.vqe
.expectation()
.most_probable()
To interpret the results, notice that a result such as most_probable = 10110 would indicate that we can improve the unaltered cut by flipping the colors of the vertices in subgraphs G0, G2, and G3. Let's see how the new coloring determined by the results of the previous cell affects our max cut value of sampleGraph2.
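Applying such a result could be sketched as follows (assuming most_probable is the measured bitstring, with its i-th character corresponding to subgraph 'Gi', and node_colors is the unaltered coloring):

```python
# Flip the coloring of every subgraph whose bit is 1 in the swap schedule.
merged_colors = dict(node_colors)
for i, bit in enumerate(most_probable):
    if bit == "1":
        for v in subgraph_dictionary[f"G{i}"].nodes():
            merged_colors[v] = 1 - merged_colors[v]

merged_cut_value = sum(1 for u, v in sampleGraph2.edges()
                       if merged_colors[u] != merged_colors[v])
print("Cut value after merging:", merged_cut_value)
```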
Yay! We've found a better cut of sampleGraph2 than the unaltered cut! Feel free to experiment with higher layer counts and different seeds to see if you can achieve an even better approximation.
2.6 Enabling Distributed Quantum Computing
You may have noticed that it took a little bit of time to find the max cut approximation in the previous section. Here we'll investigate one way of speeding this up.
A programmer may have available one or many host CPUs, zero or many NVIDIA GPUs, and zero or many QPUs (see image below). CUDA-Q allows for easy porting between classical computers, simulated QPUs, and real QPUs. Additionally CUDA-Q provides many opportunities for parallelization of quantum circuit simulation and execution.
In the near term, it is unlikely that the underlying platform will expose multiple QPUs. However, the availability of GPU-based circuit simulators on NVIDIA multi-GPU architectures does provide an opportunity to think about programming such a multi-QPU architecture now, and CUDA-Q's seamless transition from simulation to physical QPUs accommodates the changing quantum ecosystem. In this lab, we'll demonstrate how to execute the QAOA circuits for each subgraph simultaneously on multiple GPU processes using MPI, and we will demonstrate how this can be adapted to run on multiple QPUs when available.
Important: Before proceeding, you will need to switch to a runtime with access to a GPU. To switch to a GPU instance on qBraid Lab, use the Compute Manager to select a GPU-enabled instance and run your notebooks within the CUDA-Q environment. If you are running on Google Colab and switch to a GPU runtime, you'll need to reinstall CUDA-Q by commenting out the indicated code.
Let's run the cell below to verify that we have CUDA-Q set up to run on GPUs. It will print out the number of GPUs available. The multi-QPU option of the nvidia target provides a simulated QPU for every available NVIDIA GPU on the underlying system.
Instead of using one GPU to simulate one QPU, because our quantum kernel doesn't contain too many qubits, we can actually run several processes on our GPU in parallel, with each process simulating a QPU. We can simulate multiple QPUs on one machine by executing a Python script using the Message Passing Interface (MPI). Each QPU is simulated via a cuStateVec simulator backend and the nvidia target.
Let's create a Python script to send max cut approximation tasks for each subgraph to separate processes on the available GPU(s). We will then execute this script with MPI. MPI is a parallel programming communication protocol. For this lab, it's enough to understand that MPI assigns names (referred to as ranks) to each of the available processes on the available GPUs. When we use the mpiexec command, the script that we create is executed by every one of these processes. To differentiate what gets computed on each process, we use the rank variable.
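As a minimal illustration of ranks (a standalone sketch, with the hypothetical file name mpi_hello.py):

```python
# mpi_hello.py -- run with: mpiexec -np 4 python mpi_hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()   # unique id of this process
size = comm.Get_size()   # total number of processes launched by mpiexec

if rank == 0:
    print(f"Rank 0 of {size}: I will hold the graph data and merge results.")
else:
    print(f"Rank {rank} of {size}: I will solve assigned subgraph problems.")
```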
For this lab, if you have 1 available GPU, you can distribute 4 separate processes on this GPU to run in parallel. We'll use the processor with rank 0 as our home base. This is where all of the graph data will be stored and we'll use this processor for computation too. Our script will carry out the following steps:
Step 1: The GPU process with rank 0 will carry out the divide stage of the divide-and-conquer QAOA. In particular, it will generate sampleGraph2 and all its subgraphs. This process will assign subgraph problems to all of the GPU processes (including itself). It will then distribute the relevant subgraph data to the other GPU processes.
Step 2: Each GPU process will find the approximate max cuts of the graphs that it was assigned, and then communicate those solutions back to the GPU process with rank 0.
Step 3: The GPU process with rank 0 will merge together all of the subgraph solutions to complete the merge stage of the divide-and-conquer algorithm.
Step 1
Our script will have an if rank==0: and else: structure to separate out the tasks that the GPU process with rank 0 will do differently from the other GPU processes. For the GPU process with rank 0, we'll copy over all the code that we've written earlier in this lab to create sampleGraph2 and subdivide it into subgraphs stored in subgraph_dictionary. Next, we need to decide GPU-process assignments for the subgraphs. We won't worry about doing this optimally in this tutorial. For now, we simply need to map each subgraph to a GPU process. Since we have 5 subgraphs and 4 GPU processes available (remember we're using the GPU process with rank 0 for computation too), one of the GPU processes will be tasked with solving max cut problems for two subgraphs.
Exercise: Edit the code block below to define a dictionary keys_on_qpu that maps GPU ranks to subsets of subgraph names. For instance, one mapping might be {0:['G0','G4'], 1:['G1'], 2:['G2'], 3:['G3']}.
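One simple (round-robin) way to build such a mapping is sketched below; any assignment that covers all five subgraphs will do:

```python
# Round-robin assignment of subgraph names to the 4 available ranks.
num_ranks = 4
keys_on_qpu = {rank: [] for rank in range(num_ranks)}
for i, name in enumerate(sorted(subgraph_dictionary)):
    keys_on_qpu[i % num_ranks].append(name)

print(keys_on_qpu)   # e.g. {0: ['G0', 'G4'], 1: ['G1'], 2: ['G2'], 3: ['G3']}
```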
If you have not already done so, download the Example-02 files from the GitHub repository and save them in your working directory. Take a look at the Example-02-step-1.py file to see how the GPU process with rank 0 carries out the subgraph division and communicates the relevant subgraph data to the remaining GPU processes. The variable assigned_subgraph_dictionary (whose definition depends on the rank variable) is introduced to hold only the portion of the subgraph_dictionary that is needed by the GPU process associated with the value of rank.
Run the following cell to use MPI to execute this step and watch the output that reflects the actions that have been completed by the different processors.
Step 2
Now let's turn our attention to Step 2 of the parallelization. For this step, we need to add instructions to the Example-02-step-1.py script so that each of the GPU processes will solve the max cut problems for the subgraphs assigned to it. We'll need to copy over all of the necessary functions that we've defined so far (e.g., vqe_optimize, vqe_sample, hamiltonian_max_cut, kernel_qaoa), replacing all of the subgraph_dictionary instances with assigned_subgraph_dictionary.
As you run the Example-02-step-2.py file with MPI, you might notice that there is some overhead in subdividing the graph and communicating that information to the GPUs. However, once the subgraph data is loaded onto the GPU processes, the max cut computations are carried out rather quickly.
Another thing to notice is that the max cut results for each of the subgraphs currently reside on the individual GPU processes, and we'll need to communicate that information back to the GPU process with rank 0 so that they can be merged together to get a cut of the entire sampleGraph2. Let's see what is needed to carry out this communication.
Exercise: Carry out the following steps to move the subgraph solutions to the first GPU process and verify the action (a sketch of the communication step is provided after this list for reference).
Complete the code block below
Copy over the completed code block to the bottom of the Example-02-step-2.py file
Save the file as Example-02-step-2-Solution.py
Execute the revised script with MPI to verify that the max cut solutions of the subgraphs all now reside on the GPU process with rank 0.
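For orientation, here is one way the communication step could look using mpi4py's gather, assuming each rank stores its results in a dictionary local_results mapping subgraph names to bitstring solutions (the file in the repository may use point-to-point send/recv instead):

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Collect every rank's partial results on rank 0.
gathered = comm.gather(local_results, root=0)

if rank == 0:
    all_results = {}
    for partial in gathered:       # one dictionary per rank
        all_results.update(partial)
    print("Subgraph solutions now on rank 0:", sorted(all_results))
```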
Yay! We have managed to create a parallel version of the conquer stage of the divide-and-conquer QAOA.
Step 3
All that is left is to merge together the subgraph solutions in an optimal way. We've added the necessary functions and tasks to the Example-02-step-3-Solution.py to complete this step.
You may have noticed that the max cut approximation that was returned from the Example-02-step-3-Solution.py script is different from the one that we computed earlier in this lab. This is because our random seeds in the Python script were different from the ones in this notebook. Before moving on to the next lab, we encourage you to change some of the random seeds throughout the notebook to see that the divide-and-conquer QAOA applied to the same graph will give different results depending on the initial parameters. Moreover, sometimes the initial parameters that are randomly selected lead to a barren plateau and an NLopt error breaks the execution. Increasing the number of shots or changing the maximum number of iterations of the optimizer may remedy this.
Another variable to experiment with is the layer_count. Increasing the layer_count generally leads to better solutions. However, this comes at the expense of increased circuit depth, which may increase the runtime of the simulation (at best) or may prohibit the simulation from completing.
You can also experiment with changing the parameters (number of edges and number of vertices) that generated our random graph to get a sense of the limitations of the divide-and-conquer algorithm (as it is currently written in this notebook). In particular, random graphs with 100 nodes may generate subgraphs that are too large for the QAOA algorithm. Additionally, some conquer graphs may produce trivial QAOA instances where the mergerHamiltonian is null. We'll take care of those problems in the next lab.
Looking towards the future when multiple QPUs are available, the remote QPU nvidia-mqpu platform could be leveraged to run this QAOA algorithm in parallel on multiple QPUs.
Next
The graph in this notebook was small enough that the divide stage produced only a few subgraphs, each of which was sufficiently small to easily identify the max cut using QAOA. Proceed to Lab 3 where we'll tackle larger graphs!