Copyright 2020 The TensorFlow Authors.
Barren plateaus
In this example you will explore the result of McClean, 2019, which shows that not just any quantum neural network structure will do well when it comes to learning. In particular, you will see that a certain large family of random quantum circuits does not serve well as quantum neural networks, because such circuits have gradients that vanish almost everywhere. In this example you won't train a model for a specific learning problem; instead you will focus on the simpler problem of understanding the behavior of gradients.
Setup
Install TensorFlow Quantum:
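A typical install cell looks like the following; the published notebook may pin a specific release, so treat the unpinned install here as a sketch:

```python
# Install TensorFlow Quantum (an exact version pin may be needed to match
# the TensorFlow version in your environment).
!pip install tensorflow-quantum
```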
Now import TensorFlow and the module dependencies:
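Assuming a standard TFQ setup, the snippets in the rest of this tutorial use the following dependencies:

```python
import tensorflow as tf
import tensorflow_quantum as tfq

import cirq
import sympy
import numpy as np

# Visualization tools.
import matplotlib.pyplot as plt
```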
1. Summary
Random quantum circuits with many blocks that look like this ($R_{P}(\theta)$ is a random Pauli rotation):

Where if $f(x)$ is defined as the expectation value w.r.t. $Z_{a}Z_{b}$ for any qubits $a$ and $b$, then there is a problem that $f'(x)$ has a mean very close to 0 and does not vary much. You will see this below:
2. Generating random circuits
The construction from the paper is straightforward to follow. The following implements a simple function that generates a random quantum circuit—sometimes referred to as a quantum neural network (QNN)—with the given depth on a set of qubits:
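A minimal sketch of such a generator is below. It follows the layer structure described in the paper (a fixed layer of $R_y(\pi/4)$ rotations, then `depth` layers of randomly chosen Pauli rotations followed by a CZ ladder); the name `generate_random_qnn` and the exact sampling details are illustrative, not the notebook's verbatim code.

```python
def generate_random_qnn(qubits, symbol, depth):
    """Build a random layered circuit in the style of McClean et al.

    Each layer applies a randomly chosen Pauli rotation (X, Y or Z) with a
    random angle on every qubit, followed by a ladder of CZ entangling gates.
    Only the very first rotation keeps `symbol`, so its gradient can be studied.
    """
    circuit = cirq.Circuit()

    # Fixed state-preparation layer.
    for qubit in qubits:
        circuit += cirq.ry(np.pi / 4.0)(qubit)

    for d in range(depth):
        # Layer of random single-qubit Pauli rotations.
        for i, qubit in enumerate(qubits):
            # Keep the symbol in the first rotation of the first layer; fix
            # every other angle to a random value up front.
            angle = symbol if (d == 0 and i == 0) else np.random.uniform(0, 2 * np.pi)
            rotation = [cirq.rx, cirq.ry, cirq.rz][np.random.randint(3)]
            circuit += rotation(angle)(qubit)

        # Entangling ladder of CZ gates.
        for src, dst in zip(qubits, qubits[1:]):
            circuit += cirq.CZ(src, dst)

    return circuit
```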
The authors investigate the gradient of a single parameter $\theta_{1,1}$. Let's follow along by placing a `sympy.Symbol` in the circuit where $\theta_{1,1}$ would be. Since the authors do not analyze the statistics for any other symbols in the circuit, let's replace them with random values now instead of later.
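For instance, a small three-qubit example using the sketched helper above might look like this (the qubit count and depth are arbitrary here):

```python
qubits = cirq.GridQubit.rect(1, 3)
symbol = sympy.Symbol('theta')

# A shallow instance, just to inspect the structure.
circuit = generate_random_qnn(qubits, symbol, depth=2)
print(circuit)
```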
3. Running the circuits
Generate a few of these circuits along with an observable to test the claim that the gradients don't vary much. First, generate a batch of random circuits. Choose a random ZZ observable and batch calculate the gradients and variance using TensorFlow Quantum.
3.1 Batch variance computation
Let's write a helper function that computes the variance of the gradient of a given observable over a batch of circuits:
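A sketch of such a helper, here called `process_batch` (an assumed name), built on `tfq.layers.Expectation` and `tf.GradientTape`:

```python
def process_batch(circuits, symbol, op):
    """Estimate the variance of d<op>/d(symbol) over a batch of circuits.

    Each circuit must contain exactly one free parameter named `symbol`,
    whose value is drawn uniformly at random for every circuit in the batch.
    """
    expectation = tfq.layers.Expectation()

    # Convert the cirq.Circuit batch to a tensor and sample a random value
    # for the single free symbol of each circuit.
    circuit_tensor = tfq.convert_to_tensor(circuits)
    values_tensor = tf.convert_to_tensor(
        np.random.uniform(0, 2 * np.pi, (len(circuits), 1)).astype(np.float32))

    # Differentiate the batched expectation values with respect to the symbol.
    with tf.GradientTape() as tape:
        tape.watch(values_tensor)
        forward = expectation(circuit_tensor,
                              operators=op,
                              symbol_names=[symbol],
                              symbol_values=values_tensor)

    # One gradient per circuit; report its variance across the batch.
    grads = tape.gradient(forward, values_tensor)
    return tf.math.reduce_variance(grads).numpy()
```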
3.2 Set up and run
Choose the number of random circuits to generate along with their depth and the number of qubits they should act on. Then plot the results.
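A sketch of the experiment loop, assuming the `generate_random_qnn` and `process_batch` helpers above (the qubit counts, depth, and batch size here are illustrative; the paper explores wider ranges):

```python
n_qubits = [2 * i for i in range(2, 7)]  # The paper studies up to ~24 qubits.
depth = 50                               # The paper studies depths of 50 to 500.
n_circuits = 200
theta_var = []

for n in n_qubits:
    # Build a batch of random circuits on n qubits.
    qubits = cirq.GridQubit.rect(1, n)
    symbol = sympy.Symbol('theta')
    circuits = [generate_random_qnn(qubits, symbol, depth) for _ in range(n_circuits)]

    # ZZ observable on a pair of qubits.
    op = cirq.Z(qubits[0]) * cirq.Z(qubits[1])
    theta_var.append(process_batch(circuits, symbol, op))

plt.semilogy(n_qubits, theta_var)
plt.title('Gradient variance in random QNNs')
plt.xlabel('n_qubits')
plt.ylabel(r'$\partial\theta$ variance')
plt.show()
```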
This plot shows that for quantum machine learning problems, you can't simply guess a random QNN ansatz and hope for the best. Some structure must be present in the model circuit in order for gradients to vary to the point where learning can happen.
4. Heuristics
An interesting heuristic by Grant, 2019 allows one to start very close to random, but not quite. Using the same circuits as McClean et al., the authors propose a different initialization technique for the classical control parameters to avoid barren plateaus. The initialization technique starts some layers with totally random control parameters, but in the layers immediately following it chooses parameters such that the initial transformation made by the first few layers is undone. The authors call this an identity block.
The advantage of this heuristic is that by changing just a single parameter, all other blocks outside of the current block remain the identity, and the gradient signal comes through much stronger than before. This allows the user to pick and choose which variables and blocks to modify to get a strong gradient signal. This heuristic does not prevent the user from falling into a barren plateau during the training phase (and it restricts a fully simultaneous update); it only guarantees that you can start outside of a plateau.
4.1 New QNN construction
Now construct a function to generate identity block QNNs. This implementation is slightly different from the one in the paper: to stay consistent with McClean et al., it only looks at the behavior of the gradient of a single parameter, which allows for some simplifications.
To generate an identity block and train the model, generally you need $U1(\theta_{1a}) U1(\theta_{1b})^{\dagger}$ and not $U1(\theta_1) U1(\theta_1)^{\dagger}$. Initially $\theta_{1a}$ and $\theta_{1b}$ are the same angles but they are learned independently. Otherwise, you will always get the identity even after training. The choice for the number of identity blocks is empirical. The deeper the block, the smaller the variance in the middle of the block. But at the start and end of the block, the variance of the parameter gradients should be large.
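One way to sketch this construction, reusing `generate_random_qnn` from above (the name `generate_identity_qnn` and the exact block layout are illustrative and differ in detail from the published notebook):

```python
def generate_identity_qnn(qubits, symbol, block_depth, total_blocks):
    """Build a layered circuit from identity blocks, in the style of Grant et al.

    The first block keeps the free `symbol`; its inverse fixes that angle to a
    random value so the two copies can be trained independently. Every
    remaining block is a random circuit immediately followed by its inverse,
    so it acts as the identity.
    """
    circuit = cirq.Circuit()

    # First block: random circuit containing the symbol...
    prep_and_u = generate_random_qnn(qubits, symbol, block_depth)
    circuit += prep_and_u

    # ...followed by its inverse (dropping the state-prep moment), with the
    # symbol frozen to a random angle.
    u_dagger = cirq.inverse(prep_and_u[1:])
    circuit += cirq.resolve_parameters(
        u_dagger, {symbol: np.random.uniform(0, 2 * np.pi)})

    # Remaining blocks: U followed by U^dagger, with no free parameters.
    for _ in range(total_blocks - 1):
        u = generate_random_qnn(qubits, np.random.uniform(0, 2 * np.pi),
                                block_depth)[1:]
        circuit += u
        circuit += cirq.inverse(u)

    return circuit
```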
4.2 Comparison
Here you can see that the heuristic does help to keep the variance of the gradient from vanishing as quickly:
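A sketch of the comparison run, reusing the helpers and the `theta_var` results from the earlier cells (the block sizes are illustrative):

```python
block_depth = 10
total_blocks = 5
heuristic_theta_var = []

for n in n_qubits:
    # Build a batch of identity-block circuits on n qubits.
    qubits = cirq.GridQubit.rect(1, n)
    symbol = sympy.Symbol('theta')
    circuits = [
        generate_identity_qnn(qubits, symbol, block_depth, total_blocks)
        for _ in range(n_circuits)
    ]
    op = cirq.Z(qubits[0]) * cirq.Z(qubits[1])
    heuristic_theta_var.append(process_batch(circuits, symbol, op))

plt.semilogy(n_qubits, theta_var, label='Random')
plt.semilogy(n_qubits, heuristic_theta_var, label='Identity blocks')
plt.title('Heuristic vs. random initialization')
plt.xlabel('n_qubits')
plt.ylabel(r'$\partial\theta$ variance')
plt.legend()
plt.show()
```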
This is a great improvement in getting stronger gradient signals from (near) random QNNs.