Path: blob/main/notebooks/quantum-machine-learning/vqc.ipynb
Variational classification
In this page, we introduce variational algorithms, then describe and implement a variational quantum classifier, and discuss variational training.
Variational algorithms
Variational algorithms were introduced in 2014, with the variational quantum eigensolver in Reference 1 and the quantum approximate optimization algorithm in Reference 2. They are near-term algorithms that can be executed on current quantum computers in concert with classical computers.
Using a parameterized quantum circuit, or ansatz, $U(\vec{\theta})$, we prepare a state $|\psi(\vec{\theta})\rangle = U(\vec{\theta})|0\rangle$, and measure an expectation value using a quantum computer. We define a cost function $C(\vec{\theta})$ that determines how good $\vec{\theta}$ is for the problem we are trying to solve. We then use a classical computer to calculate the cost function and provide updated circuit parameters using an optimization algorithm. The goal of the algorithm is to find the circuit parameters $\vec{\theta}^*$ for the parameterized quantum circuit that minimize the cost function $C(\vec{\theta})$.
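In outline, the loop alternates between quantum evaluation of the cost and a classical parameter update. Below is a minimal sketch of that loop; the `cost_function` argument, the learning rate, and the finite-difference gradient estimate are illustrative assumptions (SPSA or COBYLA are common alternatives in practice):

```python
import numpy as np

def variational_loop(cost_function, initial_theta, n_steps=100, lr=0.1, eps=1e-2):
    """Generic variational loop: quantum cost evaluation + classical update.

    cost_function(theta) is assumed to run the parameterized circuit on a
    quantum computer (or simulator) and return a scalar cost.
    """
    theta = np.array(initial_theta, dtype=float)
    for _ in range(n_steps):
        # Estimate the gradient by central finite differences.
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            shift = np.zeros_like(theta)
            shift[i] = eps
            grad[i] = (cost_function(theta + shift) - cost_function(theta - shift)) / (2 * eps)
        theta -= lr * grad  # move downhill in the cost landscape
    return theta
```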
The variational quantum classifier
The variational quantum classifier, introduced by multiple groups in 2018, is a variational algorithm where the measured expectation value is interpreted as the output of a classifier. For a binary classification problem, with input data vectors $\vec{x}_i$ and binary output labels $y_i \in \{0, 1\}$, for each input data vector we build a parameterized quantum circuit that outputs the quantum state:
$$|\psi(\vec{x}_i, \vec{\theta})\rangle = W(\vec{\theta})\, U_{\Phi(\vec{x}_i)}\, |0\rangle$$

where $W(\vec{\theta})$ corresponds to the variational circuit unitary and $U_{\Phi(\vec{x}_i)}$ corresponds to the data encoding circuit unitary. After creating and measuring the circuit of $n$ qubits, we're left with a length-$n$ bitstring, from which we must derive the binary output $\hat{y}_i$ that will be our classification result. This is done with the help of a Boolean function $f: \{0,1\}^n \rightarrow \{0,1\}$. The parity function is a popular choice for this.
In the training phase, we're trying to find the values of $\vec{\theta}$ that give us the best predictions. The classical computer compares the predicted labels $\hat{y}_i$ to the provided labels $y_i$, and we calculate the success of our predictions using a cost function. Based on this cost, the classical computer chooses another value of $\vec{\theta}$ using a classical optimization algorithm. This new $\vec{\theta}$ is then used to run a new circuit, and the process is repeated until the cost function stabilizes.
Full implementation
Let's implement all the separate components of the variational quantum classifier, and use them to classify the ad hoc dataset described in Reference 3, following this implementation from Rodney Osodo.
We create 20 training data points and 5 testing data points, each with 2 features, from each class.
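A sketch of generating the dataset, assuming the `ad_hoc_data` helper from qiskit-machine-learning (the gap value is an assumption, and argument names may differ between versions):

```python
from qiskit_machine_learning.datasets import ad_hoc_data

# 20 training and 5 testing points of 2 features per class.
train_data, train_labels, test_data, test_labels = ad_hoc_data(
    training_size=20,
    test_size=5,
    n=2,           # number of features
    gap=0.3,       # separation gap between the two classes
    one_hot=False  # integer labels; we one-hot encode later for the VQC class
)
```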
We prepare the classification circuit, using the Qiskit `ZZFeatureMap` as the data encoding circuit, and the Qiskit `TwoLocal` circuit, with $y$ and $z$ rotations and controlled-phase gates, as the variational circuit, as per Reference 3.
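A possible construction, where the number of repetitions in each circuit is an assumption:

```python
from qiskit.circuit.library import ZZFeatureMap, TwoLocal

num_qubits = 2  # one qubit per feature

# Data encoding circuit, as per Reference 3.
feature_map = ZZFeatureMap(feature_dimension=num_qubits, reps=2)

# Variational circuit: y and z rotations with controlled-phase entanglers.
var_circuit = TwoLocal(num_qubits, rotation_blocks=["ry", "rz"],
                       entanglement_blocks="cz", reps=2)

# Full classification circuit: encode the data, then apply the variational part.
circuit = feature_map.compose(var_circuit)
circuit.measure_all()
```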
We create a function that assigns the data to the feature map parameters and the variational values to the variational circuit parameters. This ensures that the right parameters in the circuit are associated with the right quantities.
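A minimal sketch of such a binding function, assuming the `feature_map`, `var_circuit`, and `circuit` names from the construction above:

```python
def circuit_instance(data, variational):
    """Bind data values to the feature map and variational values to the ansatz."""
    parameters = {}
    for i, p in enumerate(feature_map.ordered_parameters):
        parameters[p] = data[i]
    for i, p in enumerate(var_circuit.ordered_parameters):
        parameters[p] = variational[i]
    return circuit.assign_parameters(parameters)
```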
We create a class assignment function to calculate the parity of the given bitstring. If the parity is even, it returns a $0$ label, and if the parity is odd, it returns a $1$ label, as per Reference 3.
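A sketch of the parity-based assignment; the `class_labels` list is an assumption matching the dataset's integer labels:

```python
class_labels = [0, 1]  # assumption: integer labels matching the dataset

def parity(bitstring):
    """Parity of a measurement bitstring such as '0101': 0 if even, 1 if odd."""
    return sum(int(bit) for bit in bitstring) % 2

def assign_label(bitstring, class_labels):
    """Map a bitstring to a class label via its parity."""
    return class_labels[parity(bitstring)]
```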
We create a function that returns the probability distribution over the label classes, given experimental counts from running the quantum circuit multiple times.
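One possible implementation, reusing the hypothetical `assign_label` helper above; since it normalizes by the total, it works whether the input holds raw counts or quasi-probabilities:

```python
def label_probability(results, class_labels):
    """Aggregate counts (or probabilities) per bitstring into per-label probabilities."""
    shots = sum(results.values())
    probabilities = {label: 0.0 for label in class_labels}
    for bitstring, counts in results.items():
        probabilities[assign_label(bitstring, class_labels)] += counts / shots
    return probabilities
```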
We create a function to classify our data. It takes in data points and variational parameters. For every data point in the dataset, we assign the data to the feature map and the parameters to the variational circuit. We then store the resulting quantum circuit, so as to run all the circuits at once at the end. We measure each circuit and return the probabilities over the class labels based on the measured bitstrings.
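A sketch of such a classification function, assuming the reference `Sampler` primitive from `qiskit.primitives` (newer Qiskit versions provide different primitive classes) and the helpers defined above:

```python
from qiskit.primitives import Sampler  # assumption: the reference Sampler primitive

sampler = Sampler()

def classify(x_list, variational, class_labels):
    """Return a per-label probability distribution for each data point."""
    # Build all circuits first so they can be submitted in a single job.
    circuits = [circuit_instance(x, variational) for x in x_list]
    results = sampler.run(circuits).result()
    return [
        label_probability(results.quasi_dists[i].binary_probabilities(), class_labels)
        for i in range(len(circuits))
    ]
```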
For training, we create the loss and cost functions.
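One possible choice is a cross-entropy loss, averaged over the dataset to give the cost; this sketch reuses the hypothetical `classify` helper above:

```python
import numpy as np

def cross_entropy_loss(predicted_probs, expected_label):
    """Cross-entropy between the predicted distribution and the true label."""
    p = predicted_probs.get(expected_label, 0.0)
    return -np.log(max(p, 1e-10))  # clip to avoid log(0)

def cost_function(data, labels, variational, class_labels):
    """Average loss over the dataset for the given variational parameters."""
    predictions = classify(data, variational, class_labels)
    losses = [cross_entropy_loss(p, label) for p, label in zip(predictions, labels)]
    return sum(losses) / len(losses)
```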
We set up our classical optimizer, using `SPSA` (as per Reference 3), initialize our variational circuit parameters for reproducibility, and optimize our cost function by modifying the variational circuit parameters, using our 40 training data points. Note that the optimization will take a while to run.
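A sketch of this step; the import path, iteration count, and seed value are assumptions:

```python
import numpy as np
from qiskit_algorithms.optimizers import SPSA  # older releases: qiskit.algorithms.optimizers

np.random.seed(42)  # seed value is an assumption, fixed for reproducibility
initial_point = np.random.random(var_circuit.num_parameters)

def objective(variational):
    return cost_function(train_data, train_labels, variational, class_labels)

optimizer = SPSA(maxiter=100)
result = optimizer.minimize(objective, x0=initial_point)
opt_parameters = result.x
```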
Plotting the cost function against the optimization step, we can see it starts to converge to a minimum.
We implement a function to score our variational quantum classifier, using the classification function we created earlier, and use it to test our trained classifier on our 10 test data points.
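A possible scoring function, again reusing the hypothetical `classify` helper sketched above:

```python
def test_classifier(data, labels, variational, class_labels):
    """Return the accuracy and predictions of the trained classifier."""
    probabilities = classify(data, variational, class_labels)
    predictions = [max(p, key=p.get) for p in probabilities]  # most likely label
    accuracy = sum(int(pred == label)
                   for pred, label in zip(predictions, labels)) / len(labels)
    return accuracy, predictions

accuracy, predictions = test_classifier(test_data, test_labels,
                                        opt_parameters, class_labels)
```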
We see that the performance of the trained classifier is not great on the test data. The optimization probably needed more iterations, or converged to a local minimum rather than the global minimum.
Qiskit implementation
Qiskit has an implementation of the variational quantum classifier in the `VQC` class. Let's use it on the same dataset.
First, we need to one hot encode our labels, as required by the algorithm.
Next, we set up and run the VQC algorithm, setting initial variational circuit parameters for reproducibility and using a callback function to record the cost at each optimization step, then plot the results.
Third, we test the trained classifier on the test data.
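These three steps might look as follows; this is a sketch assuming the `VQC` class from qiskit-machine-learning and scikit-learn's `OneHotEncoder`, with constructor arguments that may vary between versions:

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder
from qiskit_machine_learning.algorithms import VQC

# One-hot encode the integer labels (older scikit-learn uses `sparse=` instead).
encoder = OneHotEncoder(sparse_output=False)
train_labels_oh = encoder.fit_transform(np.array(train_labels).reshape(-1, 1))
test_labels_oh = encoder.transform(np.array(test_labels).reshape(-1, 1))

# Record the cost at each optimization step so it can be plotted afterwards.
costs = []
def record_cost(weights, cost):
    costs.append(cost)

vqc = VQC(feature_map=feature_map,
          ansatz=var_circuit,
          optimizer=SPSA(maxiter=100),
          initial_point=initial_point,  # fixed for reproducibility
          callback=record_cost)

vqc.fit(train_data, train_labels_oh)
score = vqc.score(test_data, test_labels_oh)
```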
We see that the performance of the trained classifier is pretty good on the test data. The training optimization probably found the global minimum.
Variational training
As with all variational algorithms, finding the optimal parameters of the variational circuit takes most of the processing time, and is dependent on the optimization method used, as discussed in the training page.
Our optimal circuit parameters $\vec{\theta}^*$ are found at the minimum of the loss function: $\vec{\theta}^* = \arg\min_{\vec{\theta}} L(\vec{\theta})$. However, there isn't a simple relationship between the loss function and the circuit parameters.
In fact, the loss landscape can be quite complicated, as shown by the hills and valleys in the example below. The optimization method navigates us around the loss landscape, searching for the minimum, as shown by the black points and lines. We can see that two of the three searches end up in a local minimum, rather than the global one.

Generally, optimization methods can be categorized into two groups: gradient-based and gradient-free methods. Gradient-based methods search for an extremum of the loss function, at which the gradient is equal to zero, with the search direction determined by the derivative of the loss function. The main disadvantages of this type of optimization are that convergence can be very slow and there is no guarantee of reaching the optimal solution.
When derivative information is unavailable or impractical to obtain (e.g. when the loss function is expensive to evaluate or noisy), gradient-free methods can be very useful. These techniques are more robust at finding global optima, while gradient-based methods tend to converge to local optima. However, gradient-free methods require more computational resources, especially for problems with high-dimensional search spaces.
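As a toy illustration of this trade-off (classical Python, not a quantum circuit, with an illustrative bumpy loss landscape), compare finite-difference gradient descent with a simple random-search strategy:

```python
import numpy as np

def loss(theta):
    # A 1-D landscape with several local minima and a global minimum near zero.
    return 0.5 * np.sin(5 * theta) + theta ** 2

rng = np.random.default_rng(0)

# Gradient-based: finite-difference gradient descent; may stall in a local minimum.
theta = 2.0
for _ in range(200):
    grad = (loss(theta + 1e-4) - loss(theta - 1e-4)) / 2e-4
    theta -= 0.05 * grad

# Gradient-free: random search; more evaluations, but can escape local minima.
best = 2.0
for _ in range(200):
    candidate = best + rng.normal(scale=0.5)
    if loss(candidate) < loss(best):
        best = candidate

print(f"gradient descent: theta = {theta:.3f}, loss = {loss(theta):.3f}")
print(f"random search:    theta = {best:.3f}, loss = {loss(best):.3f}")
```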

Regardless of which type of optimization method is used, if the loss landscape is fairly flat, it can be difficult for the method to determine which direction to search. This situation is called a barren plateau, and was studied in Reference 4: for a wide class of reasonable parameterized quantum circuits, the probability that the gradient along any reasonable direction is non-zero to some fixed precision is exponentially small as a function of the number of qubits.
One approach to overcoming this problem is to use structured initial guesses, such as those adopted in quantum simulation. Another possibility is to consider the full quantum circuit as a sequence of shallow blocks, selecting some parameters randomly and choosing the rest so that each shallow block implements the identity, restricting the effective depth. This is an area of current investigation.
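A toy sketch of the identity-block idea: pair each rotation with its inverse, so that the block initially implements the identity however its free parameters were drawn (the pairing scheme here is illustrative, not the published construction):

```python
import numpy as np
from qiskit import QuantumCircuit

def identity_block(num_qubits, thetas):
    """Two layers of ry rotations where the second undoes the first."""
    block = QuantumCircuit(num_qubits)
    for q in range(num_qubits):
        block.ry(thetas[q], q)    # randomly chosen parameters
    for q in range(num_qubits):
        block.ry(-thetas[q], q)   # chosen so the block starts as the identity
    return block

thetas = np.random.random(4)
block = identity_block(4, thetas)
```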
References
Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J. Love, Alán Aspuru-Guzik and Jeremy L. O'Brien, A variational eigenvalue solver on a quantum processor, Nature Communications 5, 4213 (2014), doi:10.1038/ncomms5213, arXiv:1304.3061.
Edward Farhi, Jeffrey Goldstone and Sam Gutmann, A Quantum Approximate Optimization Algorithm (2014), arXiv:1411.4028.
Vojtech Havlicek, Antonio D. Córcoles, Kristan Temme, Aram W. Harrow, Abhinav Kandala, Jerry M. Chow and Jay M. Gambetta, Supervised learning with quantum enhanced feature spaces, Nature 567, 209-212 (2019), doi:10.1038/s41586-019-0980-2, arXiv:1804.11326.
Jarrod R. McClean, Sergio Boixo, Vadim N. Smelyanskiy, Ryan Babbush and Hartmut Neven, Barren plateaus in quantum neural network training landscapes, Nature Communications 9, 4812 (2018), doi:10.1038/s41467-018-07090-4, arXiv:1803.11173.