Quantum Facial Expression Recognition
Facial Expression Recognition (FER) is a highly relevant task in human-computer interaction, with applications in predictive environments, content analysis, healthcare support, behavioural description and many more. FER is a classification problem that consists of associating a face image with a category of expressions such as anger, fear, surprise, sadness, happiness and so on. It is a challenging problem for computer models, due to the heterogeneity of human faces and the variety of poses and backgrounds. The conventional approach to FER is based on three major steps: image preprocessing, feature extraction, and expression classification. We describe here how a supervised classifier for expressions can be implemented on a quantum computer. The preliminary step of image preprocessing is a classical task producing the items of the dataset, while the feature extraction phase maps such pre-processed images into graphs. The classification step is the quantum part of our implementation, where the features are mapped into the amplitudes of the quantum state that forms the input to the quantum circuit implementing the image classifier.
Facial recognition and analysis are powerful tools, and they come with ethical concerns, particularly surrounding bias, privacy, and consent. With this in mind, and considering the unproven speedup of quantum facial expression recognition, we have deemed this research low-risk. However, any future research, especially applications that may touch upon issues of privacy and consent, will need a similar assessment. We must consider the potential positive and negative impacts of our software, while also taking responsibility for training it on varied datasets and avoiding the introduction of human bias that has discriminatory impacts.
In our approach we take a subset of a dataset of pictures (for our experiments we have considered the FFHQ dataset [2]) and label the items in this subset. Then we encode the labelled instances and an unlabelled test instance into quantum states; these are used as input to a quantum circuit in order to infer the test label. The circuit operates in a way that is similar to the nearest centroid technique [5] of classical machine learning, since the inferred label corresponds to the label of the closest item in terms of the Euclidean distance on a representation of the graphs [3]. The quantum classifier we describe below extends the work in [4]. As we will show later, the output obtained via the final measurement of this circuit can be seen as the minimization of a cost function expressing the distance of the unlabelled test instance from each class of labels. Thus, by interpreting the distance measure as a kernel, this model can be seen as a kernelized binary classifier.
The complexity (in terms of number of operations, or gates) of the procedure we use to calculate the distance is linear in the number of qubits, while a classical procedure would, in general, require a number of instructions that is linear in the number of features.
Import the dependencies
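A possible set of imports for the code sketches in this notebook (the exact module paths depend on the Qiskit version; in older releases Initialize is imported from qiskit.extensions, and qiskit-aer must be installed separately):

```python
import json
import random
from collections import Counter
from itertools import combinations

import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import Delaunay

from qiskit import QuantumCircuit, QuantumRegister, ClassicalRegister, transpile
from qiskit.circuit.library import Initialize  # older Qiskit: from qiskit.extensions import Initialize
from qiskit_aer import AerSimulator
```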
The task
We have to design a program that, given an image containing either a HAPPY (human) face or a SAD face, provides the correct emotion associated with that image.
The task can be divided in the following phases:
image preprocessing: we perform facial landmark detection [1], a process that extracts the positions of 68 landmark points (you can see these points on the picture below, on an abstract face). The output of this process is the set of coordinates of these landmark points.

feature extraction: we extract a list of features from this cloud of points. The features will be the distances between pairs of points.
classification: we categorize the feature array associated with our image as HAPPY or SAD.
The dataset and the image preprocessing
In order to complete this tutorial we need a set of pictures of faces. You can use your own photos, or you can use the ones provided in the datasets/FFHQ folder. The folder contains photos from the FFHQ dataset [2], freely downloadable here.
Each picture is under Creative Commons BY 2.0, Creative Commons BY-NC 2.0, Public Domain Mark 1.0, Public Domain CC0 1.0, or U.S. Government Works license. All of these licenses allow free use, redistribution, and adaptation for non-commercial purposes.
The image preprocessing step, which consists only of facial landmark detection, is performed classically and is not relevant to our purpose: the FFHQ dataset already contains the 68 landmark points for each picture.
The sub-folders datasets/FFHQ/happy and datasets/FFHQ/sad each contain 10 images and their landmark points, in files named xxxxx.png and xxxxx_landmarks.json. Since the dataset does not contain the emotion associated with the pictures, these were labelled manually.
To load the dataset run the following command:
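The original loading helpers (such as load_landmarks, used later) are not shown here; a possible loader, assuming each JSON file holds a list of 68 [x, y] coordinates, is:

```python
import glob
import json

def load_dataset(base_dir="datasets/FFHQ"):
    """Return, for each label, a list of (image_path, landmarks) pairs."""
    dataset = {}
    for label in ("happy", "sad"):
        items = []
        for lm_file in sorted(glob.glob(f"{base_dir}/{label}/*_landmarks.json")):
            with open(lm_file) as f:
                landmarks = json.load(f)  # assumed format: list of 68 [x, y] coordinates
            items.append((lm_file.replace("_landmarks.json", ".png"), landmarks))
        dataset[label] = items
    return dataset

dataset = load_dataset()
print(len(dataset["happy"]), len(dataset["sad"]))  # 10 images per class
```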
You can visualize a picture and its landmark points using the following method:
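A hypothetical helper for this visualization (show_face is a name chosen here; it simply overlays the landmark points on the picture, following the (image_path, landmarks) pairing of the loader sketched above):

```python
def show_face(image_path, landmarks):
    """Display a picture with its 68 landmark points overlaid."""
    points = np.asarray(landmarks)
    plt.imshow(plt.imread(image_path))
    plt.scatter(points[:, 0], points[:, 1], s=8, c="red")
    plt.axis("off")
    plt.show()
```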
A happy face:
A sad face:
From set of coordinates to feature vectors
The set of landmark points representing the face is encoded as an undirected weighted graph. Each graph has 68 vertices, one per landmark point. The edge between two vertices is labelled with the Euclidean distance between the coordinates associated with the vertices. It is possible to add as many edges as we like to the graph: the more edges we add, the richer (and possibly more redundant) our graph is and the more expensive its encoding in the quantum circuit. We suggest two encodings:
complete graph encoding: each pair of vertices has an edge between them;
chordal graph encoding: the graph is chordal and built with the Delaunay triangulation algorithm, which assures that each vertex has at least two edges and every induced cycle in the graph has exactly three vertices.
Now that we have a graph we can construct its adjacency matrix. Since this matrix is symmetric and its diagonal is zero, we consider only the upper triangular part without the diagonal.
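The two encodings and the flattening of the adjacency matrix could be implemented along these lines (build_complete_graph matches the name used later in this notebook, while build_chordal_graph and to_feature_vector are names chosen here for the sketch):

```python
from itertools import combinations
from scipy.spatial import Delaunay

def build_complete_graph(points):
    """Weighted adjacency matrix of the complete graph: every pair of landmarks is connected."""
    pts = np.asarray(points, dtype=float)
    return np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)

def build_chordal_graph(points):
    """Adjacency matrix of the Delaunay triangulation of the landmarks (chordal encoding)."""
    pts = np.asarray(points, dtype=float)
    adj = np.zeros((len(pts), len(pts)))
    for simplex in Delaunay(pts).simplices:  # each simplex is a triangle of vertex indices
        for i, j in combinations(simplex, 2):
            adj[i, j] = adj[j, i] = np.linalg.norm(pts[i] - pts[j])
    return adj

def to_feature_vector(adj):
    """Upper-triangular part of the adjacency matrix, without the diagonal, as a flat vector."""
    rows, cols = np.triu_indices(len(adj), k=1)
    return adj[rows, cols]
```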
The Classifier
The classifier receives as input the test vector $\tilde{x}$ and two vectors $x_h$ and $x_s$, which are representatives of the happy and sad faces, respectively. We first compute the Euclidean distances $d_h = \lVert \tilde{x} - x_h \rVert$ and $d_s = \lVert \tilde{x} - x_s \rVert$. With these distances, we can classify the test item according to the closest representative, i.e. the one whose distance from the test item is smaller than the distance from the other representative.
This extremely simple classifier depends heavily on the choice of the two representatives. However, we will see that this approach gives satisfactory results. Moreover, the accuracy of this classifier can be improved by calling the classifier function multiple times, with different representatives, and classifying the test instance with the most frequently occurring label.
Classical calculation of the distance function
In order to compare the quantum classifier with its classical counterpart, we calculate the distance classically and return the inferred label.
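A minimal sketch of this classical baseline (here classical_distance returns the inferred label directly, matching how it is used later):

```python
def classical_distance(test, happy, sad):
    """Nearest-representative classification with the Euclidean distance."""
    d_happy = np.linalg.norm(np.asarray(test) - np.asarray(happy))
    d_sad = np.linalg.norm(np.asarray(test) - np.asarray(sad))
    return "HAPPY" if d_happy < d_sad else "SAD"
```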
Quantum calculation of the distance: auxiliary functions
It is convenient to define two auxiliary functions manipulating arrays.
The first function is used to add zeros to our array so as to fit the size of the quantum register that will contain our data. A quantum register of $n$ qubits holds exactly $2^n$ amplitudes. This means that an array of $m$ elements requires at least $n = \lceil \log_2 m \rceil$ qubits, where the last $2^n - m$ amplitudes are zero. Given some array and the number of qubits, the zero_padding function adds the necessary zero coefficients and returns a new, zero-padded array.
The array encoded into a quantum register must have norm one. The normalize method rescales the array accordingly.
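A minimal sketch of the two helpers, using NumPy:

```python
def zero_padding(array, n_qubits):
    """Pad `array` with zeros up to length 2**n_qubits."""
    padded = np.zeros(2 ** n_qubits)
    padded[: len(array)] = array
    return padded

def normalize(array):
    """Rescale `array` to unit norm, as required for amplitude encoding."""
    array = np.asarray(array, dtype=float)
    return array / np.linalg.norm(array)
```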
Finally, we need a procedure that encodes our data. To this purpose we use the Initialize gate. Note that Initialize consists of a row of non-unitary Reset gates, one per qubit, resetting the qubits to $|0\rangle$, followed by the unitary transformation that encodes our data. Initialize can be expensive in terms of gates, and the overall speed of the algorithm depends on how efficiently this state preparation is implemented.
To implement the quantum classifier circuit we will need controlled versions of the Initialize gates. These can be obtained automatically by calling the control method (whose only parameter is the desired number of control qubits). However, control only works on unitary circuits, so we need a way to remove the leading Reset gates.
The simplest way currently available is:
create the Initialize gate;
call the gates_to_uncompute method, which inverts the circuit and also removes the Reset gates;
call the inverse method to invert it again: we now have the initialization without resets;
call the control method.
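Putting these four steps together into a helper (controlled_initialize is a name chosen here; gates_to_uncompute, inverse, to_gate and control are the Qiskit methods mentioned above):

```python
def controlled_initialize(amplitudes, num_ctrl_qubits=1):
    """Reset-free, controlled state preparation from an amplitude vector
    (assumed already zero-padded and normalized)."""
    init = Initialize(amplitudes)
    # circuit that maps the prepared state back to |0...0>, without Reset instructions
    uncompute = init.gates_to_uncompute()
    # inverting it again gives the state preparation as a purely unitary gate
    prepare = uncompute.inverse().to_gate(label="init")
    # finally, take the controlled version
    return prepare.control(num_ctrl_qubits)
```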
Quantum calculation of the distance: the circuit
The distance function calculation is obtained by the circuit below.
The input to the circuit is composed of four registers:
the first, single-qubit register is called the auxiliary register;
the second register is the index register;
the third register is the data register;
the fourth register is the class register.

Given a feature vector $x = (x_1, \ldots, x_m)$ representing the graph, we can build the quantum state $|x\rangle = \frac{1}{\lVert x \rVert} \sum_{i=1}^{m} x_i\,|i\rangle$, where $\lVert x \rVert$ is the normalization factor.
The circuit evolves as follows:
the circuit starts in the state $|0\rangle_a\,|0\rangle_i\,|0\ldots 0\rangle_d\,|0\rangle_c$ (the subscripts denote the auxiliary, index, data and class registers);
after the first two Hadamard gates the state is $\tfrac{1}{2}\,(|0\rangle_a + |1\rangle_a)(|0\rangle_i + |1\rangle_i)\,|0\ldots 0\rangle_d\,|0\rangle_c$, so that both the auxiliary and the index registers are in a uniform superposition;
after the controlled initialization of the test instance (controlled on the auxiliary qubit) the state is $\tfrac{1}{2}\,\big[\,|0\rangle_a (|0\rangle_i + |1\rangle_i)\,|0\ldots 0\rangle_d + |1\rangle_a (|0\rangle_i + |1\rangle_i)\,|\tilde{x}\rangle_d\,\big]\,|0\rangle_c$;
after the X operation on the auxiliary qubit the state is $\tfrac{1}{2}\,\big[\,|1\rangle_a (|0\rangle_i + |1\rangle_i)\,|0\ldots 0\rangle_d + |0\rangle_a (|0\rangle_i + |1\rangle_i)\,|\tilde{x}\rangle_d\,\big]\,|0\rangle_c$: the test data $|\tilde{x}\rangle$ is now bound to auxiliary $|0\rangle$, and this will allow us to interfere it with $|x_h\rangle$ and $|x_s\rangle$ later;
after the doubly-controlled initialization of the first representative (controlled on the auxiliary and index qubits) the state is $\tfrac{1}{2}\,\big[\,|0\rangle_a (|0\rangle_i + |1\rangle_i)\,|\tilde{x}\rangle_d + |1\rangle_a |0\rangle_i\,|0\ldots 0\rangle_d + |1\rangle_a |1\rangle_i\,|x_h\rangle_d\,\big]\,|0\rangle_c$;
after the X operation on the index register the state is $\tfrac{1}{2}\,\big[\,|0\rangle_a (|0\rangle_i + |1\rangle_i)\,|\tilde{x}\rangle_d + |1\rangle_a |1\rangle_i\,|0\ldots 0\rangle_d + |1\rangle_a |0\rangle_i\,|x_h\rangle_d\,\big]\,|0\rangle_c$: the data $|x_h\rangle$ is now bound to auxiliary $|1\rangle$ and index $|0\rangle$;
after the doubly-controlled initialization of the second representative the state is $\tfrac{1}{2}\,\big[\,|0\rangle_a (|0\rangle_i + |1\rangle_i)\,|\tilde{x}\rangle_d + |1\rangle_a |0\rangle_i\,|x_h\rangle_d + |1\rangle_a |1\rangle_i\,|x_s\rangle_d\,\big]\,|0\rangle_c$: the data $|x_s\rangle$ is now bound to auxiliary $|1\rangle$ and index $|1\rangle$;
after the CNOT gate binding the index register with the class register, the state is $\tfrac{1}{2}\,\big[\,|0\rangle_a |0\rangle_i\,|\tilde{x}\rangle_d\,|h\rangle_c + |0\rangle_a |1\rangle_i\,|\tilde{x}\rangle_d\,|s\rangle_c + |1\rangle_a |0\rangle_i\,|x_h\rangle_d\,|h\rangle_c + |1\rangle_a |1\rangle_i\,|x_s\rangle_d\,|s\rangle_c\,\big]$, where, for the sake of visualization, we write $|0\rangle_c$ as $|h\rangle$ and $|1\rangle_c$ as $|s\rangle$ to remind us that this register contains the two labels; this is re-arranged into $\tfrac{1}{2}\,\big[\,|0\rangle_i (|0\rangle_a |\tilde{x}\rangle + |1\rangle_a |x_h\rangle)\,|h\rangle + |1\rangle_i (|0\rangle_a |\tilde{x}\rangle + |1\rangle_a |x_s\rangle)\,|s\rangle\,\big]$: this operation bonds index $|0\rangle$ with class $|h\rangle$ and index $|1\rangle$ with class $|s\rangle$;
after the final Hadamard gate on the auxiliary qubit the state is $\tfrac{1}{2\sqrt{2}}\,\big[\,|0\rangle_i \big(|0\rangle_a (|\tilde{x}\rangle + |x_h\rangle) + |1\rangle_a (|\tilde{x}\rangle - |x_h\rangle)\big)\,|h\rangle + |1\rangle_i \big(|0\rangle_a (|\tilde{x}\rangle + |x_s\rangle) + |1\rangle_a (|\tilde{x}\rangle - |x_s\rangle)\big)\,|s\rangle\,\big]$: this operation interferes the test data with $|x_h\rangle$ and $|x_s\rangle$, allowing us to estimate the distances.
Quantum calculation of the distance: the measurement
We now measure the auxiliary qubit, post-selecting on the outcome zero. The circuit is then in the state $\tfrac{1}{\sqrt{Z}}\,\big[\,|0\rangle_i (|\tilde{x}\rangle + |x_h\rangle)\,|h\rangle + |1\rangle_i (|\tilde{x}\rangle + |x_s\rangle)\,|s\rangle\,\big]$, where $Z$ is a normalization factor. For symmetry reasons, we can equivalently consider the results having the auxiliary bit set to one and slightly modify the above formula.
By estimating the probabilities $P(h)$ and $P(s)$ of reading class $|h\rangle$ or $|s\rangle$, we can compare the distance between $|\tilde{x}\rangle$ and $|x_h\rangle$ with the distance between $|\tilde{x}\rangle$ and $|x_s\rangle$: since $P(h) \propto \lVert\, |\tilde{x}\rangle + |x_h\rangle \,\rVert^2 = 4 - \lVert\, |\tilde{x}\rangle - |x_h\rangle \,\rVert^2$ (and analogously for $P(s)$), the larger probability corresponds to the smaller distance. So we label the test instance HAPPY when $P(h) > P(s)$ and SAD otherwise. To mitigate the errors due to the probabilistic nature of the computation, we can introduce a small tolerance around the boundary between the two distances.
It is simple to see that, once the data is loaded, the part of the circuit that calculates the distance has a constant number of operations, and therefore its complexity is $O(1)$, while any classical algorithm needs $O(N)$ steps, where $N$ is the number of components of the vectors.
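As an illustration, a possible quantum_distance routine following the register layout and gate sequence described above; it reuses the controlled_initialize helper sketched earlier, and the choice of simulator (AerSimulator), the number of shots, the post-selection on the auxiliary outcome 0 and the tolerance handling are assumptions of this sketch:

```python
def quantum_distance(test, happy, sad, shots=8192, tolerance=0.0):
    """Distance-based classification of `test` against the two representatives.
    All three vectors must be zero-padded to length 2**n and normalized."""
    n_data = int(np.log2(len(test)))

    aux = QuantumRegister(1, "aux")
    idx = QuantumRegister(1, "idx")
    data = QuantumRegister(n_data, "data")
    cls = QuantumRegister(1, "cls")
    meas = ClassicalRegister(2, "meas")
    qc = QuantumCircuit(aux, idx, data, cls, meas)

    qc.h(aux[0])                                                          # uniform superposition on auxiliary...
    qc.h(idx[0])                                                          # ...and index registers
    qc.append(controlled_initialize(test, 1), [aux[0], *data])            # load test, controlled on aux
    qc.x(aux[0])                                                          # bind the test data to aux = 0
    qc.append(controlled_initialize(happy, 2), [aux[0], idx[0], *data])   # load the happy representative
    qc.x(idx[0])                                                          # bind it to idx = 0
    qc.append(controlled_initialize(sad, 2), [aux[0], idx[0], *data])     # load the sad representative
    qc.cx(idx[0], cls[0])                                                 # copy the index into the class register
    qc.h(aux[0])                                                          # interfere test and representatives

    qc.measure(aux[0], meas[0])
    qc.measure(cls[0], meas[1])

    backend = AerSimulator()
    counts = backend.run(transpile(qc, backend), shots=shots).result().get_counts()
    # post-select on auxiliary = 0; the class bit is the most significant bit of the key
    happy_hits = counts.get("00", 0)
    sad_hits = counts.get("10", 0)
    total = max(happy_hits + sad_hits, 1)
    return "HAPPY" if (happy_hits - sad_hits) / total > tolerance else "SAD"
```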
Image Classification
We can use the above classifier to solve the FER problem ([3]).
For the sake of simplicity, we consider only a subset of the points of the faces. This allows us to reduce the amount of information encoded within the quantum circuit. In particular, we consider only the points related to the mouth, under the assumption that a smiling mouth relates to the emotion of happiness and a sulky mouth relates to the emotion of sadness.
It is interesting to study how the accuracy changes when we further limit the number of mouth points considered for the classification. For this reason we define the pick_mouth_points function, which randomly generates a subset of the vertices between 48 and 68, without repetitions.
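A possible implementation, assuming the standard 68-point convention in which indices 48 to 67 are the mouth landmarks:

```python
import random

def pick_mouth_points(n_points):
    """Randomly choose `n_points` distinct landmark indices among the 20 mouth points."""
    return random.sample(range(48, 68), n_points)
```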
Finally, the method faces_classifier receives as input the three faces happy_instance, sad_instance, test_instance, which are lists of coordinates (loaded from the dataset using load_landmarks), and some configuration parameters:
is_quantum: true for quantum_distance and false for classical_distance;
is_complete_graph: true for the complete graph encoding (build_complete_graph) and false for the chordal graph encoding;
n_points: an integer between 1 and 20 indicating how many randomly chosen points of the mouth to consider for the classification.
The output is either the string HAPPY or SAD.
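One way to tie everything together; the normalization choices, the default parameters and the reuse of the helpers sketched above (pick_mouth_points, build_complete_graph, build_chordal_graph, to_feature_vector, zero_padding, normalize, quantum_distance, classical_distance) are assumptions of this sketch rather than the notebook's exact implementation. Note also that very small values of n_points (in particular fewer than three points for the Delaunay encoding) make the feature extraction degenerate.

```python
def faces_classifier(happy_instance, sad_instance, test_instance,
                     is_quantum=True, is_complete_graph=True, n_points=10):
    """Classify `test_instance` as HAPPY or SAD using the chosen distance and encoding."""
    indices = pick_mouth_points(n_points)
    build_graph = build_complete_graph if is_complete_graph else build_chordal_graph

    def features(landmarks):
        mouth = [landmarks[i] for i in indices]
        return normalize(to_feature_vector(build_graph(mouth)))

    happy, sad, test = (features(x) for x in (happy_instance, sad_instance, test_instance))
    if is_quantum:
        n_qubits = int(np.ceil(np.log2(len(test))))
        pad = lambda v: normalize(zero_padding(v, n_qubits))
        return quantum_distance(pad(test), pad(happy), pad(sad))
    return classical_distance(test, happy, sad)
```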
In order to check the accuracy, we can partition the dataset into a training set and a testing set:
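A minimal split, assuming the dataset structure returned by the loader sketched earlier (the first item of each class acts as the representative, the rest as test instances):

```python
# first item of each class as representative, the rest as labelled test instances
(_, happy_repr), *happy_test = dataset["happy"]
(_, sad_repr), *sad_test = dataset["sad"]
test_set = [(lm, "HAPPY") for _, lm in happy_test] + [(lm, "SAD") for _, lm in sad_test]
```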
Finally we calculate the accuracy:
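A sketch of the accuracy computation over this split (the configuration parameters are chosen arbitrarily here):

```python
predictions = [
    faces_classifier(happy_repr, sad_repr, landmarks,
                     is_quantum=True, is_complete_graph=True, n_points=10)
    for landmarks, _ in test_set
]
accuracy = np.mean([pred == label for pred, (_, label) in zip(predictions, test_set)])
print(f"accuracy: {accuracy:.2f}")
```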
Clearly, the accuracy for both the quantum and the classical case depends heavily on the choice of the two representatives. The accuracy might be improved by calling the faces classifier multiple times, with different representatives, and assigning the test instance the most frequently occurring label.
Finally, run the classification multiple times per item:
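A possible majority-vote wrapper (the function name, n_rounds and the random choice of representatives from the whole class folders are illustrative choices, not the notebook's original code):

```python
from collections import Counter
import random

def majority_vote_classifier(test_landmarks, dataset, n_rounds=5, **kwargs):
    """Repeat the classification with different random representatives
    and return the most frequent label."""
    votes = []
    for _ in range(n_rounds):
        _, happy_repr = random.choice(dataset["happy"])
        _, sad_repr = random.choice(dataset["sad"])
        votes.append(faces_classifier(happy_repr, sad_repr, test_landmarks, **kwargs))
    return Counter(votes).most_common(1)[0][0]
```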
References
[1] Facial Landmark Detection, https://paperswithcode.com/task/facial-landmark-detection.
[2] A Style-Based Generator Architecture for Generative Adversarial Networks, Tero Karras (NVIDIA), Samuli Laine (NVIDIA), Timo Aila (NVIDIA), http://stylegan.xyz/paper.
[3] Facial Expression Recognition on a Quantum Computer (2021, DOI 10.1007/s42484-020-00035-5), R. Mengoni, M. Incudini, A. Di Pierro.
[4] Implementing a distance-based classifier with a quantum interference circuit (2017, DOI 10.1209/0295-5075/119/60002), M. Schuld, M. Fingerhuth, F. Petruccione.