Timeseries classification with a Transformer model
Author: Theodoros Ntakouris
Date created: 2021/06/25
Last modified: 2021/08/05
Description: This notebook demonstrates how to do timeseries classification using a Transformer model.
Introduction
This is the Transformer architecture from Attention Is All You Need, applied to timeseries instead of natural language.
This example requires TensorFlow 2.4 or higher.
Load the dataset
We are going to use the same dataset and preprocessing as the TimeSeries Classification from Scratch example.
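The notebook's data-loading cells are not reproduced here. As a minimal sketch of that preprocessing, the snippet below applies the same steps (adding a channel axis, shuffling, and remapping the FordA-style -1/1 labels to 0/1) to synthetic arrays standing in for the downloaded dataset; the array sizes and variable names are illustrative assumptions.

```python
import numpy as np

# Synthetic stand-in for the FordA training split used in the
# "Timeseries Classification from Scratch" example (an assumption
# for illustration; the real example downloads the data).
rng = np.random.default_rng(0)
n_samples, n_timesteps = 100, 500
x_train = rng.standard_normal((n_samples, n_timesteps))
y_train = rng.choice([-1, 1], size=n_samples)  # FordA-style labels: -1 / 1

# Add a trailing channel axis: (batch size, sequence length, features)
x_train = x_train.reshape((x_train.shape[0], x_train.shape[1], 1))

# Shuffle the training set
idx = rng.permutation(len(x_train))
x_train, y_train = x_train[idx], y_train[idx]

# Remap labels from {-1, 1} to {0, 1} for sparse categorical crossentropy
y_train[y_train == -1] = 0
n_classes = len(np.unique(y_train))
```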
Build the model
Our model processes a tensor of shape (batch size, sequence length, features), where sequence length is the number of time steps and features is the number of input timeseries observed at each step.
You can replace your classification RNN layers with this one: the inputs are fully compatible!
We include residual connections, layer normalization, and dropout. The resulting layer can be stacked multiple times.
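One way to implement such a block is sketched below (a pre-norm variant; the function signature and parameter names head_size, ff_dim, etc. are illustrative, not necessarily the notebook's exact code). Attention and feed-forward parts each get a residual connection, layer normalization, and dropout, and the feed-forward projections use pointwise Conv1D layers as described below.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Normalization and attention (residual block)
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(x, x)
    x = layers.Dropout(dropout)(x)
    res = x + inputs

    # Feed-forward part: pointwise Conv1D projections back to the input width
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res


# Quick shape check: the block preserves (batch, sequence length, features),
# which is why it can stand in for an RNN layer that returns sequences.
inp = keras.Input(shape=(64, 1))
out = transformer_encoder(inp, head_size=8, num_heads=2, ff_dim=16, dropout=0.1)
block = keras.Model(inp, out)
y = block.predict(np.zeros((2, 64, 1)), verbose=0)
```

Because the output shape matches the input shape, these blocks can be stacked freely.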
The projection layers are implemented through keras.layers.Conv1D.
The main part of our model is now complete. We can stack multiple of those transformer_encoder blocks, and we can also proceed to add the final Multi-Layer Perceptron classification head. Apart from a stack of Dense layers, we need to reduce the output tensor of the TransformerEncoder part of our model down to a vector of features for each data point in the current batch. A common way to achieve this is to use a pooling layer. For this example, a GlobalAveragePooling1D layer is sufficient.
Train and evaluate
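The notebook's build-and-train cells are not reproduced here. The self-contained sketch below assembles the stacked encoder, pooling layer, and MLP head described above and runs one training epoch on synthetic data at toy scale; all hyperparameter values, the synthetic dataset, and the helper names (build_model, transformer_encoder) are illustrative assumptions, and the real example trains on FordA with much larger settings.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers


def transformer_encoder(inputs, head_size, num_heads, ff_dim, dropout=0):
    # Residual attention block followed by a pointwise Conv1D feed-forward part
    x = layers.LayerNormalization(epsilon=1e-6)(inputs)
    x = layers.MultiHeadAttention(
        key_dim=head_size, num_heads=num_heads, dropout=dropout
    )(x, x)
    x = layers.Dropout(dropout)(x)
    res = x + inputs
    x = layers.LayerNormalization(epsilon=1e-6)(res)
    x = layers.Conv1D(filters=ff_dim, kernel_size=1, activation="relu")(x)
    x = layers.Dropout(dropout)(x)
    x = layers.Conv1D(filters=inputs.shape[-1], kernel_size=1)(x)
    return x + res


def build_model(input_shape, head_size, num_heads, ff_dim,
                num_transformer_blocks, mlp_units, num_classes,
                dropout=0, mlp_dropout=0):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for _ in range(num_transformer_blocks):
        x = transformer_encoder(x, head_size, num_heads, ff_dim, dropout)
    # Pool the encoder output down to one feature vector per sample
    # (channels_first averages over the singleton channel axis, keeping
    # one value per time step)
    x = layers.GlobalAveragePooling1D(data_format="channels_first")(x)
    for dim in mlp_units:
        x = layers.Dense(dim, activation="relu")(x)
        x = layers.Dropout(mlp_dropout)(x)
    outputs = layers.Dense(num_classes, activation="softmax")(x)
    return keras.Model(inputs, outputs)


# Toy-scale synthetic data so the sketch executes quickly
rng = np.random.default_rng(0)
x_train = rng.standard_normal((64, 32, 1)).astype("float32")
y_train = rng.integers(0, 2, size=64)

model = build_model(
    input_shape=x_train.shape[1:], head_size=8, num_heads=2, ff_dim=16,
    num_transformer_blocks=2, mlp_units=[32], num_classes=2,
    dropout=0.25, mlp_dropout=0.4,
)
model.compile(
    loss="sparse_categorical_crossentropy",
    optimizer=keras.optimizers.Adam(learning_rate=1e-4),
    metrics=["sparse_categorical_accuracy"],
)
callbacks = [keras.callbacks.EarlyStopping(patience=10, restore_best_weights=True)]
model.fit(x_train, y_train, validation_split=0.25, epochs=1,
          batch_size=16, callbacks=callbacks, verbose=0)
loss, acc = model.evaluate(x_train, y_train, verbose=0)
```

Scaling this up (longer sequences, more blocks, wider attention, more epochs with early stopping) is what produces the results discussed in the conclusions.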
Conclusions
In about 110-120 epochs (25s each on Colab), the model reaches a training accuracy of ~0.95, a validation accuracy of ~0.84, and a test accuracy of ~0.85, without hyperparameter tuning. And that is for a model with fewer than 100k parameters. Of course, parameter count and accuracy could be improved by a hyperparameter search and a more sophisticated learning rate schedule, or a different optimizer.