Path: blob/master/examples/FinRL_GPM_Demo.ipynb
GPM: A graph convolutional network based reinforcement learning framework for portfolio management
In this document, we will make use of a graph neural network architecture called GPM, introduced in the following paper:
Si Shi, Jianjun Li, Guohui Li, Peng Pan, Qi Chen & Qing Sun. (2022). GPM: A graph convolutional network based reinforcement learning framework for portfolio management. https://doi.org/10.1016/j.neucom.2022.04.105.
Note
If you're using the portfolio optimization environment, consider citing the following paper (in addition to the FinRL references):
Caio Costa, & Anna Costa (2023). POE: A General Portfolio Optimization Environment for FinRL. In Anais do II Brazilian Workshop on Artificial Intelligence in Finance (pp. 132–143). SBC. https://doi.org/10.5753/bwaif.2023.231144.
Installation and imports
To run this notebook in Google Colab, uncomment the cells below.
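A minimal sketch of what such an installation cell might look like (installing straight from the GitHub repository is an assumption here; pin it to the FinRL version you want to use):

```python
## Uncomment when running on Google Colab.
## Installing FinRL directly from the GitHub repository is an assumption;
## adapt the source or add a version pin to match your setup.
# %pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
```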
Import the necessary libraries
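For reference, the imports used throughout this notebook look roughly like the sketch below. The FinRL module paths (the portfolio optimization environment, agent, and architectures) are assumptions based on recent FinRL releases and may differ in your version:

```python
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt

# FinRL portfolio optimization environment, agent, and GPM architecture.
# These module paths are assumptions and may change between FinRL versions.
from finrl.meta.env_portfolio_optimization.env_portfolio_optimization import PortfolioOptimizationEnv
from finrl.agents.portfolio_optimization.models import DRLAgent
from finrl.agents.portfolio_optimization.architectures import GPM
```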
Fetch data
We are going to use the same data used in the paper. The original data can be found in the Temporal_Relational_Stock_Ranking repository, but it's not in a FinRL-friendly format. So, we're going to get the processed, FinRL-friendly data from the Temporal_Relational_Stock_Ranking_FinRL repository.
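Assuming the processed files have already been downloaded from the Temporal_Relational_Stock_Ranking_FinRL repository, loading them could look like the sketch below (the file names are placeholders, not the repository's actual names):

```python
import pickle
import pandas as pd

# Placeholder file names: replace them with the actual files obtained from the
# Temporal_Relational_Stock_Ranking_FinRL repository.
portfolio_df = pd.read_csv("temporal_data.csv")   # date/tic/feature time series
with open("edge_index.pkl", "rb") as f:           # graph connectivity (2 x num_edges)
    edge_index = pickle.load(f)
with open("edge_type.pkl", "rb") as f:            # relation type of each edge
    edge_type = pickle.load(f)
```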
Simplify Data
The loaded graph is too big, making the training process extremely slow. So we are going to remove some of the stocks from the graph structure, keeping only the stocks within 2 hops of the ones in our portfolio.
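The sketch below illustrates the idea (it is not the notebook's exact code): starting from the portfolio's stocks, expand the neighborhood twice over the edge list and keep only the nodes reached.

```python
import numpy as np

def k_hop_nodes(edge_index, seed_nodes, k=2):
    """Return the set of nodes within k hops of seed_nodes.

    edge_index: array of shape (2, num_edges) with (source, target) pairs.
    seed_nodes: node indices of the stocks in our portfolio.
    """
    keep = set(seed_nodes)
    frontier = set(seed_nodes)
    for _ in range(k):
        # Nodes reachable in one more hop from the current frontier (either direction).
        mask = np.isin(edge_index[0], list(frontier)) | np.isin(edge_index[1], list(frontier))
        frontier = set(edge_index[0, mask]) | set(edge_index[1, mask])
        keep |= frontier
    return keep

# Keep an edge only if both of its endpoints survive the 2-hop filter.
# nodes_to_keep = k_hop_nodes(edge_index, portfolio_node_ids, k=2)
```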
Instantiate Environment
Using the PortfolioOptimizationEnv, it's easy to instantiate a portfolio optimization environment for reinforcement learning agents. In the example below, we use the dataframe created before to start an environment.
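A minimal sketch of the instantiation, reusing the dataframe loaded earlier; the parameter names follow PortfolioOptimizationEnv's API as I understand it, but the values (fees, features, window size) are arbitrary choices for illustration:

```python
# The dataframe must contain date, tic, and feature columns (e.g. close, high, low).
environment = PortfolioOptimizationEnv(
    portfolio_df,
    initial_amount=100_000,        # starting portfolio value
    comission_fee_pct=0.0025,      # transaction cost applied when rebalancing (assumed value)
    time_window=50,                # number of past time steps in each observation
    features=["close", "high", "low"],
)
```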
Instantiate Model
Now, we can instantiate the model using the FinRL API. In this example, we are going to use the GPM architecture introduced by Shi et al.
❗ Note: Remember to set the architecture's time_window parameter to the same value as the environment's time_window.
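A rough sketch of the model creation through the FinRL API; the keyword names passed to the GPM policy (edge_index, edge_type) and the model hyperparameters are assumptions for illustration:

```python
agent = DRLAgent(environment)

# "policy" selects the network architecture; policy_kwargs are forwarded to it.
model = agent.get_model(
    "pg",
    model_kwargs={"lr": 0.01, "policy": GPM},
    policy_kwargs={
        "edge_index": edge_index,   # graph connectivity from the relational data
        "edge_type": edge_type,     # relation type of each edge
        "time_window": 50,          # must match the environment's time_window
    },
)
```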
Train Model
We will train for only a few episodes, since training takes considerable time.
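A one-line sketch of the training call, assuming DRLAgent.train_model accepts an episodes argument:

```python
# Two episodes are enough for a quick demonstration, but far from convergence.
DRLAgent.train_model(model, episodes=2)
```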
Save Model
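Since the trained policy is a regular PyTorch module, its weights can be saved with torch.save; the attribute name train_policy and the file name below are assumptions:

```python
import torch

torch.save(model.train_policy.state_dict(), "policy_GPM.pt")
```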
Test Model
Following the idea of the original article, we will evaluate the performance of the trained model in the test period. We will also compare it with a uniform buy-and-hold strategy.
Test GPM architecture
It's important to note that, in this code, we load the saved policy even though it isn't necessary, just to show how to save and load your model.
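A sketch of the test run, assuming a test-period environment (environment_test) built the same way as the training one; the DRL_validation helper, the _asset_memory attribute, and the GPM constructor arguments are all assumptions:

```python
import torch

# Rebuild the architecture, load the saved weights, and evaluate on the test period.
policy = GPM(edge_index, edge_type, time_window=50)
policy.load_state_dict(torch.load("policy_GPM.pt"))

DRLAgent.DRL_validation(model, environment_test, policy=policy)
GPM_results = environment_test._asset_memory["final"]   # portfolio value over time
```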
Test Uniform Buy and Hold
For comparison, we will also test the performance of a uniform buy-and-hold strategy. In this strategy, the portfolio holds no cash and the same percentage of money is allocated to each asset.
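A sketch of this baseline, stepping the test environment with the same equal-weight action at every step; the attribute names used to count assets and read results are assumptions:

```python
import numpy as np

n_assets = len(environment_test._tic_list)   # number of assets in the portfolio (assumed attribute)
# First entry is the cash weight (zero); the rest is split equally among assets.
uniform_action = np.array([0] + [1 / n_assets] * n_assets)

environment_test.reset()
done = False
while not done:
    _, _, done, _ = environment_test.step(uniform_action)

UBAH_results = environment_test._asset_memory["final"]  # portfolio value over time
```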
Plot graphics
With only two training episodes, we can see that GPM achieves better performance than the buy-and-hold strategy, but, according to the original article, that performance could be even better: hyperparameter tuning should be performed. Additionally, we used a softmax temperature equal to one, which can be changed to achieve better performance, as stated in the original article.
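A minimal plotting sketch, assuming the portfolio-value series collected in the test runs above (GPM_results and UBAH_results):

```python
import matplotlib.pyplot as plt

plt.plot(GPM_results, label="GPM")
plt.plot(UBAH_results, label="Uniform buy and hold")
plt.xlabel("Time step")
plt.ylabel("Portfolio value")
plt.legend()
plt.title("GPM vs. uniform buy and hold (test period)")
plt.show()
```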