GitHub Repository: AI4Finance-Foundation/FinRL
Path: blob/master/examples/FinRL_GPM_Demo.ipynb
Kernel: Python 3 (ipykernel)

GPM: A graph convolutional network based reinforcement learning framework for portfolio management

In this document, we will make use of a graph neural network architecture called GPM, introduced in the paper of the same name as this notebook's title.

Note

If you're using the portfolio optimization environment, consider citing the following paper (in addition to the FinRL references):

  • Caio Costa, & Anna Costa (2023). POE: A General Portfolio Optimization Environment for FinRL. In Anais do II Brazilian Workshop on Artificial Intelligence in Finance (pp. 132–143). SBC. https://doi.org/10.5753/bwaif.2023.231144.

@inproceedings{bwaif,
  author    = {Caio Costa and Anna Costa},
  title     = {POE: A General Portfolio Optimization Environment for FinRL},
  booktitle = {Anais do II Brazilian Workshop on Artificial Intelligence in Finance},
  location  = {João Pessoa/PB},
  year      = {2023},
  keywords  = {},
  issn      = {0000-0000},
  pages     = {132--143},
  publisher = {SBC},
  address   = {Porto Alegre, RS, Brasil},
  doi       = {10.5753/bwaif.2023.231144},
  url       = {https://sol.sbc.org.br/index.php/bwaif/article/view/24959}
}

Installation and imports

To run this notebook in Google Colab, uncomment the cells below.

## install finrl library
# !sudo apt install swig
# !pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
## We also need to install quantstats, because the environment uses it to plot graphs
# !pip install quantstats
## Hide matplotlib warnings
# import warnings
# warnings.filterwarnings('ignore')

import logging
logging.getLogger('matplotlib.font_manager').disabled = True

Import the necessary code libraries

import torch
import numpy as np
import pandas as pd
from torch_geometric.utils import k_hop_subgraph

from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.meta.env_portfolio_optimization.env_portfolio_optimization import PortfolioOptimizationEnv
from finrl.agents.portfolio_optimization.models import DRLAgent
from finrl.agents.portfolio_optimization.architectures import GPM

device = "cuda:0" if torch.cuda.is_available() else "cpu"

Fetch data

We are going to use the same data used in the paper. The original data can be found in the Temporal_Relational_Stock_Ranking repository, but it's not in a FinRL-friendly format, so we're going to get the processed, FinRL-friendly data from the Temporal_Relational_Stock_Ranking_FinRL repository.

# download repository with data and extract tar.gz file with processed temporal data
!curl -L -o Temporal_Relational_Stock_Ranking_FinRL-main.zip https://github.com/C4i0kun/Temporal_Relational_Stock_Ranking_FinRL/archive/refs/heads/main.zip
!unzip Temporal_Relational_Stock_Ranking_FinRL-main.zip
!mv Temporal_Relational_Stock_Ranking_FinRL-main Temporal_Relational_Stock_Ranking_FinRL
!tar -xzvf Temporal_Relational_Stock_Ranking_FinRL/temporal_data/temporal_data_processed.tar.gz -C Temporal_Relational_Stock_Ranking_FinRL/temporal_data
NASDAQ_temporal_data.csv NYSE_temporal_data.csv
nasdaq_temporal = pd.read_csv("Temporal_Relational_Stock_Ranking_FinRL/temporal_data/NASDAQ_temporal_data.csv")
nasdaq_temporal
nasdaq_edge_index = np.load("Temporal_Relational_Stock_Ranking_FinRL/relational_data/edge_indexes/NASDAQ_sector_industry_edge_index.npy")
nasdaq_edge_index
array([[ 0, 15, 0, ..., 1021, 1014, 1024], [ 15, 0, 18, ..., 1011, 1024, 1014]])
nasdaq_edge_type = np.load("Temporal_Relational_Stock_Ranking_FinRL/relational_data/edge_types/NASDAQ_sector_industry_edge_type.npy")
nasdaq_edge_type
array([ 0, 0, 0, ..., 1, 26, 26])
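Before moving on, a quick optional check of the relational data (illustrative, not part of the original notebook): the arrays follow the usual PyTorch Geometric convention, with nasdaq_edge_index storing one source/target node pair per edge and nasdaq_edge_type assigning a sector/industry relation id to each edge.

# Optional sanity check: edge_index is (2, num_edges) and edge_type is (num_edges,),
# so every edge has exactly one relation id.
print(nasdaq_edge_index.shape, nasdaq_edge_type.shape)
assert nasdaq_edge_index.shape[0] == 2
assert nasdaq_edge_index.shape[1] == nasdaq_edge_type.shape[0]
print("number of relation types:", np.unique(nasdaq_edge_type).size)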

Simplify Data

The loaded graph is too big, making the training process extremely slow. So we are going to remove some of the stocks from the graph structure so that only stocks within 2 hops of the ones in our portfolio are considered.

list_of_stocks = nasdaq_temporal["tic"].unique().tolist()
tics_in_portfolio = ["AAPL", "CMCSA", "CSCO", "FB", "HBAN", "INTC", "MSFT", "MU", "NVDA", "QQQ", "XIV"]

portfolio_nodes = []
for tic in tics_in_portfolio:
    portfolio_nodes.append(list_of_stocks.index(tic))
portfolio_nodes
[2, 185, 215, 310, 395, 464, 596, 603, 637, 768, 1014]
nodes_kept, new_edge_index, nodes_to_select, edge_mask = k_hop_subgraph(
    torch.LongTensor(portfolio_nodes),
    2,
    torch.from_numpy(nasdaq_edge_index),
    relabel_nodes=True,
)
# reduce temporal data
nodes_kept = nodes_kept.tolist()
nasdaq_temporal["tic_id"], _ = pd.factorize(nasdaq_temporal["tic"], sort=True)
nasdaq_temporal = nasdaq_temporal[nasdaq_temporal["tic_id"].isin(nodes_kept)]
nasdaq_temporal = nasdaq_temporal.drop(columns="tic_id")
nasdaq_temporal
# reduce edge type
new_edge_type = torch.from_numpy(nasdaq_edge_type)[edge_mask]
_, new_edge_type = torch.unique(new_edge_type, return_inverse=True)
new_edge_type
tensor([0, 0, 0, ..., 2, 6, 6])
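As an optional sanity check (not in the original notebook), k_hop_subgraph returns the kept nodes with their original ids (nodes_kept) and the positions of the query nodes inside that subset (nodes_to_select), so mapping the relabeled indices back through nodes_kept should recover exactly the portfolio tickers:

# Optional check: the relabeled indices in nodes_to_select should map back,
# through nodes_kept (original node ids), to our portfolio tickers in order.
selected_tics = [list_of_stocks[nodes_kept[i]] for i in nodes_to_select.tolist()]
assert selected_tics == tics_in_portfolio
print(selected_tics)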

Instantiate Environment

Using the PortfolioOptimizationEnv, it's easy to instantiate a portfolio optimization environment for reinforcement learning agents. In the example below, we use the dataframe created before to start an environment.

df_portfolio = nasdaq_temporal[["day", "tic", "close", "high", "low"]]

df_portfolio_train = df_portfolio[df_portfolio["day"] < 979]
df_portfolio_test = df_portfolio[df_portfolio["day"] >= 979]

environment_train = PortfolioOptimizationEnv(
    df_portfolio_train,
    initial_amount=100000,
    comission_fee_pct=0.0025,
    time_window=50,
    features=["close", "high", "low"],
    time_column="day",
    normalize_df=None,  # dataframe is already normalized
    tics_in_portfolio=tics_in_portfolio
)

environment_test = PortfolioOptimizationEnv(
    df_portfolio_test,
    initial_amount=100000,
    comission_fee_pct=0.0025,
    time_window=50,
    features=["close", "high", "low"],
    time_column="day",
    normalize_df=None,  # dataframe is already normalized
    tics_in_portfolio=tics_in_portfolio
)

Instantiate Model

Now, we can instantiate the model using the FinRL API. In this example, we are going to use the GPM architecture.

Note: Remember to set the architecture's time_window parameter to the same value as the environment's time_window.
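A minimal sketch of keeping the two values in sync (illustrative only; the actual cell below simply relies on both defaults being 50):

# Illustrative only: define the window once and reuse it for both the
# environment and the architecture, so the two values cannot drift apart.
TIME_WINDOW = 50
shared_env_kwargs = {"time_window": TIME_WINDOW}     # for PortfolioOptimizationEnv(...)
shared_policy_kwargs = {"time_window": TIME_WINDOW}  # merged into GPM's policy_kwargs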

# set PolicyGradient parameters
model_kwargs = {
    "lr": 0.01,
    "policy": GPM,
}

# here, we can set GPM's parameters
policy_kwargs = {
    "edge_index": new_edge_index,
    "edge_type": new_edge_type,
    "nodes_to_select": nodes_to_select
}

model = DRLAgent(environment_train).get_model("pg", device, model_kwargs, policy_kwargs)

Train Model

We will train for only a few episodes, because training takes a considerable amount of time.

DRLAgent.train_model(model, episodes=2)
0%| | 0/2 [00:00<?, ?it/s]
=================================
Initial portfolio value:100000
Final portfolio value: 191986.328125
Final accumulative portfolio value: 1.91986328125
Maximum DrawDown: -0.20472381491579683
Sharpe ratio: 5.4259683434925705
=================================
50%|██████████████████████ | 1/2 [10:18<10:18, 618.80s/it]
=================================
Initial portfolio value:100000
Final portfolio value: 292055.125
Final accumulative portfolio value: 2.92055125
Maximum DrawDown: -0.08291245970485872
Sharpe ratio: 9.58044714250322
=================================
100%|████████████████████████████████████████████| 2/2 [20:37<00:00, 618.95s/it]
<finrl.agents.portfolio_optimization.algorithms.PolicyGradient at 0x7f43395e0310>

Save Model

torch.save(model.train_policy.state_dict(), "policy_GPM.pt")

Test Model

Following the idea from the original article, we will evaluate the performance of the trained model in the test period. We will also compare it with a uniform buy and hold strategy.

Test GPM architecture

It's important to note that, in this code, we load the saved policy even though it's not necessary, just to show how to save and load your model.

GPM_results = {
    "train": environment_train._asset_memory["final"],
    "test": {},
}

# instantiate an architecture with the same arguments used in training
# and load with load_state_dict.
policy = GPM(new_edge_index, new_edge_type, nodes_to_select, device=device)
policy.load_state_dict(torch.load("policy_GPM.pt"))

# testing
DRLAgent.DRL_validation(model, environment_test, policy=policy)
GPM_results["test"] = environment_test._asset_memory["final"]
=================================
Initial portfolio value:100000
Final portfolio value: 143342.140625
Final accumulative portfolio value: 1.43342140625
Maximum DrawDown: -0.0031984947828825883
Sharpe ratio: 21.581835787662943
=================================

Test Uniform Buy and Hold

For comparison, we will also test the performance of a uniform buy and hold strategy. In this strategy, the portfolio holds no cash and the same percentage of money is allocated to each asset.
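To make the action concrete: the action vector has PORTFOLIO_SIZE + 1 entries, where the first entry is the cash weight (kept at zero here) and the remaining entries split the portfolio equally. A small illustrative check (not part of the original strategy code):

# With 11 assets, the uniform action is [0, 1/11, ..., 1/11]: no cash and
# roughly 9.09% of the portfolio in each asset; the weights must sum to 1.
example_action = [0] + [1 / len(tics_in_portfolio)] * len(tics_in_portfolio)
print(len(example_action), round(sum(example_action), 10))  # 12 entries, summing to 1.0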

UBAH_results = {
    "train": {},
    "test": {},
}

PORTFOLIO_SIZE = len(tics_in_portfolio)

# train period
terminated = False
environment_train.reset()
while not terminated:
    action = [0] + [1/PORTFOLIO_SIZE] * PORTFOLIO_SIZE
    _, _, terminated, _ = environment_train.step(action)
UBAH_results["train"] = environment_train._asset_memory["final"]

# test period
terminated = False
environment_test.reset()
while not terminated:
    action = [0] + [1/PORTFOLIO_SIZE] * PORTFOLIO_SIZE
    _, _, terminated, _ = environment_test.step(action)
UBAH_results["test"] = environment_test._asset_memory["final"]
=================================
Initial portfolio value:100000
Final portfolio value: 210066.515625
Final accumulative portfolio value: 2.10066515625
Maximum DrawDown: -0.1770357694310173
Sharpe ratio: 6.126976338415281
=================================
=================================
Initial portfolio value:100000
Final portfolio value: 140385.78125
Final accumulative portfolio value: 1.4038578125
Maximum DrawDown: -0.001439125798492591
Sharpe ratio: 23.930156872458472
=================================

Plot graphics

import matplotlib.pyplot as plt
%matplotlib inline

plt.plot(UBAH_results["train"], label="Buy and Hold")
plt.plot(GPM_results["train"], label="GPM")

plt.xlabel("Days")
plt.ylabel("Portfolio Value")
plt.title("Performance in training period")
plt.legend()
plt.show()
[Output: line plot of portfolio value over the training period, comparing Buy and Hold and GPM]
plt.plot(UBAH_results["test"], label="Buy and Hold") plt.plot(GPM_results["test"], label="GPM") plt.xlabel("Days") plt.ylabel("Portfolio Value") plt.title("Performance in testing period") plt.legend() plt.show()
[Output: line plot of portfolio value over the testing period, comparing Buy and Hold and GPM]

With only two training episodes, we can see that GPM achieves better performance than the buy and hold strategy, but according to the original article, that performance could be even better: hyperparameter tuning still needs to be performed. Additionally, we used a softmax temperature equal to one, something that can be changed to achieve better performance, as stated in the original article.
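As a starting point for such tuning, one could lower the learning rate and pass a different softmax temperature through policy_kwargs. The sketch below is illustrative only: the softmax_temperature parameter name is an assumption and should be checked against the GPM architecture's signature in FinRL.

# Hypothetical tuning sketch (parameter names to be verified against
# finrl.agents.portfolio_optimization.architectures.GPM):
tuned_model_kwargs = {
    "lr": 0.001,  # smaller learning rate than the 0.01 used above
    "policy": GPM,
}
tuned_policy_kwargs = {
    "edge_index": new_edge_index,
    "edge_type": new_edge_type,
    "nodes_to_select": nodes_to_select,
    "softmax_temperature": 5,  # assumed name; the run above used the default of 1
}
tuned_model = DRLAgent(environment_train).get_model(
    "pg", device, tuned_model_kwargs, tuned_policy_kwargs
)
# DRLAgent.train_model(tuned_model, episodes=10)  # more episodes than the quick demo above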