Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading
PyTorch Version
Content
We train a DRL agent for stock trading. The task is modeled as a Markov Decision Process (MDP), and the objective is to maximize the expected cumulative return.
We specify the state-action-reward as follows:
State s: The state space represents an agent's perception of the market environment. Just like a human trader analyzing various information, here our agent passively observes many features and learns by interacting with the market environment (usually by replaying historical data).
Action a: The action space includes the actions that an agent is allowed to take at each state. For example, a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying. When an action trades multiple shares, a ∈ {−k, ..., −1, 0, 1, ..., k}; e.g., "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" are 10 and −10, respectively.
Reward function r(s, a, s′): Reward is an incentive for an agent to learn a better policy. For example, it can be the change in portfolio value when taking action a at state s and arriving at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at states s′ and s, respectively.
Market environment: the 30 constituent stocks of the Dow Jones Industrial Average (DJIA) index, as of the starting date of the testing period.
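The action and reward definitions above can be illustrated with a small sketch. It is not FinRL's environment code: the function names, the sell-before-buy ordering, and the absence of transaction costs, short selling, and borrowing are all simplifying assumptions, and the prices are made-up numbers.

```python
def portfolio_value(cash, holdings, prices):
    """Cash plus the market value of all held shares."""
    return cash + sum(h * p for h, p in zip(holdings, prices))


def trade_step(cash, holdings, prices, next_prices, actions):
    """Apply one vector of share actions and return (cash, holdings, reward).

    actions[i] < 0 sells shares of stock i, > 0 buys, 0 holds.
    Assumption: no transaction costs, no short selling, no borrowing.
    """
    v = portfolio_value(cash, holdings, prices)
    holdings = list(holdings)
    # Sell first so that the proceeds are available for purchases.
    for i, a in enumerate(actions):
        if a < 0:
            sold = min(-a, holdings[i])              # cannot sell more than held
            holdings[i] -= sold
            cash += sold * prices[i]
    for i, a in enumerate(actions):
        if a > 0:
            bought = min(a, int(cash // prices[i]))  # cannot overspend cash
            holdings[i] += bought
            cash -= bought * prices[i]
    v_next = portfolio_value(cash, holdings, next_prices)
    return cash, holdings, v_next - v                # r(s, a, s') = v' - v


# Illustrative numbers only: buy 5 shares of stock 0, sell 10 shares of stock 1.
cash, holdings, reward = trade_step(
    cash=1000.0, holdings=[0, 10],
    prices=[100.0, 50.0], next_prices=[110.0, 45.0],
    actions=[5, -10],
)
print(cash, holdings, reward)
```

The reward is positive here only because the purchased stock rose before the next state was observed; the trade itself, executed at current prices, does not change the portfolio value.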
The data for this case study is obtained from the Yahoo Finance API. It contains open-high-low-close (OHLC) prices and volume.
Collecting git+https://github.com/AI4Finance-Foundation/FinRL.git
Cloning https://github.com/AI4Finance-Foundation/FinRL.git to /tmp/pip-req-build-flt95p98
Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-Foundation/FinRL.git /tmp/pip-req-build-flt95p98
Resolved https://github.com/AI4Finance-Foundation/FinRL.git to commit ef471fcea1f3667442f5ecbf7b4c214610a5dd55
Installing build dependencies ... done
Getting requirements to build wheel ... done
Preparing metadata (pyproject.toml) ... done
Collecting elegantrl@ git+https://github.com/AI4Finance-Foundation/ElegantRL.git (from finrl==0.3.6)
Cloning https://github.com/AI4Finance-Foundation/ElegantRL.git to /tmp/pip-install-u43l6ss9/elegantrl_36782baa6d82461e89b600dda61820c8
Running command git clone --filter=blob:none --quiet https://github.com/AI4Finance-Foundation/ElegantRL.git /tmp/pip-install-u43l6ss9/elegantrl_36782baa6d82461e89b600dda61820c8
Resolved https://github.com/AI4Finance-Foundation/ElegantRL.git to commit 59d9a33e2b3ba2d77c052c2810bb61059736d88c
Preparing metadata (setup.py) ... done
Requirement already satisfied: alpaca-trade-api<4,>=3 in /home/random/.local/lib/python3.12/site-packages (from finrl==0.3.6) (3.2.0)
Collecting ccxt<4,>=3 (from finrl==0.3.6)
Using cached ccxt-3.1.60-py2.py3-none-any.whl.metadata (108 kB)
Requirement already satisfied: exchange-calendars<5,>=4 in /home/random/.local/lib/python3.12/site-packages (from finrl==0.3.6) (4.6)
Collecting jqdatasdk<2,>=1 (from finrl==0.3.6)
Using cached jqdatasdk-1.9.7-py3-none-any.whl.metadata (5.8 kB)
Collecting pyfolio<0.10,>=0.9 (from finrl==0.3.6)
Using cached pyfolio-0.9.2.tar.gz (91 kB)
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [18 lines of output]
/tmp/pip-install-u43l6ss9/pyfolio_f61a15f976d345b4a7050d0999ff9c7b/versioneer.py:468: SyntaxWarning: invalid escape sequence '\s'
LONG_VERSION_PY['git'] = '''
Traceback (most recent call last):
File "<string>", line 2, in <module>
File "<pip-setuptools-caller>", line 34, in <module>
File "/tmp/pip-install-u43l6ss9/pyfolio_f61a15f976d345b4a7050d0999ff9c7b/setup.py", line 71, in <module>
version=versioneer.get_version(),
^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-install-u43l6ss9/pyfolio_f61a15f976d345b4a7050d0999ff9c7b/versioneer.py", line 1407, in get_version
return get_versions()["version"]
^^^^^^^^^^^^^^
File "/tmp/pip-install-u43l6ss9/pyfolio_f61a15f976d345b4a7050d0999ff9c7b/versioneer.py", line 1341, in get_versions
cfg = get_config_from_root(root)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/pip-install-u43l6ss9/pyfolio_f61a15f976d345b4a7050d0999ff9c7b/versioneer.py", line 399, in get_config_from_root
parser = configparser.SafeConfigParser()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: module 'configparser' has no attribute 'SafeConfigParser'. Did you mean: 'RawConfigParser'?
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
× Encountered error while generating package metadata.
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Part 3: Download Data
Yahoo Finance provides stock data, financial news, financial reports, etc. Yahoo Finance is free.
FinRL uses the YahooDownloader class in FinRL-Meta to fetch data via the Yahoo Finance API.
Call Limit: Using the Public API (without authentication), you are limited to 2,000 requests per hour per IP (or up to a total of 48,000 requests a day).
class YahooDownloader: Retrieving daily stock data from Yahoo Finance API
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
/tmp/ipykernel_24692/1255811168.py in ?()
----> 1 df.sort_values(['date','tic'],ignore_index=True).head()
~/.local/lib/python3.12/site-packages/pandas/core/frame.py in ?(self, by, axis, ascending, inplace, kind, na_position, ignore_index, key)
7168 f"Length of ascending ({len(ascending)})" # type: ignore[arg-type]
7169 f" != length of by ({len(by)})"
7170 )
7171 if len(by) > 1:
-> 7172 keys = [self._get_label_or_level_values(x, axis=axis) for x in by]
7173
7174 # need to rewrap columns in Series to apply key function
7175 if key is not None:
~/.local/lib/python3.12/site-packages/pandas/core/generic.py in ?(self, key, axis)
1907 values = self.xs(key, axis=other_axes[0])._values
1908 elif self._is_level_reference(key, axis=axis):
1909 values = self.axes[axis].get_level_values(key)._values
1910 else:
-> 1911 raise KeyError(key)
1912
1913 # Check for duplicates
1914 if values.ndim > 1:
KeyError: 'date'
Part 4: Preprocess Data
We need to check for missing data and do feature engineering to convert the data point into a state.
Adding technical indicators. In practical trading, various information needs to be taken into account, such as historical prices, current holding shares, technical indicators, etc. Here, we demonstrate two trend-following technical indicators: MACD and RSI.
Adding turbulence index. Risk aversion reflects whether an investor prefers to protect the capital. It also influences one's trading strategy when facing different levels of market volatility. To control the risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the turbulence index, which measures extreme fluctuations in asset prices.
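As a rough illustration of the two indicators named above, the following sketch computes a MACD line and an RSI value from a list of closing prices. It is not the indicator code FinRL calls: the spans of 12 and 26, the window of 14, and the use of simple (rather than smoothed) averages in the RSI are conventional choices assumed here, and the function names are hypothetical. The turbulence index is not covered by this sketch.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]
    for x in values[1:]:
        out.append(alpha * x + (1 - alpha) * out[-1])
    return out


def macd(close, fast=12, slow=26):
    """MACD line: fast EMA minus slow EMA of the closing price."""
    return [f - s for f, s in zip(ema(close, fast), ema(close, slow))]


def rsi(close, window=14):
    """RSI over the last `window` price changes, using simple averages."""
    recent = close[-(window + 1):]
    changes = [b - a for a, b in zip(recent, recent[1:])]
    gains = sum(c for c in changes if c > 0)
    losses = sum(-c for c in changes if c < 0)
    if losses == 0:
        return 100.0          # no down moves in the window
    rs = gains / losses       # equals average gain / average loss
    return 100.0 - 100.0 / (1.0 + rs)


# Illustrative series: a steadily rising price.
rising = [float(i) for i in range(1, 31)]
print(macd(rising)[-1], rsi(rising))
```

On a steadily rising series the fast EMA stays above the slow EMA, so the MACD line is positive, and the RSI saturates at 100 because the window contains no losses.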
Part 5: Build a Market Environment in OpenAI Gym Style
The training process involves observing stock price changes, taking an action, and calculating the reward. By interacting with the market environment, the agent will eventually derive a trading strategy that may maximize (expected) rewards.
Our market environment, based on OpenAI Gym, simulates stock markets with historical market data.
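To make the Gym-style interaction concrete, here is a minimal single-stock sketch of the reset/step loop. It is only an illustration under stated assumptions: `ToyTradingEnv` is a hypothetical class that does not inherit from `gym.Env`, it returns the classic four-tuple `(state, reward, done, info)`, it ignores transaction costs, and the price list is made up.

```python
class ToyTradingEnv:
    """Single-stock environment that replays a fixed list of closing prices."""

    def __init__(self, prices, initial_cash=1_000_000.0, max_shares=100):
        self.prices = list(prices)
        self.initial_cash = initial_cash
        self.max_shares = max_shares  # the k in a in {-k, ..., k}
        self.reset()

    def _state(self):
        return (self.cash, self.prices[self.t], self.shares)

    def _value(self):
        return self.cash + self.shares * self.prices[self.t]

    def reset(self):
        self.t = 0
        self.cash = self.initial_cash
        self.shares = 0
        return self._state()

    def step(self, action):
        """Trade `action` shares at today's price, then advance one day."""
        price = self.prices[self.t]
        value_before = self._value()
        action = max(-self.max_shares, min(self.max_shares, int(action)))
        if action > 0:
            action = min(action, int(self.cash // price))  # limited by cash
        else:
            action = max(action, -self.shares)             # limited by holdings
        self.shares += action
        self.cash -= action * price
        self.t += 1
        reward = self._value() - value_before              # change in portfolio value
        done = self.t == len(self.prices) - 1
        return self._state(), reward, done, {}


env = ToyTradingEnv(prices=[10.0, 11.0, 9.0, 12.0], initial_cash=100.0, max_shares=5)
state = env.reset()
total_reward, done = 0.0, False
while not done:
    state, reward, done, _ = env.step(5)  # naive policy: always try to buy 5 shares
    total_reward += reward
print(state, total_reward)
```

Because each reward is the one-step change in portfolio value, the rewards summed over an episode equal the final portfolio value minus the initial capital.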
Data Split
We split the data into a training set and a trading (testing) set as follows:
Training data period: 2009-01-01 to 2020-07-01
Trading data period: 2020-07-01 to 2021-10-31
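The split can be sketched as a date filter. The helper name `split_by_date`, the dictionary row layout, and the half-open interval convention (start included, end excluded) are assumptions made for this illustration; the half-open convention keeps 2020-07-01 out of the training set, so the two periods do not overlap.

```python
from datetime import date

TRAIN_START, TRAIN_END = date(2009, 1, 1), date(2020, 7, 1)
TRADE_START, TRADE_END = date(2020, 7, 1), date(2021, 10, 31)


def split_by_date(rows, start, end):
    """Keep rows with start <= row['date'] < end (half-open interval)."""
    return [r for r in rows if start <= r["date"] < end]


# Hypothetical rows; only the date matters for the split.
rows = [
    {"date": date(2020, 6, 30), "tic": "AAPL"},
    {"date": date(2020, 7, 1), "tic": "AAPL"},
    {"date": date(2021, 10, 29), "tic": "AAPL"},
]
train = split_by_date(rows, TRAIN_START, TRAIN_END)
trade = split_by_date(rows, TRADE_START, TRADE_END)
print(len(train), len(trade))
```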
Environment for Training
Part 6: Train DRL Agents
The DRL algorithms are from Stable Baselines 3. Users are also encouraged to try ElegantRL and Ray RLlib.
FinRL includes fine-tuned standard DRL algorithms, such as DQN, DDPG, Multi-Agent DDPG, PPO, SAC, A2C and TD3. We also allow users to design their own DRL algorithms by adapting these DRL algorithms.
Agent Training: 5 algorithms (A2C, DDPG, PPO, TD3, SAC)
Agent 1: A2C
Agent 2: DDPG
Agent 3: PPO
Agent 4: TD3
Agent 5: SAC
In-sample Performance
Assume that the initial capital is $1,000,000.
Set turbulence threshold
Set the turbulence threshold to be greater than the maximum of the in-sample turbulence data. If the current turbulence index is greater than the threshold, we assume that the current market is volatile.
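A minimal sketch of this rule follows. The `buffer` parameter, which lifts the threshold above the in-sample maximum, is a hypothetical knob introduced for illustration, and the turbulence values are made up.

```python
def turbulence_threshold(insample_turbulence, buffer=0.0):
    """In-sample maximum plus a non-negative buffer (hypothetical parameter)."""
    return max(insample_turbulence) + buffer


def is_volatile(current_turbulence, threshold):
    """Flag the market as volatile when turbulence exceeds the threshold."""
    return current_turbulence > threshold


insample = [12.0, 45.5, 30.0]                 # illustrative values only
threshold = turbulence_threshold(insample, buffer=1.0)
print(threshold, is_volatile(50.0, threshold), is_volatile(40.0, threshold))
```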
Trading (Out-of-sample Performance)
We update (retrain) periodically, e.g., quarterly, monthly, or weekly, in order to take full advantage of the data, and we tune the parameters along the way. In this notebook, the parameters are tuned only once, on the in-sample data from 2009-01 to 2020-07, so some alpha decay is expected as the trading period lengthens.
Numerous hyperparameters – e.g. the learning rate, the total number of samples to train on – influence the learning process and are usually determined by testing some variations.
Mean-variance optimization (MVO) is a classic strategy in portfolio management. Here, we go through the whole MVO process and add it as a baseline for comparison.
First, convert the DataFrame into the form needed for the MVO weight calculation.
Helper functions for mean returns and variance-covariance matrix
Calculate the weights for mean-variance
Use PyPortfolioOpt
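The two MVO inputs named in the steps above, mean returns and the variance-covariance matrix, can be sketched in plain Python as follows. The function names and the price series are hypothetical, and the sample covariance (dividing by n − 1) is an assumed convention; the weight optimization itself is left to PyPortfolioOpt and is not reproduced here.

```python
def daily_returns(prices):
    """Simple returns p[t] / p[t-1] - 1 for one price series."""
    return [b / a - 1.0 for a, b in zip(prices, prices[1:])]


def mean(xs):
    return sum(xs) / len(xs)


def covariance_matrix(returns_by_asset):
    """Sample covariance matrix (divides by n - 1) of equal-length return lists."""
    means = [mean(r) for r in returns_by_asset]
    n = len(returns_by_asset[0])
    return [
        [
            sum((x - mi) * (y - mj) for x, y in zip(ri, rj)) / (n - 1)
            for rj, mj in zip(returns_by_asset, means)
        ]
        for ri, mi in zip(returns_by_asset, means)
    ]


# Illustrative closing prices for two hypothetical assets.
prices = {
    "A": [100.0, 102.0, 101.0, 105.0],
    "B": [50.0, 49.0, 51.0, 52.0],
}
returns = [daily_returns(p) for p in prices.values()]
mean_returns = [mean(r) for r in returns]
cov = covariance_matrix(returns)
print(mean_returns, cov)
```

The diagonal of `cov` holds each asset's return variance, and the matrix is symmetric, which is what a mean-variance optimizer expects as its risk input.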
Part 7: Backtesting Results
Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and provides a variety of individual plots that together give a comprehensive picture of a trading strategy's performance.
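For intuition about what a backtest reports, the sketch below computes three common statistics from a series of daily account values. It does not use pyfolio's API; the function names, the zero risk-free rate, the 252 trading days per year, and the account values are assumptions made for this illustration.

```python
import math


def cumulative_return(values):
    """Total return from the first to the last account value."""
    return values[-1] / values[0] - 1.0


def max_drawdown(values):
    """Largest peak-to-trough decline, as a negative fraction."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = min(worst, v / peak - 1.0)
    return worst


def sharpe_ratio(values, periods_per_year=252):
    """Annualized Sharpe ratio of daily returns, assuming a zero risk-free rate."""
    rets = [b / a - 1.0 for a, b in zip(values, values[1:])]
    mu = sum(rets) / len(rets)
    var = sum((r - mu) ** 2 for r in rets) / (len(rets) - 1)
    return math.sqrt(periods_per_year) * mu / math.sqrt(var)


# Illustrative account values starting from the initial capital of $1,000,000.
account = [1_000_000.0, 1_100_000.0, 990_000.0, 1_200_000.0]
print(cumulative_return(account), max_drawdown(account), sharpe_ratio(account))
```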