Stock NeurIPS2018 Part 2. Train
This series is a reproduction of the process in the paper Practical Deep Reinforcement Learning Approach for Stock Trading.
This is the second part of the NeurIPS2018 series, showing how to use FinRL to turn the data into a gym-style environment and train DRL agents on it.
Other demos can be found at the repo of FinRL-Tutorials.
Part 1. Install Packages
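A quick install sketch for readers who skipped Part 1, assuming a fresh Colab or local Python environment; the exact install cell in the original notebook may differ.

```python
## Install FinRL from the official repository (assumption: this mirrors the Part 1 install step).
## In a notebook, the leading "!" runs a shell command.
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git
```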
Part 2. Build A Market Environment in OpenAI Gym-style
The core elements in reinforcement learning are the agent and the environment. You can understand RL as the following process:
The agent acts in a world, which is the environment. It observes its current condition as a state and is allowed to take certain actions. After the agent executes an action, it arrives at a new state. At the same time, the environment gives the agent feedback called a reward, a numerical signal that tells how good or bad the new state is. As shown in the figure above, the agent and the environment keep repeating this interaction.
The goal of the agent is to collect as much cumulative reward as possible. Reinforcement learning is the method by which the agent learns to improve its behavior and achieve that goal.
To achieve this in Python, we follow the OpenAI gym style to build the stock data into an environment.
The state, action, and reward are specified as follows:
State s: The state space represents an agent's perception of the market environment. Just like a human trader analyzing various information, our agent passively observes the price data and technical indicators computed from past data. It learns by interacting with the market environment (usually by replaying historical data).
Action a: The action space includes the allowed actions that an agent can take at each state. For example, a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying. When an action operates on multiple shares, a ∈ {−k, ..., −1, 0, 1, ..., k}; e.g., "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" are represented as 10 and −10, respectively.
Reward function r(s, a, s′): The reward is an incentive for the agent to learn a better policy. For example, it can be the change of the portfolio value when taking action a at state s and arriving at new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at state s′ and s, respectively (a minimal code sketch of this design follows this list).
Market environment: the 30 constituent stocks of the Dow Jones Industrial Average (DJIA) index, accessed at the starting date of the testing period.
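To make the state-action-reward design concrete, below is a minimal, self-contained sketch of a gym-style trading environment for a single stock, using the classic gym API (reset returns only the observation). It is an illustration of the design above, not FinRL's actual StockTradingEnv; all names and numbers are assumptions.

```python
import numpy as np
import gym
from gym import spaces


class MinimalTradingEnv(gym.Env):
    """Toy single-stock environment illustrating the state-action-reward design."""

    def __init__(self, prices, initial_cash=1_000_000, hmax=10):
        super().__init__()
        self.prices = np.asarray(prices, dtype=np.float32)
        self.initial_cash = float(initial_cash)
        self.hmax = hmax  # maximum number of shares traded per step
        # Action in [-1, 1]: fraction of hmax shares to sell (< 0) or buy (> 0).
        self.action_space = spaces.Box(low=-1.0, high=1.0, shape=(1,), dtype=np.float32)
        # State: [cash, shares held, current price].
        self.observation_space = spaces.Box(low=0.0, high=np.inf, shape=(3,), dtype=np.float32)

    def reset(self):
        self.t = 0
        self.cash = self.initial_cash
        self.shares = 0
        return self._state()

    def _state(self):
        return np.array([self.cash, self.shares, self.prices[self.t]], dtype=np.float32)

    def _portfolio_value(self):
        return self.cash + self.shares * self.prices[self.t]

    def step(self, action):
        v = self._portfolio_value()
        price = float(self.prices[self.t])
        n_shares = int(action[0] * self.hmax)  # signed number of shares to trade
        if n_shares > 0:    # buy, limited by available cash
            n_shares = min(n_shares, int(self.cash // price))
            self.cash -= n_shares * price
            self.shares += n_shares
        elif n_shares < 0:  # sell, limited by current holdings
            n_shares = max(n_shares, -self.shares)
            self.cash -= n_shares * price  # n_shares is negative, so cash increases
            self.shares += n_shares
        self.t += 1
        v_next = self._portfolio_value()
        reward = v_next - v  # r(s, a, s') = v' - v
        done = self.t >= len(self.prices) - 1
        return self._state(), reward, done, {}
```

FinRL's actual environment follows the same pattern but tracks all 30 stocks at once and appends the technical indicators to the state vector.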
Read data
We first read the .csv file of our training data into a dataframe.
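A minimal sketch of this step, assuming Part 1 saved the preprocessed training split as train_data.csv (the actual file name in your run may differ).

```python
import pandas as pd

# Load the preprocessed training data produced in Part 1.
# "train_data.csv" is an assumed file name; use whatever Part 1 saved.
train = pd.read_csv("train_data.csv")

# to_csv writes the old index as a plain column; restore it as the index.
train = train.set_index(train.columns[0])

train.head()
```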
Construct the environment
Calculate and specify the parameters we need for constructing the environment.
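A sketch of the parameter calculation, assuming the FinRL convention that the state vector holds the cash balance, plus the price and holding of each stock, plus each technical indicator for each stock. The ticker column name tic and the INDICATORS list imported from finrl.config are assumptions carried over from Part 1; the cost and cash values are illustrative defaults.

```python
from finrl.config import INDICATORS  # indicator list used during preprocessing (assumed)

stock_dimension = len(train.tic.unique())
# State: [cash] + [price, holding] per stock + [each indicator] per stock.
state_space = 1 + 2 * stock_dimension + len(INDICATORS) * stock_dimension
print(f"Stock Dimension: {stock_dimension}, State Space: {state_space}")

buy_cost_list = sell_cost_list = [0.001] * stock_dimension  # 0.1% transaction cost (illustrative)
num_stock_shares = [0] * stock_dimension                    # start with no holdings

env_kwargs = {
    "hmax": 100,                      # maximum shares per trade
    "initial_amount": 1_000_000,      # starting cash (illustrative)
    "num_stock_shares": num_stock_shares,
    "buy_cost_pct": buy_cost_list,
    "sell_cost_pct": sell_cost_list,
    "state_space": state_space,
    "stock_dim": stock_dimension,
    "tech_indicator_list": INDICATORS,
    "action_space": stock_dimension,
    "reward_scaling": 1e-4,
}
```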
Environment for training
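A sketch of building the training environment and wrapping it for Stable Baselines 3; the import path below matches recent FinRL versions and may differ in older releases.

```python
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv

# Build the gym-style training environment from the training dataframe.
e_train_gym = StockTradingEnv(df=train, **env_kwargs)

# Wrap it as a vectorized environment that Stable Baselines 3 can train on.
env_train, _ = e_train_gym.get_sb_env()
print(type(env_train))
```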
Part 3: Train DRL Agents
Here, the DRL algorithms come from Stable Baselines 3, a library that implements popular DRL algorithms in PyTorch and is the successor to the older Stable Baselines.
Agent Training: 5 algorithms (A2C, DDPG, PPO, TD3, SAC)
Agent 1: A2C
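A minimal training sketch using FinRL's DRLAgent wrapper around Stable Baselines 3. The import path and the timestep budget are assumptions that may differ between FinRL versions; the other four agents follow the same get_model / train_model pattern with their respective algorithm names ("ddpg", "ppo", "td3", "sac").

```python
from finrl.agents.stablebaselines3.models import DRLAgent

agent = DRLAgent(env=env_train)

# Instantiate an A2C model with default hyperparameters.
model_a2c = agent.get_model("a2c")

# Train the agent; total_timesteps here is illustrative.
trained_a2c = agent.train_model(model=model_a2c,
                                tb_log_name="a2c",
                                total_timesteps=50_000)
```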
Agent 2: DDPG
Agent 3: PPO
Agent 4: TD3
Agent 5: SAC
Save the trained agent
Trained agents should have already been saved in the "trained_models" directory after you run the code blocks above.
For Colab users, the zip files should be at "./trained_models" or "/content/trained_models".
For users running in a local environment, the zip files should be at "./trained_models".
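If you want to save (or re-save) a model explicitly, Stable Baselines 3 models expose save and load methods; the paths below are illustrative.

```python
# Save the trained A2C agent as a .zip file (path is illustrative).
trained_a2c.save("trained_models/agent_a2c")

# The saved model can later be reloaded for trading and backtesting.
from stable_baselines3 import A2C
loaded_a2c = A2C.load("trained_models/agent_a2c")
```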