:github_url: https://github.com/AI4Finance-LLC/FinRL-Library

Multiple Stock Trading
===============================

Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading

.. tip::

    Run the code step by step at `Google Colab`_.

.. _Google Colab: https://colab.research.google.com/github/AI4Finance-Foundation/FinRL/blob/master/FinRL_StockTrading_NeurIPS_2018.ipynb

Step 1: Preparation
---------------------------------------

**Step 1.1: Overview**

To begin with, let us explain the logic of multiple stock trading using Deep Reinforcement Learning. We use the Dow 30 constituents as an example throughout this article, because they are among the most popular stocks.

Many people are intimidated by the term "Deep Reinforcement Learning". In practice, you can simply treat it as a "Smart AI", a "Smart Stock Trader", or an "R2-D2 Trader" if you like, and just use it.

Suppose that we have a well-trained DRL agent, "DRL Trader", and we want to use it to trade multiple stocks in our portfolio.

- Assume we are at time t. At the end of day t, we know the open-high-low-close prices of the Dow 30 constituent stocks. We can use this information to calculate technical indicators such as MACD, RSI, CCI, and ADX. In Reinforcement Learning these data or features are called "states".
- Our portfolio value is V(t) = balance(t) + dollar amount of the stocks(t).
- We feed the states into our well-trained DRL Trader, and it outputs a list of actions. The action for each stock is a value within [-1, 1]; we can treat this value as the trading signal: 1 means a strong buy signal and -1 means a strong sell signal.
- We calculate k = actions * h_max, where h_max is a predefined parameter that sets the maximum number of shares to trade. This gives us a list of shares to trade.
- The dollar amount of shares = shares to trade * close price(t).
- Update balance and shares. This dollar amount of shares is the money we need to trade at time t. The updated balance = balance(t) − amount of money we pay to buy shares + amount of money we receive from selling shares. The updated shares = shares held(t) − shares to sell + shares to buy.
- So we take actions to trade based on the advice of our DRL Trader at the end of day t (time t's close price equals time t+1's open price). We hope to benefit from these actions by the end of day t+1.
- Take a step to time t+1. At the end of that day, we know the close price at t+1, and the dollar amount of the stocks(t+1) = sum(updated shares * close price(t+1)). The portfolio value V(t+1) = balance(t+1) + dollar amount of the stocks(t+1).
- So the step reward for taking the actions from the DRL Trader from time t to t+1 is r = V(t+1) − V(t). The reward can be positive or negative in the training stage. But of course, we need a positive reward in trading to say that our DRL Trader is effective.
- Repeat this process until termination.

Below are the logic chart of multiple stock trading and a made-up example for demonstration purposes, followed by a minimal code sketch of the one-step update:

.. image:: ../../image/multiple_1.jpeg
    :scale: 60%

.. image:: ../../image/multiple_2.png
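The sketch below makes the one-step update above concrete: it converts the trading signals into share amounts via h_max, updates the balance and holdings, and computes the step reward r = V(t+1) − V(t). All numbers (prices, holdings, signals) are made up for demonstration and are not taken from the tutorial; transaction costs are ignored for brevity.

.. code-block:: python
    :linenos:

    import numpy as np

    # made-up three-stock example at the end of day t
    balance_t = 1_000_000.0                    # cash balance at time t
    shares_t = np.array([50, 30, 0])           # shares held at time t
    close_t = np.array([120.0, 45.0, 310.0])   # close prices at time t
    actions = np.array([0.6, -0.2, 0.1])       # DRL Trader signals in [-1, 1]
    h_max = 100                                # max shares to trade per stock

    # convert signals to share amounts: k = actions * h_max
    k = (actions * h_max).astype(int)          # [60, -20, 10] -> buy 60, sell 20, buy 10

    # dollar amount traded per stock (positive = buy, negative = sell)
    trade_value = k * close_t

    # update balance and holdings
    balance_t1 = balance_t - trade_value.sum()
    shares_t1 = shares_t + k

    # at the end of day t+1 we observe new close prices
    close_t1 = np.array([123.0, 44.0, 315.0])

    v_t = balance_t + (shares_t * close_t).sum()       # portfolio value V(t)
    v_t1 = balance_t1 + (shares_t1 * close_t1).sum()   # portfolio value V(t+1)
    reward = v_t1 - v_t                                # step reward r = V(t+1) - V(t)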
Multiple stock trading is different from single stock trading: as the number of stocks increases, the dimension of the data increases, and the state and action spaces in reinforcement learning grow exponentially. Stability and reproducibility are therefore essential here.

We introduce FinRL, a DRL library that helps beginners expose themselves to quantitative finance and develop their own stock trading strategies.

FinRL is characterized by its reproducibility, scalability, simplicity, applicability and extendibility.

This article focuses on one of the use cases in our paper: Multiple Stock Trading. We use one Jupyter notebook to include all the necessary steps.

.. image:: ../../image/FinRL-Architecture.png

**Step 1.2: Problem Definition**

This problem is to design an automated solution for stock trading. We model the stock trading process as a Markov Decision Process (MDP), and we then formulate our trading goal as a maximization problem.

The agent is trained using Deep Reinforcement Learning (DRL) algorithms, and the components of the reinforcement learning environment are:

- Action: The action space describes the allowed actions through which the agent interacts with the environment. Normally, a ∈ A includes three actions: a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying one stock. Also, an action can be carried out on multiple shares. We use an action space {−k, ..., −1, 0, 1, ..., k}, where k denotes the number of shares. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or −10, respectively.

- Reward function: r(s, a, s′) is the incentive mechanism for an agent to learn a better action. It is the change of the portfolio value when action a is taken at state s, arriving at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v represent the portfolio values at states s′ and s, respectively.

- State: The state space describes the observations that the agent receives from the environment. Just as a human trader analyzes various information before executing a trade, our trading agent observes many different features to better learn in an interactive environment.

- Environment: Dow 30 constituents.

The data of the stocks for this case study is obtained from the Yahoo Finance API. The data contains Open-High-Low-Close prices and volume.

**Step 1.3: FinRL installation**

.. code-block:: python
    :linenos:

    ## install finrl library
    !pip install git+https://github.com/AI4Finance-LLC/FinRL-Library.git

Then we import the packages needed for this demonstration.

**Step 1.4: Import packages**

.. code-block:: python
    :linenos:

    import pandas as pd
    import numpy as np
    import matplotlib
    import matplotlib.pyplot as plt
    # matplotlib.use('Agg')
    import datetime

    %matplotlib inline

    from finrl import config
    from finrl import config_tickers
    from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
    from finrl.meta.preprocessor.preprocessors import FeatureEngineer, data_split
    from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
    from finrl.agents.stablebaselines3.models import DRLAgent
    from finrl.plot import backtest_stats, backtest_plot, get_daily_return, get_baseline
    from pprint import pprint

    import sys
    sys.path.append("../FinRL-Library")

    import itertools

Finally, create folders for storage.

**Step 1.5: Create folders**

.. code-block:: python
    :linenos:

    import os

    if not os.path.exists("./" + config.DATA_SAVE_DIR):
        os.makedirs("./" + config.DATA_SAVE_DIR)
    if not os.path.exists("./" + config.TRAINED_MODEL_DIR):
        os.makedirs("./" + config.TRAINED_MODEL_DIR)
    if not os.path.exists("./" + config.TENSORBOARD_LOG_DIR):
        os.makedirs("./" + config.TENSORBOARD_LOG_DIR)
    if not os.path.exists("./" + config.RESULTS_DIR):
        os.makedirs("./" + config.RESULTS_DIR)

Then all the preparation work is done. We can start now!
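As an aside, the same folders can be created more compactly with ``os.makedirs(..., exist_ok=True)``, which silently skips directories that already exist; a minimal equivalent sketch:

.. code-block:: python
    :linenos:

    import os

    # equivalent to the existence checks above: create each folder,
    # skipping any that already exist
    for folder in [config.DATA_SAVE_DIR, config.TRAINED_MODEL_DIR,
                   config.TENSORBOARD_LOG_DIR, config.RESULTS_DIR]:
        os.makedirs("./" + folder, exist_ok=True)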
Step 2: Download Data
---------------------------------------

Before training our DRL agent, we need to get the historical data of the Dow 30 stocks first.

Here we use the data from Yahoo! Finance. Yahoo! Finance is a website that provides stock data, financial news, financial reports, etc. All the data provided by Yahoo! Finance is free. yfinance is an open-source library that provides APIs to download data from Yahoo! Finance. We will use this package to download the data here.

FinRL uses a YahooDownloader_ class to extract data.

.. _YahooDownloader: https://github.com/AI4Finance-LLC/FinRL-Library/blob/master/finrl/marketdata/yahoodownloader.py

.. code-block:: python

    class YahooDownloader:
        """
        Provides methods for retrieving daily stock data from Yahoo Finance API

        Attributes
        ----------
        start_date : str
            start date of the data (modified from config.py)
        end_date : str
            end date of the data (modified from config.py)
        ticker_list : list
            a list of stock tickers (modified from config.py)

        Methods
        -------
        fetch_data()
            Fetches data from yahoo API
        """

Download and save the data in a pandas DataFrame:

.. code-block:: python
    :linenos:

    # Download and save the data in a pandas DataFrame:
    df = YahooDownloader(start_date = '2009-01-01',
                         end_date = '2020-09-30',
                         ticker_list = config_tickers.DOW_30_TICKER).fetch_data()

    print(df.sort_values(['date','tic'], ignore_index=True).head(30))

.. image:: ../../image/multiple_3.png

Step 3: Preprocess Data
---------------------------------------

Data preprocessing is a crucial step for training a high-quality machine learning model. We need to check for missing data and do feature engineering in order to convert the data into a model-ready state.

**Step 3.1: Check missing data**

.. code-block:: python
    :linenos:

    # check missing data
    # dow_30 refers to the downloaded Dow 30 DataFrame
    dow_30.isnull().values.any()

**Step 3.2: Add technical indicators**

In practical trading, various information needs to be taken into account, for example the historical stock prices, current holding shares, technical indicators, etc. In this article, we demonstrate two trend-following technical indicators: MACD and RSI.

.. code-block:: python
    :linenos:

    from stockstats import StockDataFrame as Sdf

    def add_technical_indicator(df):
        """
        calculate technical indicators
        use stockstats package to add technical indicators
        :param data: (df) pandas dataframe
        :return: (df) pandas dataframe
        """
        stock = Sdf.retype(df.copy())
        stock['close'] = stock['adjcp']
        unique_ticker = stock.tic.unique()

        macd = pd.DataFrame()
        rsi = pd.DataFrame()
        #temp = stock[stock.tic == unique_ticker[0]]['macd']
        # note: DataFrame.append was removed in pandas 2.0; use pd.concat on newer pandas
        for i in range(len(unique_ticker)):
            ## macd
            temp_macd = stock[stock.tic == unique_ticker[i]]['macd']
            temp_macd = pd.DataFrame(temp_macd)
            macd = macd.append(temp_macd, ignore_index=True)
            ## rsi
            temp_rsi = stock[stock.tic == unique_ticker[i]]['rsi_30']
            temp_rsi = pd.DataFrame(temp_rsi)
            rsi = rsi.append(temp_rsi, ignore_index=True)

        df['macd'] = macd
        df['rsi'] = rsi
        return df

**Step 3.3: Add turbulence index**

Risk-aversion reflects whether an investor prefers to preserve capital. It also influences one's trading strategy when facing different levels of market volatility. To control the risk in a worst-case scenario, such as the financial crisis of 2007–2008, FinRL employs the financial turbulence index, which measures extreme asset price fluctuations.
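For reference, the turbulence index computed in the code below is a Mahalanobis-style distance between the current price vector and its historical distribution. The formula below summarizes what the code does, using the same quantities as its variables (it is not an excerpt from the original notebook):

.. math::

    \text{turbulence}_t = (y_t - \mu)\, \Sigma^{-1}\, (y_t - \mu)^{\top}

where :math:`y_t` is the vector of current adjusted close prices of the Dow 30 constituents, :math:`\mu` is the vector of their historical averages, and :math:`\Sigma` is the covariance matrix of the historical prices.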
.. code-block:: python
    :linenos:

    def add_turbulence(df):
        """
        add turbulence index from a precalculated dataframe
        :param data: (df) pandas dataframe
        :return: (df) pandas dataframe
        """
        turbulence_index = calculate_turbulence(df)
        df = df.merge(turbulence_index, on='datadate')
        df = df.sort_values(['datadate','tic']).reset_index(drop=True)
        return df

    def calculate_turbulence(df):
        """calculate turbulence index based on dow 30"""
        # can add other market assets
        df_price_pivot = df.pivot(index='datadate', columns='tic', values='adjcp')
        unique_date = df.datadate.unique()
        # start after a year
        start = 252
        turbulence_index = [0]*start
        #turbulence_index = [0]
        count = 0
        for i in range(start, len(unique_date)):
            current_price = df_price_pivot[df_price_pivot.index == unique_date[i]]
            hist_price = df_price_pivot[[n in unique_date[0:i] for n in df_price_pivot.index]]
            cov_temp = hist_price.cov()
            current_temp = (current_price - np.mean(hist_price, axis=0))
            temp = current_temp.values.dot(np.linalg.inv(cov_temp)).dot(current_temp.values.T)
            if temp > 0:
                count += 1
                if count > 2:
                    turbulence_temp = temp[0][0]
                else:
                    # avoid large outliers because the calculation has just begun
                    turbulence_temp = 0
            else:
                turbulence_temp = 0
            turbulence_index.append(turbulence_temp)

        turbulence_index = pd.DataFrame({'datadate': df_price_pivot.index,
                                         'turbulence': turbulence_index})
        return turbulence_index

**Step 3.4: Feature Engineering**

FinRL uses a FeatureEngineer_ class to preprocess data.

.. _FeatureEngineer: https://github.com/AI4Finance-LLC/FinRL-Library/blob/master/finrl/preprocessing/preprocessors.py

.. code-block:: python

    class FeatureEngineer:
        """
        Provides methods for preprocessing the stock price data

        Attributes
        ----------
        df: DataFrame
            data downloaded from Yahoo API
        feature_number : int
            number of features we used
        use_technical_indicator : boolean
            use technical indicator or not
        use_turbulence : boolean
            use turbulence index or not

        Methods
        -------
        preprocess_data()
            main method to do the feature engineering
        """

Perform Feature Engineering:

.. code-block:: python
    :linenos:

    # Perform Feature Engineering:
    df = FeatureEngineer(df.copy(),
                         use_technical_indicator=True,
                         tech_indicator_list = config.INDICATORS,
                         use_turbulence=True,
                         user_defined_feature = False).preprocess_data()

.. image:: ../../image/multiple_4.png

Step 4: Design Environment
---------------------------------------

Considering the stochastic and interactive nature of automated stock trading tasks, a financial task is modeled as a Markov Decision Process (MDP) problem. The training process involves observing stock price changes, taking an action, and calculating the reward, so that the agent adjusts its strategy accordingly. By interacting with the environment, the trading agent derives a trading strategy that maximizes rewards as time proceeds.

Our trading environments, based on the OpenAI Gym framework, simulate live stock markets with real market data according to the principle of time-driven simulation.

The action space describes the allowed actions through which the agent interacts with the environment. Normally, action a includes three actions: {-1, 0, 1}, where -1, 0, 1 represent selling, holding, and buying one share. Also, an action can be carried out on multiple shares. We use an action space {-k, ..., -1, 0, 1, ..., k}, where k denotes the number of shares to buy and -k denotes the number of shares to sell. For example, "Buy 10 shares of AAPL" or "Sell 10 shares of AAPL" are 10 or -10, respectively. The continuous action space needs to be normalized to [-1, 1], since the policy is defined on a Gaussian distribution, which needs to be normalized and symmetric.
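The snippet below sketches how a normalized action maps to a share amount in the environments that follow; the example action values are made up, and 100 matches the ``HMAX_NORMALIZE`` constant used in the environment code.

.. code-block:: python
    :linenos:

    import numpy as np

    HMAX_NORMALIZE = 100                        # maximum shares traded per stock per step

    # hypothetical normalized actions produced by the policy for three stocks
    actions = np.array([0.8, -0.25, 0.0])

    # scale to share amounts: positive -> buy, negative -> sell, zero -> hold
    shares_to_trade = actions * HMAX_NORMALIZE  # array([ 80., -25.,   0.])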
**Step 4.1: Environment for Training**

.. code-block:: python
    :linenos:

    ## Environment for Training
    import numpy as np
    import pandas as pd
    from gym.utils import seeding
    import gym
    from gym import spaces
    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt

    # shares normalization factor
    # 100 shares per trade
    HMAX_NORMALIZE = 100
    # initial amount of money we have in our account
    INITIAL_ACCOUNT_BALANCE = 1000000
    # total number of stocks in our portfolio
    STOCK_DIM = 30
    # transaction fee: 1/1000 reasonable percentage
    TRANSACTION_FEE_PERCENT = 0.001

    REWARD_SCALING = 1e-4

    class StockEnvTrain(gym.Env):
        """A stock trading environment for OpenAI gym"""
        metadata = {'render.modes': ['human']}

        def __init__(self, df, day=0):
            #super(StockEnv, self).__init__()
            self.day = day
            self.df = df

            # action_space normalization and shape is STOCK_DIM
            self.action_space = spaces.Box(low=-1, high=1, shape=(STOCK_DIM,))
            # Shape = 121: [Current Balance] + [prices 1-30] + [owned shares 1-30]
            # + [macd 1-30] + [rsi 1-30]
            self.observation_space = spaces.Box(low=0, high=np.inf, shape=(121,))
            # load data from a pandas dataframe
            self.data = self.df.loc[self.day, :]
            self.terminal = False
            # initialize state
            self.state = [INITIAL_ACCOUNT_BALANCE] + \
                         self.data.adjcp.values.tolist() + \
                         [0]*STOCK_DIM + \
                         self.data.macd.values.tolist() + \
                         self.data.rsi.values.tolist()
                         #self.data.cci.values.tolist() + \
                         #self.data.adx.values.tolist()
            # initialize reward
            self.reward = 0
            self.cost = 0
            # memorize all the total balance change
            self.asset_memory = [INITIAL_ACCOUNT_BALANCE]
            self.rewards_memory = []
            self.trades = 0
            self._seed()

        def _sell_stock(self, index, action):
            # perform sell action based on the sign of the action
            if self.state[index+STOCK_DIM+1] > 0:
                # update balance
                self.state[0] += \
                    self.state[index+1]*min(abs(action), self.state[index+STOCK_DIM+1]) * \
                    (1 - TRANSACTION_FEE_PERCENT)
                self.state[index+STOCK_DIM+1] -= min(abs(action), self.state[index+STOCK_DIM+1])
                self.cost += self.state[index+1]*min(abs(action), self.state[index+STOCK_DIM+1]) * \
                    TRANSACTION_FEE_PERCENT
                self.trades += 1
            else:
                pass

        def _buy_stock(self, index, action):
            # perform buy action based on the sign of the action
            available_amount = self.state[0] // self.state[index+1]
            # print('available_amount:{}'.format(available_amount))

            # update balance
            self.state[0] -= self.state[index+1]*min(available_amount, action) * \
                (1 + TRANSACTION_FEE_PERCENT)
            self.state[index+STOCK_DIM+1] += min(available_amount, action)
            self.cost += self.state[index+1]*min(available_amount, action) * \
                TRANSACTION_FEE_PERCENT
            self.trades += 1

        def step(self, actions):
            # print(self.day)
            self.terminal = self.day >= len(self.df.index.unique())-1
            # print(actions)

            if self.terminal:
                plt.plot(self.asset_memory, 'r')
                plt.savefig('account_value_train.png')
                plt.close()
                end_total_asset = self.state[0] + \
                    sum(np.array(self.state[1:(STOCK_DIM+1)])*np.array(self.state[(STOCK_DIM+1):(STOCK_DIM*2+1)]))
                print("previous_total_asset:{}".format(self.asset_memory[0]))
                print("end_total_asset:{}".format(end_total_asset))
                df_total_value = pd.DataFrame(self.asset_memory)
                df_total_value.to_csv('account_value_train.csv')
                print("total_reward:{}".format(self.state[0]+sum(np.array(self.state[1:(STOCK_DIM+1)])*np.array(self.state[(STOCK_DIM+1):61])) - INITIAL_ACCOUNT_BALANCE))
                print("total_cost: ", self.cost)
                print("total_trades: ", self.trades)
                df_total_value.columns = ['account_value']
                df_total_value['daily_return'] = df_total_value.pct_change(1)
                sharpe = (252**0.5)*df_total_value['daily_return'].mean() / \
                    df_total_value['daily_return'].std()
                print("Sharpe: ", sharpe)
                print("=================================")
                df_rewards = pd.DataFrame(self.rewards_memory)
                df_rewards.to_csv('account_rewards_train.csv')

                return self.state, self.reward, self.terminal, {}

            else:
                actions = actions * HMAX_NORMALIZE

                begin_total_asset = self.state[0] + \
                    sum(np.array(self.state[1:(STOCK_DIM+1)])*np.array(self.state[(STOCK_DIM+1):61]))
                #print("begin_total_asset:{}".format(begin_total_asset))

                argsort_actions = np.argsort(actions)
                sell_index = argsort_actions[:np.where(actions < 0)[0].shape[0]]
                buy_index = argsort_actions[::-1][:np.where(actions > 0)[0].shape[0]]

                for index in sell_index:
                    # print('take sell action'.format(actions[index]))
                    self._sell_stock(index, actions[index])

                for index in buy_index:
                    # print('take buy action: {}'.format(actions[index]))
                    self._buy_stock(index, actions[index])

                self.day += 1
                self.data = self.df.loc[self.day, :]
                # load next state
                # print("stock_shares:{}".format(self.state[29:]))
                self.state = [self.state[0]] + \
                             self.data.adjcp.values.tolist() + \
                             list(self.state[(STOCK_DIM+1):61]) + \
                             self.data.macd.values.tolist() + \
                             self.data.rsi.values.tolist()

                end_total_asset = self.state[0] + \
                    sum(np.array(self.state[1:(STOCK_DIM+1)])*np.array(self.state[(STOCK_DIM+1):61]))
                #print("end_total_asset:{}".format(end_total_asset))

                self.reward = end_total_asset - begin_total_asset
                self.rewards_memory.append(self.reward)
                self.reward = self.reward * REWARD_SCALING
                # print("step_reward:{}".format(self.reward))

                self.asset_memory.append(end_total_asset)

                return self.state, self.reward, self.terminal, {}

        def reset(self):
            self.asset_memory = [INITIAL_ACCOUNT_BALANCE]
            self.day = 0
            self.data = self.df.loc[self.day, :]
            self.cost = 0
            self.trades = 0
            self.terminal = False
            self.rewards_memory = []
            # initiate state
            self.state = [INITIAL_ACCOUNT_BALANCE] + \
                         self.data.adjcp.values.tolist() + \
                         [0]*STOCK_DIM + \
                         self.data.macd.values.tolist() + \
                         self.data.rsi.values.tolist()
            return self.state

        def render(self, mode='human'):
            return self.state

        def _seed(self, seed=None):
            self.np_random, seed = seeding.np_random(seed)
            return [seed]
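Before moving on, here is a minimal sketch of how this training environment is typically wrapped for Stable Baselines, assuming a ``train`` DataFrame has already been produced with ``data_split`` (see Step 5.1); the variable names here are illustrative.

.. code-block:: python
    :linenos:

    # Stable Baselines (v2) import, consistent with the DDPG call in Step 5.2;
    # Stable-Baselines3 users would import DummyVecEnv from stable_baselines3.common.vec_env
    from stable_baselines.common.vec_env import DummyVecEnv

    # `train` is assumed to be the training DataFrame returned by data_split
    env_train = DummyVecEnv([lambda: StockEnvTrain(train)])
    obs_train = env_train.reset()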
**Step 4.2: Environment for Trading**

.. code-block:: python
    :linenos:

    ## Environment for Trading
    import numpy as np
    import pandas as pd
    from gym.utils import seeding
    import gym
    from gym import spaces
    import matplotlib
    matplotlib.use('Agg')
    import matplotlib.pyplot as plt

    # shares normalization factor
    # 100 shares per trade
    HMAX_NORMALIZE = 100
    # initial amount of money we have in our account
    INITIAL_ACCOUNT_BALANCE = 1000000
    # total number of stocks in our portfolio
    STOCK_DIM = 30
    # transaction fee: 1/1000 reasonable percentage
    TRANSACTION_FEE_PERCENT = 0.001
    # turbulence index: 90-150 reasonable threshold
    #TURBULENCE_THRESHOLD = 140
    REWARD_SCALING = 1e-4

    class StockEnvTrade(gym.Env):
        """A stock trading environment for OpenAI gym"""
        metadata = {'render.modes': ['human']}

        def __init__(self, df, day=0, turbulence_threshold=140):
            #super(StockEnv, self).__init__()
            #money = 10 , scope = 1
            self.day = day
            self.df = df
            # action_space normalization and shape is STOCK_DIM
            self.action_space = spaces.Box(low=-1, high=1, shape=(STOCK_DIM,))
            # Shape = 121: [Current Balance] + [prices 1-30] + [owned shares 1-30]
            # + [macd 1-30] + [rsi 1-30]
            self.observation_space = spaces.Box(low=0, high=np.inf, shape=(121,))
            # load data from a pandas dataframe
            self.data = self.df.loc[self.day, :]
            self.terminal = False
            self.turbulence_threshold = turbulence_threshold
            # initialize state
            self.state = [INITIAL_ACCOUNT_BALANCE] + \
                         self.data.adjcp.values.tolist() + \
                         [0]*STOCK_DIM + \
                         self.data.macd.values.tolist() + \
                         self.data.rsi.values.tolist()
            # initialize reward
            self.reward = 0
            self.turbulence = 0
            self.cost = 0
            self.trades = 0
            # memorize all the total balance change
            self.asset_memory = [INITIAL_ACCOUNT_BALANCE]
            self.rewards_memory = []
            self.actions_memory = []
            self.date_memory = []
            self._seed()

        def _sell_stock(self, index, action):
            # perform sell action based on the sign of the action
            if self.turbulence < self.turbulence_threshold:
                if self.state[index+STOCK_DIM+1] > 0:
                    # update balance
                    self.state[0] += \
                        self.state[index+1]*min(abs(action), self.state[index+STOCK_DIM+1]) * \
                        (1 - TRANSACTION_FEE_PERCENT)
                    self.state[index+STOCK_DIM+1] -= min(abs(action), self.state[index+STOCK_DIM+1])
                    self.cost += self.state[index+1]*min(abs(action), self.state[index+STOCK_DIM+1]) * \
                        TRANSACTION_FEE_PERCENT
                    self.trades += 1
                else:
                    pass
            else:
                # if turbulence goes over the threshold, just clear out all positions
                if self.state[index+STOCK_DIM+1] > 0:
                    # update balance
                    self.state[0] += self.state[index+1]*self.state[index+STOCK_DIM+1] * \
                        (1 - TRANSACTION_FEE_PERCENT)
                    self.state[index+STOCK_DIM+1] = 0
                    self.cost += self.state[index+1]*self.state[index+STOCK_DIM+1] * \
                        TRANSACTION_FEE_PERCENT
                    self.trades += 1
                else:
                    pass

        def _buy_stock(self, index, action):
            # perform buy action based on the sign of the action
            if self.turbulence < self.turbulence_threshold:
                available_amount = self.state[0] // self.state[index+1]
                # print('available_amount:{}'.format(available_amount))

                # update balance
                self.state[0] -= self.state[index+1]*min(available_amount, action) * \
                    (1 + TRANSACTION_FEE_PERCENT)
                self.state[index+STOCK_DIM+1] += min(available_amount, action)
                self.cost += self.state[index+1]*min(available_amount, action) * \
                    TRANSACTION_FEE_PERCENT
                self.trades += 1
            else:
                # if turbulence goes over the threshold, just stop buying
                pass

        def step(self, actions):
            # print(self.day)
            self.terminal = self.day >= len(self.df.index.unique())-1
            # print(actions)

            if self.terminal:
                plt.plot(self.asset_memory, 'r')
                plt.savefig('account_value_trade.png')
                plt.close()

                df_date = pd.DataFrame(self.date_memory)
                df_date.columns = ['datadate']
                df_date.to_csv('df_date.csv')

                df_actions = pd.DataFrame(self.actions_memory)
                df_actions.columns = self.data.tic.values
                df_actions.index = df_date.datadate
                df_actions.to_csv('df_actions.csv')

                df_total_value = pd.DataFrame(self.asset_memory)
                df_total_value.to_csv('account_value_trade.csv')
                end_total_asset = self.state[0] + \
                    sum(np.array(self.state[1:(STOCK_DIM+1)])*np.array(self.state[(STOCK_DIM+1):(STOCK_DIM*2+1)]))
                print("previous_total_asset:{}".format(self.asset_memory[0]))
                print("end_total_asset:{}".format(end_total_asset))
                print("total_reward:{}".format(self.state[0]+sum(np.array(self.state[1:(STOCK_DIM+1)])*np.array(self.state[(STOCK_DIM+1):61])) - self.asset_memory[0]))
                print("total_cost: ", self.cost)
                print("total trades: ", self.trades)

                df_total_value.columns = ['account_value']
                df_total_value['daily_return'] = df_total_value.pct_change(1)
                sharpe = (252**0.5)*df_total_value['daily_return'].mean() / \
                    df_total_value['daily_return'].std()
                print("Sharpe: ", sharpe)

                df_rewards = pd.DataFrame(self.rewards_memory)
                df_rewards.to_csv('account_rewards_trade.csv')

                # print('total asset: {}'.format(self.state[0] + sum(np.array(self.state[1:29])*np.array(self.state[29:]))))
                #with open('obs.pkl', 'wb') as f:
                #    pickle.dump(self.state, f)

                return self.state, self.reward, self.terminal, {}

            else:
                # print(np.array(self.state[1:29]))
                self.date_memory.append(self.data.datadate.unique())

                #print(self.data)
                actions = actions * HMAX_NORMALIZE
                if self.turbulence >= self.turbulence_threshold:
                    actions = np.array([-HMAX_NORMALIZE]*STOCK_DIM)
                self.actions_memory.append(actions)

                #actions = (actions.astype(int))

                begin_total_asset = self.state[0] + \
                    sum(np.array(self.state[1:(STOCK_DIM+1)])*np.array(self.state[(STOCK_DIM+1):(STOCK_DIM*2+1)]))
                #print("begin_total_asset:{}".format(begin_total_asset))

                argsort_actions = np.argsort(actions)
                #print(argsort_actions)

                sell_index = argsort_actions[:np.where(actions < 0)[0].shape[0]]
                buy_index = argsort_actions[::-1][:np.where(actions > 0)[0].shape[0]]

                for index in sell_index:
                    # print('take sell action'.format(actions[index]))
                    self._sell_stock(index, actions[index])

                for index in buy_index:
                    # print('take buy action: {}'.format(actions[index]))
                    self._buy_stock(index, actions[index])

                self.day += 1
                self.data = self.df.loc[self.day, :]
                self.turbulence = self.data['turbulence'].values[0]
                #print(self.turbulence)
                # load next state
                # print("stock_shares:{}".format(self.state[29:]))
                self.state = [self.state[0]] + \
                             self.data.adjcp.values.tolist() + \
                             list(self.state[(STOCK_DIM+1):(STOCK_DIM*2+1)]) + \
                             self.data.macd.values.tolist() + \
                             self.data.rsi.values.tolist()

                end_total_asset = self.state[0] + \
                    sum(np.array(self.state[1:(STOCK_DIM+1)])*np.array(self.state[(STOCK_DIM+1):(STOCK_DIM*2+1)]))
                #print("end_total_asset:{}".format(end_total_asset))

                self.reward = end_total_asset - begin_total_asset
                self.rewards_memory.append(self.reward)
                self.reward = self.reward * REWARD_SCALING
                self.asset_memory.append(end_total_asset)

                return self.state, self.reward, self.terminal, {}

        def reset(self):
            self.asset_memory = [INITIAL_ACCOUNT_BALANCE]
            self.day = 0
            self.data = self.df.loc[self.day, :]
            self.turbulence = 0
            self.cost = 0
            self.trades = 0
            self.terminal = False
            #self.iteration = self.iteration
            self.rewards_memory = []
            self.actions_memory = []
            self.date_memory = []
            # initiate state
            self.state = [INITIAL_ACCOUNT_BALANCE] + \
                         self.data.adjcp.values.tolist() + \
                         [0]*STOCK_DIM + \
                         self.data.macd.values.tolist() + \
                         self.data.rsi.values.tolist()
            return self.state

        def render(self, mode='human', close=False):
            return self.state
        def _seed(self, seed=None):
            self.np_random, seed = seeding.np_random(seed)
            return [seed]

Step 5: Implement DRL Algorithms
-------------------------------------

The implementation of the DRL algorithms is based on OpenAI Baselines and Stable Baselines. Stable Baselines is a fork of OpenAI Baselines with a major structural refactoring and code cleanups.

**Step 5.1: Training data split**: 2009-01-01 to 2018-12-31

.. code-block:: python
    :linenos:

    def data_split(df, start, end):
        """
        split the dataset into training or testing using date
        :param data: (df) pandas dataframe, start, end
        :return: (df) pandas dataframe
        """
        data = df[(df.datadate >= start) & (df.datadate < end)]
        data = data.sort_values(['datadate', 'tic'], ignore_index=True)
        data.index = data.datadate.factorize()[0]
        return data

**Step 5.2: Model training**: DDPG

.. code-block:: python
    :linenos:

    ## tensorboard --logdir ./multiple_stock_tensorboard/
    # adding noise to the action in DDPG helps learning through better exploration
    n_actions = env_train.action_space.shape[-1]
    param_noise = None
    action_noise = OrnsteinUhlenbeckActionNoise(mean=np.zeros(n_actions),
                                                sigma=float(0.5) * np.ones(n_actions))

    # model settings
    model_ddpg = DDPG('MlpPolicy',
                      env_train,
                      batch_size=64,
                      buffer_size=100000,
                      param_noise=param_noise,
                      action_noise=action_noise,
                      verbose=0,
                      tensorboard_log="./multiple_stock_tensorboard/")

    ## 250k timesteps: took about 20 mins to finish
    model_ddpg.learn(total_timesteps=250000, tb_log_name="DDPG_run_1")

**Step 5.3: Trading**

Assume that we have $1,000,000 of initial capital on 2019-01-01. We use the DDPG model to trade the Dow Jones 30 stocks.

**Step 5.4: Set turbulence threshold**

Set the turbulence threshold to be the 99% quantile of the insample turbulence data. If the current turbulence index is greater than the threshold, we assume that the current market is volatile.

.. code-block:: python
    :linenos:

    insample_turbulence = dow_30[(dow_30.datadate<'2019-01-01') & (dow_30.datadate>='2009-01-01')]
    insample_turbulence = insample_turbulence.drop_duplicates(subset=['datadate'])
    # 99% quantile of the insample turbulence values
    insample_turbulence_threshold = np.quantile(insample_turbulence.turbulence.values, 0.99)

**Step 5.5: Prepare test data and environment**

.. code-block:: python
    :linenos:

    # test data
    test = data_split(dow_30, start='2019-01-01', end='2020-10-30')
    # testing env
    env_test = DummyVecEnv([lambda: StockEnvTrade(test, turbulence_threshold=insample_turbulence_threshold)])
    obs_test = env_test.reset()

**Step 5.6: Prediction**

.. code-block:: python
    :linenos:

    def DRL_prediction(model, data, env, obs):
        print("==============Model Prediction===========")
        for i in range(len(data.index.unique())):
            action, _states = model.predict(obs)
            obs, rewards, dones, info = env.step(action)
            env.render()

Step 6: Backtest Our Strategy
---------------------------------

For simplicity, in this article we just calculate the Sharpe ratio and the annual return manually.

.. code-block:: python
    :linenos:

    def backtest_strat(df):
        strategy_ret = df.copy()
        strategy_ret['Date'] = pd.to_datetime(strategy_ret['Date'])
        strategy_ret.set_index('Date', drop=False, inplace=True)
        strategy_ret.index = strategy_ret.index.tz_localize('UTC')
        del strategy_ret['Date']
        ts = pd.Series(strategy_ret['daily_return'].values, index=strategy_ret.index)
        return ts
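A hypothetical usage sketch for ``backtest_strat``, producing the ``DRL_strat`` and ``dow_strat`` return series consumed by pyfolio in Step 6.3. The input DataFrame names are assumptions, and each DataFrame is assumed to already contain ``Date`` and ``daily_return`` columns.

.. code-block:: python
    :linenos:

    # df_account_value: assumed DataFrame of the DRL strategy's daily returns
    # dow_returns: assumed DataFrame of the DJIA baseline's daily returns
    DRL_strat = backtest_strat(df_account_value)
    dow_strat = backtest_strat(dow_returns)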
**Step 6.1: Dow Jones Industrial Average**

.. code-block:: python
    :linenos:

    def get_buy_and_hold_sharpe(test):
        test['daily_return'] = test['adjcp'].pct_change(1)
        sharpe = (252**0.5)*test['daily_return'].mean() / \
            test['daily_return'].std()
        annual_return = ((test['daily_return'].mean()+1)**252-1)*100
        print("annual return: ", annual_return)
        print("sharpe ratio: ", sharpe)
        #return sharpe

**Step 6.2: Our DRL trading strategy**

.. code-block:: python
    :linenos:

    def get_daily_return(df):
        df['daily_return'] = df.account_value.pct_change(1)
        #df = df.dropna()
        sharpe = (252**0.5)*df['daily_return'].mean() / \
            df['daily_return'].std()
        annual_return = ((df['daily_return'].mean()+1)**252-1)*100
        print("annual return: ", annual_return)
        print("sharpe ratio: ", sharpe)
        return df

**Step 6.3: Plot the results using Quantopian pyfolio**

Backtesting plays a key role in evaluating the performance of a trading strategy. An automated backtesting tool is preferred because it reduces human error. We usually use the Quantopian pyfolio package to backtest our trading strategies. It is easy to use and consists of various individual plots that together provide a comprehensive picture of the performance of a trading strategy.

.. code-block:: python
    :linenos:

    import pyfolio

    %matplotlib inline
    with pyfolio.plotting.plotting_context(font_scale=1.1):
        pyfolio.create_full_tear_sheet(returns = DRL_strat,
                                       benchmark_rets=dow_strat,
                                       set_context=False)