GitHub Repository: AI4Finance-Foundation/FinRL
Path: blob/master/examples/Stock_NeurIPS2018_call_func_SB3.ipynb

Open In Colab

Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading

  • PyTorch Version

Content

Task Description

We train a DRL agent for stock trading. This task is modeled as a Markov Decision Process (MDP), and the objective is to maximize the (expected) cumulative return.
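For reference, the objective can be written out as follows (a standard MDP formulation; the discount factor γ and horizon T are our notation and are not defined in the notebook itself):

\max_{\pi}\; \mathbb{E}\left[\sum_{t=0}^{T-1} \gamma^{t}\, r(s_t, a_t, s_{t+1})\right]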

We specify the state-action-reward as follows:

  • State s: The state space represents the agent's perception of the market environment. Just as a human trader analyzes various sources of information, our agent passively observes many features and learns by interacting with the market environment (usually by replaying historical data).

  • Action a: The action space includes the allowed actions an agent can take at each state. For example, a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying, respectively. When an action operates on multiple shares, a ∈ {−k, ..., −1, 0, 1, ..., k}; e.g., "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" are represented as 10 and −10, respectively.

  • Reward function r(s, a, s′): The reward is the incentive for an agent to learn a better policy. For example, it can be the change in portfolio value when taking action a at state s and arriving at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v are the portfolio values at states s′ and s, respectively (see the sketch after this list).
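A minimal sketch of how these pieces fit together (illustrative only: the names below are ours, and the actual environment logic lives in StockTradingEnv, imported later):

import numpy as np

k = 10  # assumed per-trade share cap, so each per-stock action lies in {-k, ..., k}

def portfolio_value(cash, shares, prices):
    # v = cash on hand + market value of all holdings
    return cash + np.dot(shares, prices)

def step_reward(v, v_next):
    # r(s, a, s') = v' - v: the change in portfolio value caused by the action
    return v_next - v

# e.g. "Buy 10 shares of AAPL" is +10 in AAPL's slot of the action vector; selling 10 is -10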

Market environment: the 30 constituent stocks of the Dow Jones Industrial Average (DJIA) index, as of the starting date of the testing period.

The data for this case study is obtained from the Yahoo Finance API and contains open-high-low-close (OHLC) prices and volume.
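As a sketch of that step (using FinRL's YahooDownloader, which is imported in the next part; the date range shown matches the train/trade windows used below):

from finrl.config_tickers import DOW_30_TICKER
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader

# daily OHLCV bars for the 30 DJIA constituents over the full study period
df_raw = YahooDownloader(
    start_date="2009-01-01",
    end_date="2023-11-01",
    ticker_list=DOW_30_TICKER,
).fetch_data()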

Part 1. Install Python Packages

1.1. Install packages

## install required packages
!pip install swig
!pip install wrds
!pip install pyportfolioopt

## install finrl library
!pip install -q condacolab
import condacolab
condacolab.install()
!apt-get update -y -qq && apt-get install -y -qq cmake libopenmpi-dev python3-dev zlib1g-dev libgl1-mesa-glx swig
!pip install git+https://github.com/AI4Finance-Foundation/FinRL.git

1.2. Import Packages

from finrl import config
from finrl import config_tickers
from finrl.agents.stablebaselines3.models import DRLAgent
from finrl.config import DATA_SAVE_DIR
from finrl.config import INDICATORS
from finrl.config import RESULTS_DIR
from finrl.config import TENSORBOARD_LOG_DIR
from finrl.config import TEST_END_DATE
from finrl.config import TEST_START_DATE
from finrl.config import TRAINED_MODEL_DIR
from finrl.config_tickers import DOW_30_TICKER
from finrl.main import check_and_make_directories
from finrl.meta.data_processor import DataProcessor
from finrl.meta.data_processors.func import calc_train_trade_data
from finrl.meta.data_processors.func import calc_train_trade_starts_ends_if_rolling
from finrl.meta.data_processors.func import date2str
from finrl.meta.data_processors.func import str2date
from finrl.meta.env_stock_trading.env_stocktrading import StockTradingEnv
from finrl.meta.preprocessor.preprocessors import data_split
from finrl.meta.preprocessor.preprocessors import FeatureEngineer
from finrl.meta.preprocessor.yahoodownloader import YahooDownloader
from finrl.plot import backtest_plot
from finrl.plot import backtest_stats
from finrl.plot import get_baseline
from finrl.plot import get_daily_return
from finrl.plot import plot_return
from finrl.applications.stock_trading.stock_trading import stock_trading
import sys
sys.path.append("../FinRL")
import itertools
/usr/local/lib/python3.9/site-packages/pyfolio/pos.py:26: UserWarning: Module "zipline.assets" not found; multipliers will not be applied to position notionals.
  warnings.warn(
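Before running, the working folders FinRL writes to can be created with the imported helper (a sketch; the directory constants come from finrl.config as imported above):

from finrl.config import DATA_SAVE_DIR, RESULTS_DIR, TENSORBOARD_LOG_DIR, TRAINED_MODEL_DIR
from finrl.main import check_and_make_directories

# create data/, trained_models/, tensorboard_log/, results/ if they do not exist
check_and_make_directories([DATA_SAVE_DIR, TRAINED_MODEL_DIR, TENSORBOARD_LOG_DIR, RESULTS_DIR])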

Part 2. Set Parameters and Run

train_start_date = "2009-01-01"
train_end_date = "2022-09-01"
trade_start_date = "2022-09-01"
trade_end_date = "2023-11-01"
if_store_actions = True
if_store_result = True
if_using_a2c = True
if_using_ddpg = True
if_using_ppo = True
if_using_sac = True
if_using_td3 = True

stock_trading(
    train_start_date=train_start_date,
    train_end_date=train_end_date,
    trade_start_date=trade_start_date,
    trade_end_date=trade_end_date,
    if_store_actions=if_store_actions,
    if_store_result=if_store_result,
    if_using_a2c=if_using_a2c,
    if_using_ddpg=if_using_ddpg,
    if_using_ppo=if_using_ppo,
    if_using_sac=if_using_sac,
    if_using_td3=if_using_td3,
)
流式输出内容被截断,只能显示最后 5000 行内容。 | std | 1.02 | | value_loss | 52.5 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 10 | | time_elapsed | 316 | | total_timesteps | 20480 | | train/ | | | approx_kl | 0.018726377 | | clip_fraction | 0.228 | | clip_range | 0.2 | | entropy_loss | -41.6 | | explained_variance | -0.00599 | | learning_rate | 0.00025 | | loss | 10 | | n_updates | 90 | | policy_gradient_loss | -0.0229 | | reward | 1.8604985 | | std | 1.02 | | value_loss | 34 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 11 | | time_elapsed | 350 | | total_timesteps | 22528 | | train/ | | | approx_kl | 0.017771121 | | clip_fraction | 0.201 | | clip_range | 0.2 | | entropy_loss | -41.7 | | explained_variance | -0.00452 | | learning_rate | 0.00025 | | loss | 102 | | n_updates | 100 | | policy_gradient_loss | -0.0176 | | reward | 2.4363315 | | std | 1.02 | | value_loss | 257 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 12 | | time_elapsed | 380 | | total_timesteps | 24576 | | train/ | | | approx_kl | 0.021592125 | | clip_fraction | 0.24 | | clip_range | 0.2 | | entropy_loss | -41.7 | | explained_variance | -0.00462 | | learning_rate | 0.00025 | | loss | 13.1 | | n_updates | 110 | | policy_gradient_loss | -0.0218 | | reward | -0.36686477 | | std | 1.02 | | value_loss | 27.8 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 13 | | time_elapsed | 417 | | total_timesteps | 26624 | | train/ | | | approx_kl | 0.016095877 | | clip_fraction | 0.171 | | clip_range | 0.2 | | entropy_loss | -41.8 | | explained_variance | 0.00607 | | learning_rate | 0.00025 | | loss | 63.8 | | n_updates | 120 | | policy_gradient_loss | -0.0175 | | reward | -6.2590113 | | std | 1.02 | | value_loss | 161 | ----------------------------------------- ---------------------------------------- | time/ | | | fps | 64 | | iterations | 14 | | time_elapsed | 447 | | total_timesteps | 28672 | | train/ | | | approx_kl | 0.02099569 | | clip_fraction | 0.204 | | clip_range | 0.2 | | entropy_loss | -41.9 | | explained_variance | 0.00587 | | learning_rate | 0.00025 | | loss | 18.1 | | n_updates | 130 | | policy_gradient_loss | -0.0176 | | reward | -1.5635415 | | std | 1.03 | | value_loss | 76.1 | ---------------------------------------- day: 3374, episode: 10 begin_total_asset: 1017321.61 end_total_asset: 4690150.25 total_reward: 3672828.63 total_cost: 440655.06 total_trades: 91574 Sharpe: 0.777 ================================= ------------------------------------------ | time/ | | | fps | 63 | | iterations | 15 | | time_elapsed | 480 | | total_timesteps | 30720 | | train/ | | | approx_kl | 0.01574407 | | clip_fraction | 0.252 | | clip_range | 0.2 | | entropy_loss | -41.9 | | explained_variance | 0.045 | | learning_rate | 0.00025 | | loss | 8.21 | | n_updates | 140 | | policy_gradient_loss | -0.0207 | | reward | -0.058135245 | | std | 1.03 | | value_loss | 20 | ------------------------------------------ ----------------------------------------- | time/ | | | fps | 64 | | iterations | 16 | | time_elapsed | 511 | | total_timesteps | 32768 | | train/ | | | approx_kl | 0.018864237 | | clip_fraction | 0.19 | | clip_range | 0.2 | | entropy_loss | -42 | | explained_variance | -0.0334 | | learning_rate | 0.00025 | | loss | 40.5 | | n_updates | 150 | | 
policy_gradient_loss | -0.0158 | | reward | 2.1892703 | | std | 1.03 | | value_loss | 80.4 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 17 | | time_elapsed | 542 | | total_timesteps | 34816 | | train/ | | | approx_kl | 0.025924759 | | clip_fraction | 0.183 | | clip_range | 0.2 | | entropy_loss | -42 | | explained_variance | -0.0494 | | learning_rate | 0.00025 | | loss | 8.64 | | n_updates | 160 | | policy_gradient_loss | -0.0154 | | reward | -1.6194284 | | std | 1.03 | | value_loss | 19.1 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 18 | | time_elapsed | 576 | | total_timesteps | 36864 | | train/ | | | approx_kl | 0.023486339 | | clip_fraction | 0.227 | | clip_range | 0.2 | | entropy_loss | -42 | | explained_variance | -0.00164 | | learning_rate | 0.00025 | | loss | 71 | | n_updates | 170 | | policy_gradient_loss | -0.0128 | | reward | -6.5787015 | | std | 1.03 | | value_loss | 175 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 19 | | time_elapsed | 609 | | total_timesteps | 38912 | | train/ | | | approx_kl | 0.047546946 | | clip_fraction | 0.278 | | clip_range | 0.2 | | entropy_loss | -42 | | explained_variance | 0.0083 | | learning_rate | 0.00025 | | loss | 22.2 | | n_updates | 180 | | policy_gradient_loss | -0.00743 | | reward | 3.6853487 | | std | 1.03 | | value_loss | 88.3 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 20 | | time_elapsed | 643 | | total_timesteps | 40960 | | train/ | | | approx_kl | 0.028585846 | | clip_fraction | 0.238 | | clip_range | 0.2 | | entropy_loss | -42.1 | | explained_variance | -0.018 | | learning_rate | 0.00025 | | loss | 12.6 | | n_updates | 190 | | policy_gradient_loss | -0.0166 | | reward | 2.84366 | | std | 1.03 | | value_loss | 35.7 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 21 | | time_elapsed | 672 | | total_timesteps | 43008 | | train/ | | | approx_kl | 0.021615773 | | clip_fraction | 0.283 | | clip_range | 0.2 | | entropy_loss | -42.1 | | explained_variance | 0.0164 | | learning_rate | 0.00025 | | loss | 39.1 | | n_updates | 200 | | policy_gradient_loss | -0.0119 | | reward | 7.260352 | | std | 1.04 | | value_loss | 85.5 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 22 | | time_elapsed | 703 | | total_timesteps | 45056 | | train/ | | | approx_kl | 0.023984132 | | clip_fraction | 0.174 | | clip_range | 0.2 | | entropy_loss | -42.2 | | explained_variance | -0.0214 | | learning_rate | 0.00025 | | loss | 10.8 | | n_updates | 210 | | policy_gradient_loss | -0.015 | | reward | 0.7453349 | | std | 1.04 | | value_loss | 27.4 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 23 | | time_elapsed | 736 | | total_timesteps | 47104 | | train/ | | | approx_kl | 0.026311198 | | clip_fraction | 0.239 | | clip_range | 0.2 | | entropy_loss | -42.2 | | explained_variance | 0.0117 | | learning_rate | 0.00025 | | loss | 53.5 | | n_updates | 220 | | policy_gradient_loss | -0.0147 | | reward | -3.601917 | | std | 1.04 | | value_loss | 109 | ----------------------------------------- 
----------------------------------------- | time/ | | | fps | 64 | | iterations | 24 | | time_elapsed | 765 | | total_timesteps | 49152 | | train/ | | | approx_kl | 0.021329464 | | clip_fraction | 0.228 | | clip_range | 0.2 | | entropy_loss | -42.2 | | explained_variance | 0.0287 | | learning_rate | 0.00025 | | loss | 35.5 | | n_updates | 230 | | policy_gradient_loss | -0.0174 | | reward | -1.4932549 | | std | 1.04 | | value_loss | 69.7 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 25 | | time_elapsed | 799 | | total_timesteps | 51200 | | train/ | | | approx_kl | 0.033834375 | | clip_fraction | 0.347 | | clip_range | 0.2 | | entropy_loss | -42.2 | | explained_variance | -0.0439 | | learning_rate | 0.00025 | | loss | 11.2 | | n_updates | 240 | | policy_gradient_loss | -0.0175 | | reward | -0.13293022 | | std | 1.04 | | value_loss | 31.1 | ----------------------------------------- {'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'} Using cpu device Logging to results/sac ----------------------------------- | time/ | | | episodes | 4 | | fps | 19 | | time_elapsed | 693 | | total_timesteps | 13500 | | train/ | | | actor_loss | 1.23e+03 | | critic_loss | 941 | | ent_coef | 0.175 | | ent_coef_loss | -80.9 | | learning_rate | 0.0001 | | n_updates | 13399 | | reward | -4.1185117 | ----------------------------------- ----------------------------------- | time/ | | | episodes | 8 | | fps | 19 | | time_elapsed | 1407 | | total_timesteps | 27000 | | train/ | | | actor_loss | 486 | | critic_loss | 378 | | ent_coef | 0.047 | | ent_coef_loss | -97.4 | | learning_rate | 0.0001 | | n_updates | 26899 | | reward | -6.0287046 | ----------------------------------- day: 3374, episode: 10 begin_total_asset: 1039580.61 end_total_asset: 4449383.64 total_reward: 3409803.03 total_cost: 3171.06 total_trades: 48397 Sharpe: 0.687 ================================= ----------------------------------- | time/ | | | episodes | 12 | | fps | 19 | | time_elapsed | 2123 | | total_timesteps | 40500 | | train/ | | | actor_loss | 201 | | critic_loss | 10.8 | | ent_coef | 0.0131 | | ent_coef_loss | -63.2 | | learning_rate | 0.0001 | | n_updates | 40399 | | reward | -5.6925883 | ----------------------------------- {'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001} Using cpu device Logging to results/td3 ----------------------------------- | time/ | | | episodes | 4 | | fps | 24 | | time_elapsed | 545 | | total_timesteps | 13500 | | train/ | | | actor_loss | 16.7 | | critic_loss | 341 | | learning_rate | 0.001 | | n_updates | 10125 | | reward | -5.7216434 | ----------------------------------- ----------------------------------- | time/ | | | episodes | 8 | | fps | 21 | | time_elapsed | 1228 | | total_timesteps | 27000 | | train/ | | | actor_loss | 18.7 | | critic_loss | 21.4 | | learning_rate | 0.001 | | n_updates | 23625 | | reward | -5.7216434 | ----------------------------------- day: 3374, episode: 10 begin_total_asset: 1043903.24 end_total_asset: 5291054.90 total_reward: 4247151.66 total_cost: 1042.86 total_trades: 64106 Sharpe: 0.723 ================================= ----------------------------------- | time/ | | | episodes | 12 | | fps | 21 | | time_elapsed | 1923 | | total_timesteps | 40500 | | train/ | | | actor_loss | 20.6 | | critic_loss | 15.9 | | learning_rate | 0.001 | | n_updates | 37125 | | reward | -5.7216434 | ----------------------------------- hit end! 
hit end! hit end! hit end! hit end! [*********************100%***********************] 1 of 1 completed Shape of DataFrame: (22, 8) i: 2 {'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007} Using cpu device Logging to results/a2c --------------------------------------- | time/ | | | fps | 55 | | iterations | 100 | | time_elapsed | 9 | | total_timesteps | 500 | | train/ | | | entropy_loss | -41 | | explained_variance | 0.0552 | | learning_rate | 0.0007 | | n_updates | 99 | | policy_loss | -125 | | reward | -0.19224237 | | std | 0.997 | | value_loss | 10.9 | --------------------------------------- ------------------------------------ | time/ | | | fps | 65 | | iterations | 200 | | time_elapsed | 15 | | total_timesteps | 1000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 199 | | policy_loss | -65.7 | | reward | 2.47076 | | std | 0.998 | | value_loss | 3.14 | ------------------------------------ ------------------------------------- | time/ | | | fps | 61 | | iterations | 300 | | time_elapsed | 24 | | total_timesteps | 1500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 299 | | policy_loss | 221 | | reward | -0.668967 | | std | 0.999 | | value_loss | 38 | ------------------------------------- ------------------------------------ | time/ | | | fps | 61 | | iterations | 400 | | time_elapsed | 32 | | total_timesteps | 2000 | | train/ | | | entropy_loss | -41 | | explained_variance | 5.96e-08 | | learning_rate | 0.0007 | | n_updates | 399 | | policy_loss | 2.8 | | reward | 2.104001 | | std | 0.997 | | value_loss | 2.71 | ------------------------------------ ------------------------------------- | time/ | | | fps | 64 | | iterations | 500 | | time_elapsed | 39 | | total_timesteps | 2500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 499 | | policy_loss | 239 | | reward | 3.0126274 | | std | 0.999 | | value_loss | 39.6 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 600 | | time_elapsed | 48 | | total_timesteps | 3000 | | train/ | | | entropy_loss | -41.2 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 599 | | policy_loss | -524 | | reward | 3.9946847 | | std | 1 | | value_loss | 293 | ------------------------------------- ------------------------------------- | time/ | | | fps | 62 | | iterations | 700 | | time_elapsed | 56 | | total_timesteps | 3500 | | train/ | | | entropy_loss | -41.2 | | explained_variance | 0.108 | | learning_rate | 0.0007 | | n_updates | 699 | | policy_loss | -37.1 | | reward | 1.4987615 | | std | 1 | | value_loss | 1.8 | ------------------------------------- ------------------------------------ | time/ | | | fps | 63 | | iterations | 800 | | time_elapsed | 62 | | total_timesteps | 4000 | | train/ | | | entropy_loss | -41.2 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 799 | | policy_loss | -337 | | reward | 2.046587 | | std | 1 | | value_loss | 77.1 | ------------------------------------ ------------------------------------- | time/ | | | fps | 61 | | iterations | 900 | | time_elapsed | 72 | | total_timesteps | 4500 | | train/ | | | entropy_loss | -41.2 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 899 | | policy_loss | 26.2 | | reward | 1.0195923 | | std | 1 | | value_loss | 4.92 | ------------------------------------- 
-------------------------------------- | time/ | | | fps | 62 | | iterations | 1000 | | time_elapsed | 79 | | total_timesteps | 5000 | | train/ | | | entropy_loss | -41.2 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 999 | | policy_loss | 80.3 | | reward | -3.5495179 | | std | 1 | | value_loss | 9.56 | -------------------------------------- ------------------------------------- | time/ | | | fps | 63 | | iterations | 1100 | | time_elapsed | 86 | | total_timesteps | 5500 | | train/ | | | entropy_loss | -41.2 | | explained_variance | -2.38e-07 | | learning_rate | 0.0007 | | n_updates | 1099 | | policy_loss | -258 | | reward | 1.6695346 | | std | 1 | | value_loss | 44.9 | ------------------------------------- ------------------------------------ | time/ | | | fps | 62 | | iterations | 1200 | | time_elapsed | 96 | | total_timesteps | 6000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | -0.00397 | | learning_rate | 0.0007 | | n_updates | 1199 | | policy_loss | 185 | | reward | 2.245284 | | std | 1 | | value_loss | 26.3 | ------------------------------------ ------------------------------------- | time/ | | | fps | 60 | | iterations | 1300 | | time_elapsed | 106 | | total_timesteps | 6500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 1299 | | policy_loss | 150 | | reward | 1.491629 | | std | 0.998 | | value_loss | 21.5 | ------------------------------------- --------------------------------------- | time/ | | | fps | 59 | | iterations | 1400 | | time_elapsed | 117 | | total_timesteps | 7000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 1399 | | policy_loss | -107 | | reward | -0.81370664 | | std | 0.998 | | value_loss | 8.34 | --------------------------------------- ------------------------------------- | time/ | | | fps | 60 | | iterations | 1500 | | time_elapsed | 124 | | total_timesteps | 7500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | -0.0459 | | learning_rate | 0.0007 | | n_updates | 1499 | | policy_loss | -10 | | reward | 2.3688922 | | std | 0.997 | | value_loss | 0.922 | ------------------------------------- -------------------------------------- | time/ | | | fps | 60 | | iterations | 1600 | | time_elapsed | 131 | | total_timesteps | 8000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 1599 | | policy_loss | 128 | | reward | 0.56861943 | | std | 0.998 | | value_loss | 19.5 | -------------------------------------- ------------------------------------- | time/ | | | fps | 59 | | iterations | 1700 | | time_elapsed | 141 | | total_timesteps | 8500 | | train/ | | | entropy_loss | -41 | | explained_variance | 0.203 | | learning_rate | 0.0007 | | n_updates | 1699 | | policy_loss | 37 | | reward | 1.2727017 | | std | 0.996 | | value_loss | 3.2 | ------------------------------------- ------------------------------------- | time/ | | | fps | 60 | | iterations | 1800 | | time_elapsed | 148 | | total_timesteps | 9000 | | train/ | | | entropy_loss | -41 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 1799 | | policy_loss | 80.9 | | reward | 3.9352329 | | std | 0.996 | | value_loss | 9.56 | ------------------------------------- ------------------------------------ | time/ | | | fps | 60 | | iterations | 1900 | | time_elapsed | 156 | | total_timesteps | 9500 | | train/ | | | entropy_loss | -41 | | explained_variance | 0 | | 
learning_rate | 0.0007 | | n_updates | 1899 | | policy_loss | 507 | | reward | 6.328624 | | std | 0.997 | | value_loss | 187 | ------------------------------------ -------------------------------------- | time/ | | | fps | 60 | | iterations | 2000 | | time_elapsed | 166 | | total_timesteps | 10000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 1999 | | policy_loss | -126 | | reward | -2.8187668 | | std | 1 | | value_loss | 8.37 | -------------------------------------- ------------------------------------- | time/ | | | fps | 60 | | iterations | 2100 | | time_elapsed | 172 | | total_timesteps | 10500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 2099 | | policy_loss | -58.5 | | reward | 0.2734109 | | std | 0.999 | | value_loss | 3 | ------------------------------------- -------------------------------------- | time/ | | | fps | 60 | | iterations | 2200 | | time_elapsed | 180 | | total_timesteps | 11000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 2199 | | policy_loss | 157 | | reward | 0.68144214 | | std | 0.997 | | value_loss | 19.9 | -------------------------------------- -------------------------------------- | time/ | | | fps | 60 | | iterations | 2300 | | time_elapsed | 190 | | total_timesteps | 11500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 1.19e-07 | | learning_rate | 0.0007 | | n_updates | 2299 | | policy_loss | -67.3 | | reward | -1.8721669 | | std | 0.999 | | value_loss | 2.49 | -------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 2400 | | time_elapsed | 196 | | total_timesteps | 12000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | -0.0105 | | learning_rate | 0.0007 | | n_updates | 2399 | | policy_loss | 144 | | reward | 0.47134838 | | std | 0.997 | | value_loss | 26.3 | -------------------------------------- -------------------------------------- | time/ | | | fps | 60 | | iterations | 2500 | | time_elapsed | 204 | | total_timesteps | 12500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 2499 | | policy_loss | 589 | | reward | -1.9081986 | | std | 0.997 | | value_loss | 221 | -------------------------------------- ------------------------------------- | time/ | | | fps | 60 | | iterations | 2600 | | time_elapsed | 213 | | total_timesteps | 13000 | | train/ | | | entropy_loss | -41 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 2599 | | policy_loss | 352 | | reward | 6.1386447 | | std | 0.996 | | value_loss | 148 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 2700 | | time_elapsed | 219 | | total_timesteps | 13500 | | train/ | | | entropy_loss | -41 | | explained_variance | -0.0134 | | learning_rate | 0.0007 | | n_updates | 2699 | | policy_loss | -131 | | reward | 0.6143146 | | std | 0.995 | | value_loss | 12.3 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 2800 | | time_elapsed | 229 | | total_timesteps | 14000 | | train/ | | | entropy_loss | -41 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 2799 | | policy_loss | 132 | | reward | 1.7656372 | | std | 0.995 | | value_loss | 13.2 | ------------------------------------- 
-------------------------------------- | time/ | | | fps | 61 | | iterations | 2900 | | time_elapsed | 237 | | total_timesteps | 14500 | | train/ | | | entropy_loss | -41 | | explained_variance | -0.0245 | | learning_rate | 0.0007 | | n_updates | 2899 | | policy_loss | 17.1 | | reward | 0.08867768 | | std | 0.995 | | value_loss | 3.2 | -------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 3000 | | time_elapsed | 243 | | total_timesteps | 15000 | | train/ | | | entropy_loss | -41 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 2999 | | policy_loss | 48.4 | | reward | -3.6771903 | | std | 0.995 | | value_loss | 2.24 | -------------------------------------- ------------------------------------ | time/ | | | fps | 61 | | iterations | 3100 | | time_elapsed | 253 | | total_timesteps | 15500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 1.19e-07 | | learning_rate | 0.0007 | | n_updates | 3099 | | policy_loss | -0.565 | | reward | 0.679106 | | std | 0.998 | | value_loss | 1.98 | ------------------------------------ ---------------------------------------- | time/ | | | fps | 61 | | iterations | 3200 | | time_elapsed | 260 | | total_timesteps | 16000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 3199 | | policy_loss | 26.6 | | reward | 0.0013427841 | | std | 0.998 | | value_loss | 11.3 | ---------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 3300 | | time_elapsed | 267 | | total_timesteps | 16500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 3299 | | policy_loss | 54.6 | | reward | 1.6012005 | | std | 0.998 | | value_loss | 7.47 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 3400 | | time_elapsed | 277 | | total_timesteps | 17000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | -0.0366 | | learning_rate | 0.0007 | | n_updates | 3399 | | policy_loss | -33.7 | | reward | 0.8685799 | | std | 1 | | value_loss | 3.79 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 3500 | | time_elapsed | 284 | | total_timesteps | 17500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 3499 | | policy_loss | 21 | | reward | 0.14613488 | | std | 1 | | value_loss | 1.32 | -------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 3600 | | time_elapsed | 291 | | total_timesteps | 18000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 5.96e-08 | | learning_rate | 0.0007 | | n_updates | 3599 | | policy_loss | -173 | | reward | -1.2669375 | | std | 1 | | value_loss | 20.9 | -------------------------------------- ----------------------------------------- | time/ | | | fps | 61 | | iterations | 3700 | | time_elapsed | 301 | | total_timesteps | 18500 | | train/ | | | entropy_loss | -41.2 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 3699 | | policy_loss | -53.5 | | reward | -0.0045535835 | | std | 1 | | value_loss | 8.58 | ----------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 3800 | | time_elapsed | 308 | | total_timesteps | 19000 | | train/ | | | entropy_loss | -41.1 | | 
explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 3799 | | policy_loss | 98.2 | | reward | 0.6932831 | | std | 1 | | value_loss | 7.95 | ------------------------------------- ------------------------------------ | time/ | | | fps | 61 | | iterations | 3900 | | time_elapsed | 316 | | total_timesteps | 19500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 3899 | | policy_loss | 628 | | reward | 8.699067 | | std | 1 | | value_loss | 335 | ------------------------------------ ------------------------------------- | time/ | | | fps | 61 | | iterations | 4000 | | time_elapsed | 325 | | total_timesteps | 20000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 3999 | | policy_loss | 605 | | reward | 17.186195 | | std | 1 | | value_loss | 266 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 4100 | | time_elapsed | 331 | | total_timesteps | 20500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0.0757 | | learning_rate | 0.0007 | | n_updates | 4099 | | policy_loss | 187 | | reward | 1.1687305 | | std | 1 | | value_loss | 22.2 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 4200 | | time_elapsed | 341 | | total_timesteps | 21000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 4199 | | policy_loss | -7.42 | | reward | -0.5722446 | | std | 1 | | value_loss | 0.248 | -------------------------------------- --------------------------------------- | time/ | | | fps | 61 | | iterations | 4300 | | time_elapsed | 351 | | total_timesteps | 21500 | | train/ | | | entropy_loss | -41.2 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 4299 | | policy_loss | 44.7 | | reward | -0.39718863 | | std | 1 | | value_loss | 2.09 | --------------------------------------- --------------------------------------- | time/ | | | fps | 61 | | iterations | 4400 | | time_elapsed | 359 | | total_timesteps | 22000 | | train/ | | | entropy_loss | -41.2 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 4399 | | policy_loss | -29.9 | | reward | -0.07916131 | | std | 1 | | value_loss | 0.947 | --------------------------------------- ------------------------------------- | time/ | | | fps | 60 | | iterations | 4500 | | time_elapsed | 369 | | total_timesteps | 22500 | | train/ | | | entropy_loss | -41.2 | | explained_variance | 5.96e-08 | | learning_rate | 0.0007 | | n_updates | 4499 | | policy_loss | -28.5 | | reward | 1.8631558 | | std | 1 | | value_loss | 1.83 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 4600 | | time_elapsed | 375 | | total_timesteps | 23000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 4599 | | policy_loss | 58.4 | | reward | 1.4081724 | | std | 0.999 | | value_loss | 77 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 4700 | | time_elapsed | 383 | | total_timesteps | 23500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 4699 | | policy_loss | 79.9 | | reward | -2.0728068 | | std | 0.998 | | value_loss | 4.66 | -------------------------------------- 
--------------------------------------- | time/ | | | fps | 61 | | iterations | 4800 | | time_elapsed | 392 | | total_timesteps | 24000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0.0901 | | learning_rate | 0.0007 | | n_updates | 4799 | | policy_loss | -15.9 | | reward | -0.09403604 | | std | 0.998 | | value_loss | 0.165 | --------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 4900 | | time_elapsed | 399 | | total_timesteps | 24500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 4899 | | policy_loss | -110 | | reward | 2.0542228 | | std | 1 | | value_loss | 10.8 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 5000 | | time_elapsed | 407 | | total_timesteps | 25000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 4999 | | policy_loss | -186 | | reward | 2.1355224 | | std | 0.999 | | value_loss | 27.6 | ------------------------------------- --------------------------------------- | time/ | | | fps | 61 | | iterations | 5100 | | time_elapsed | 416 | | total_timesteps | 25500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 5099 | | policy_loss | -13.1 | | reward | -0.08651471 | | std | 0.999 | | value_loss | 1.59 | --------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 5200 | | time_elapsed | 423 | | total_timesteps | 26000 | | train/ | | | entropy_loss | -41 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 5199 | | policy_loss | 99 | | reward | 2.3819537 | | std | 0.997 | | value_loss | 29.5 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 5300 | | time_elapsed | 432 | | total_timesteps | 26500 | | train/ | | | entropy_loss | -41 | | explained_variance | 1.19e-07 | | learning_rate | 0.0007 | | n_updates | 5299 | | policy_loss | 372 | | reward | -22.664398 | | std | 0.997 | | value_loss | 196 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 5400 | | time_elapsed | 440 | | total_timesteps | 27000 | | train/ | | | entropy_loss | -41 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 5399 | | policy_loss | 153 | | reward | 1.1176498 | | std | 0.995 | | value_loss | 15.5 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 5500 | | time_elapsed | 446 | | total_timesteps | 27500 | | train/ | | | entropy_loss | -41 | | explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 5499 | | policy_loss | -92.6 | | reward | -5.1304746 | | std | 0.996 | | value_loss | 14.5 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 5600 | | time_elapsed | 456 | | total_timesteps | 28000 | | train/ | | | entropy_loss | -41 | | explained_variance | 0.0186 | | learning_rate | 0.0007 | | n_updates | 5599 | | policy_loss | 62.9 | | reward | 1.1683302 | | std | 0.996 | | value_loss | 9.46 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 5700 | | time_elapsed | 464 | | total_timesteps | 28500 | | train/ | | | entropy_loss | -41 | | 
explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 5699 | | policy_loss | 34.4 | | reward | -3.4618378 | | std | 0.995 | | value_loss | 12 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 5800 | | time_elapsed | 471 | | total_timesteps | 29000 | | train/ | | | entropy_loss | -41 | | explained_variance | -0.247 | | learning_rate | 0.0007 | | n_updates | 5799 | | policy_loss | -171 | | reward | 5.8363895 | | std | 0.996 | | value_loss | 20.7 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 5900 | | time_elapsed | 481 | | total_timesteps | 29500 | | train/ | | | entropy_loss | -41 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 5899 | | policy_loss | 250 | | reward | -4.7651134 | | std | 0.996 | | value_loss | 83.8 | -------------------------------------- ------------------------------------ | time/ | | | fps | 61 | | iterations | 6000 | | time_elapsed | 487 | | total_timesteps | 30000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 5999 | | policy_loss | -211 | | reward | 3.167639 | | std | 0.999 | | value_loss | 95.9 | ------------------------------------ day: 3352, episode: 10 begin_total_asset: 1005653.74 end_total_asset: 8785262.38 total_reward: 7779608.65 total_cost: 31168.83 total_trades: 45054 Sharpe: 0.834 ================================= -------------------------------------- | time/ | | | fps | 61 | | iterations | 6100 | | time_elapsed | 495 | | total_timesteps | 30500 | | train/ | | | entropy_loss | -41.1 | | explained_variance | -0.113 | | learning_rate | 0.0007 | | n_updates | 6099 | | policy_loss | 21.6 | | reward | 0.12259659 | | std | 1 | | value_loss | 0.918 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 6200 | | time_elapsed | 505 | | total_timesteps | 31000 | | train/ | | | entropy_loss | -41.1 | | explained_variance | -0.00301 | | learning_rate | 0.0007 | | n_updates | 6199 | | policy_loss | -126 | | reward | 1.5182142 | | std | 1 | | value_loss | 16.8 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 6300 | | time_elapsed | 511 | | total_timesteps | 31500 | | train/ | | | entropy_loss | -41.2 | | explained_variance | -1.74 | | learning_rate | 0.0007 | | n_updates | 6299 | | policy_loss | -25.1 | | reward | 0.5284405 | | std | 1 | | value_loss | 1.56 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 6400 | | time_elapsed | 519 | | total_timesteps | 32000 | | train/ | | | entropy_loss | -41.3 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 6399 | | policy_loss | 123 | | reward | -3.4704874 | | std | 1.01 | | value_loss | 14.2 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 6500 | | time_elapsed | 528 | | total_timesteps | 32500 | | train/ | | | entropy_loss | -41.4 | | explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 6499 | | policy_loss | -102 | | reward | 3.8022645 | | std | 1.01 | | value_loss | 11 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 6600 | | time_elapsed | 535 | | total_timesteps | 33000 | | train/ | | | entropy_loss | -41.4 | 
| explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 6599 | | policy_loss | 372 | | reward | 24.101572 | | std | 1.01 | | value_loss | 136 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 6700 | | time_elapsed | 543 | | total_timesteps | 33500 | | train/ | | | entropy_loss | -41.4 | | explained_variance | 5.96e-08 | | learning_rate | 0.0007 | | n_updates | 6699 | | policy_loss | -923 | | reward | -11.455252 | | std | 1.01 | | value_loss | 474 | -------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 6800 | | time_elapsed | 552 | | total_timesteps | 34000 | | train/ | | | entropy_loss | -41.4 | | explained_variance | -0.553 | | learning_rate | 0.0007 | | n_updates | 6799 | | policy_loss | -111 | | reward | 0.10300914 | | std | 1.01 | | value_loss | 9.3 | -------------------------------------- --------------------------------------- | time/ | | | fps | 61 | | iterations | 6900 | | time_elapsed | 558 | | total_timesteps | 34500 | | train/ | | | entropy_loss | -41.4 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 6899 | | policy_loss | 133 | | reward | -0.71322364 | | std | 1.01 | | value_loss | 15.3 | --------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 7000 | | time_elapsed | 567 | | total_timesteps | 35000 | | train/ | | | entropy_loss | -41.4 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 6999 | | policy_loss | -84.3 | | reward | 2.3256698 | | std | 1.01 | | value_loss | 4.58 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 7100 | | time_elapsed | 575 | | total_timesteps | 35500 | | train/ | | | entropy_loss | -41.4 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 7099 | | policy_loss | -2.44 | | reward | 0.9263134 | | std | 1.01 | | value_loss | 1.21 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 7200 | | time_elapsed | 586 | | total_timesteps | 36000 | | train/ | | | entropy_loss | -41.5 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 7199 | | policy_loss | -228 | | reward | -1.9283981 | | std | 1.01 | | value_loss | 42 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 7300 | | time_elapsed | 596 | | total_timesteps | 36500 | | train/ | | | entropy_loss | -41.5 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 7299 | | policy_loss | 81 | | reward | -6.168546 | | std | 1.01 | | value_loss | 8.16 | ------------------------------------- --------------------------------------- | time/ | | | fps | 61 | | iterations | 7400 | | time_elapsed | 602 | | total_timesteps | 37000 | | train/ | | | entropy_loss | -41.5 | | explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 7399 | | policy_loss | -136 | | reward | -0.66517484 | | std | 1.02 | | value_loss | 12.4 | --------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 7500 | | time_elapsed | 610 | | total_timesteps | 37500 | | train/ | | | entropy_loss | -41.6 | | explained_variance | 1.44e-05 | | learning_rate | 0.0007 | | n_updates | 7499 | | policy_loss | -368 | | reward | 0.28679553 | | std | 1.02 | | value_loss | 81.7 | 
-------------------------------------- --------------------------------------- | time/ | | | fps | 61 | | iterations | 7600 | | time_elapsed | 620 | | total_timesteps | 38000 | | train/ | | | entropy_loss | -41.6 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 7599 | | policy_loss | -80.4 | | reward | -0.02342434 | | std | 1.02 | | value_loss | 4.71 | --------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 7700 | | time_elapsed | 626 | | total_timesteps | 38500 | | train/ | | | entropy_loss | -41.6 | | explained_variance | -3.11 | | learning_rate | 0.0007 | | n_updates | 7699 | | policy_loss | -91.7 | | reward | 2.6142132 | | std | 1.02 | | value_loss | 6.61 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 7800 | | time_elapsed | 635 | | total_timesteps | 39000 | | train/ | | | entropy_loss | -41.6 | | explained_variance | 0.125 | | learning_rate | 0.0007 | | n_updates | 7799 | | policy_loss | 4.76 | | reward | -1.2840562 | | std | 1.02 | | value_loss | 0.762 | -------------------------------------- --------------------------------------- | time/ | | | fps | 61 | | iterations | 7900 | | time_elapsed | 644 | | total_timesteps | 39500 | | train/ | | | entropy_loss | -41.7 | | explained_variance | 0.0476 | | learning_rate | 0.0007 | | n_updates | 7899 | | policy_loss | 273 | | reward | -0.55217224 | | std | 1.02 | | value_loss | 46.2 | --------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 8000 | | time_elapsed | 650 | | total_timesteps | 40000 | | train/ | | | entropy_loss | -41.6 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 7999 | | policy_loss | -769 | | reward | 1.6622137 | | std | 1.02 | | value_loss | 367 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 8100 | | time_elapsed | 660 | | total_timesteps | 40500 | | train/ | | | entropy_loss | -41.6 | | explained_variance | -0.131 | | learning_rate | 0.0007 | | n_updates | 8099 | | policy_loss | 38.2 | | reward | 0.38162667 | | std | 1.02 | | value_loss | 1.09 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 8200 | | time_elapsed | 668 | | total_timesteps | 41000 | | train/ | | | entropy_loss | -41.6 | | explained_variance | -0.306 | | learning_rate | 0.0007 | | n_updates | 8199 | | policy_loss | 40.8 | | reward | 0.8386523 | | std | 1.02 | | value_loss | 3.26 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 8300 | | time_elapsed | 674 | | total_timesteps | 41500 | | train/ | | | entropy_loss | -41.5 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 8299 | | policy_loss | 41.7 | | reward | 1.5822707 | | std | 1.02 | | value_loss | 4.75 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 8400 | | time_elapsed | 684 | | total_timesteps | 42000 | | train/ | | | entropy_loss | -41.6 | | explained_variance | 5.96e-08 | | learning_rate | 0.0007 | | n_updates | 8399 | | policy_loss | 133 | | reward | 0.1792632 | | std | 1.02 | | value_loss | 15.6 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 8500 | | time_elapsed | 691 | | total_timesteps | 42500 | | 
train/ | | | entropy_loss | -41.7 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 8499 | | policy_loss | 99.2 | | reward | 1.6896911 | | std | 1.02 | | value_loss | 33.2 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 8600 | | time_elapsed | 699 | | total_timesteps | 43000 | | train/ | | | entropy_loss | -41.7 | | explained_variance | -0.0163 | | learning_rate | 0.0007 | | n_updates | 8599 | | policy_loss | -836 | | reward | 30.580954 | | std | 1.02 | | value_loss | 436 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 8700 | | time_elapsed | 709 | | total_timesteps | 43500 | | train/ | | | entropy_loss | -41.7 | | explained_variance | 0.0669 | | learning_rate | 0.0007 | | n_updates | 8699 | | policy_loss | -430 | | reward | -9.169519 | | std | 1.02 | | value_loss | 186 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 8800 | | time_elapsed | 715 | | total_timesteps | 44000 | | train/ | | | entropy_loss | -41.7 | | explained_variance | 0.178 | | learning_rate | 0.0007 | | n_updates | 8799 | | policy_loss | -2.39 | | reward | -0.505542 | | std | 1.02 | | value_loss | 0.0762 | ------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 8900 | | time_elapsed | 723 | | total_timesteps | 44500 | | train/ | | | entropy_loss | -41.7 | | explained_variance | 0.0519 | | learning_rate | 0.0007 | | n_updates | 8899 | | policy_loss | -29 | | reward | 1.4009765 | | std | 1.02 | | value_loss | 0.617 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 9000 | | time_elapsed | 733 | | total_timesteps | 45000 | | train/ | | | entropy_loss | -41.8 | | explained_variance | 0.0464 | | learning_rate | 0.0007 | | n_updates | 8999 | | policy_loss | -142 | | reward | -1.6482956 | | std | 1.02 | | value_loss | 12.1 | -------------------------------------- ---------------------------------------- | time/ | | | fps | 61 | | iterations | 9100 | | time_elapsed | 739 | | total_timesteps | 45500 | | train/ | | | entropy_loss | -41.7 | | explained_variance | 0.128 | | learning_rate | 0.0007 | | n_updates | 9099 | | policy_loss | -18.8 | | reward | -0.022230674 | | std | 1.02 | | value_loss | 1.08 | ---------------------------------------- --------------------------------------- | time/ | | | fps | 61 | | iterations | 9200 | | time_elapsed | 748 | | total_timesteps | 46000 | | train/ | | | entropy_loss | -41.8 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 9199 | | policy_loss | 35.8 | | reward | -0.16132466 | | std | 1.02 | | value_loss | 6.5 | --------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 9300 | | time_elapsed | 757 | | total_timesteps | 46500 | | train/ | | | entropy_loss | -41.7 | | explained_variance | 0.00416 | | learning_rate | 0.0007 | | n_updates | 9299 | | policy_loss | -192 | | reward | -3.3674068 | | std | 1.02 | | value_loss | 87.7 | -------------------------------------- ------------------------------------ | time/ | | | fps | 61 | | iterations | 9400 | | time_elapsed | 763 | | total_timesteps | 47000 | | train/ | | | entropy_loss | -41.8 | | explained_variance | -0.0393 | | learning_rate | 0.0007 | | n_updates | 9399 | | policy_loss | -37.6 | | reward | 1.150722 | | std 
| 1.02 | | value_loss | 3.37 | ------------------------------------ -------------------------------------- | time/ | | | fps | 61 | | iterations | 9500 | | time_elapsed | 772 | | total_timesteps | 47500 | | train/ | | | entropy_loss | -41.8 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 9499 | | policy_loss | -25.2 | | reward | 0.41208658 | | std | 1.02 | | value_loss | 0.698 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 9600 | | time_elapsed | 780 | | total_timesteps | 48000 | | train/ | | | entropy_loss | -41.9 | | explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 9599 | | policy_loss | 9.9 | | reward | 0.5765088 | | std | 1.03 | | value_loss | 2.06 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 9700 | | time_elapsed | 787 | | total_timesteps | 48500 | | train/ | | | entropy_loss | -41.9 | | explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 9699 | | policy_loss | 315 | | reward | -0.2841707 | | std | 1.03 | | value_loss | 49.4 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 9800 | | time_elapsed | 797 | | total_timesteps | 49000 | | train/ | | | entropy_loss | -41.9 | | explained_variance | 5.96e-08 | | learning_rate | 0.0007 | | n_updates | 9799 | | policy_loss | 125 | | reward | 0.6355639 | | std | 1.03 | | value_loss | 9.9 | ------------------------------------- -------------------------------------- | time/ | | | fps | 61 | | iterations | 9900 | | time_elapsed | 804 | | total_timesteps | 49500 | | train/ | | | entropy_loss | -41.9 | | explained_variance | 0 | | learning_rate | 0.0007 | | n_updates | 9899 | | policy_loss | 155 | | reward | -4.6037025 | | std | 1.03 | | value_loss | 16.2 | -------------------------------------- ------------------------------------- | time/ | | | fps | 61 | | iterations | 10000 | | time_elapsed | 811 | | total_timesteps | 50000 | | train/ | | | entropy_loss | -41.9 | | explained_variance | -1.19e-07 | | learning_rate | 0.0007 | | n_updates | 9999 | | policy_loss | 104 | | reward | -3.306132 | | std | 1.03 | | value_loss | 9.5 | ------------------------------------- {'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001} Using cpu device Logging to results/ddpg ----------------------------------- | time/ | | | episodes | 4 | | fps | 24 | | time_elapsed | 547 | | total_timesteps | 13412 | | train/ | | | actor_loss | 12.5 | | critic_loss | 295 | | learning_rate | 0.001 | | n_updates | 10059 | | reward | -6.3763723 | ----------------------------------- ----------------------------------- | time/ | | | episodes | 8 | | fps | 21 | | time_elapsed | 1237 | | total_timesteps | 26824 | | train/ | | | actor_loss | -5.51 | | critic_loss | 14.7 | | learning_rate | 0.001 | | n_updates | 23471 | | reward | -6.3763723 | ----------------------------------- day: 3352, episode: 10 begin_total_asset: 1011382.29 end_total_asset: 6058882.63 total_reward: 5047500.34 total_cost: 1010.37 total_trades: 46928 Sharpe: 0.807 ================================= ----------------------------------- | time/ | | | episodes | 12 | | fps | 20 | | time_elapsed | 1942 | | total_timesteps | 40236 | | train/ | | | actor_loss | -8.67 | | critic_loss | 7.57 | | learning_rate | 0.001 | | n_updates | 36883 | | reward | -6.3763723 | ----------------------------------- {'n_steps': 2048, 'ent_coef': 0.01, 
'learning_rate': 0.00025, 'batch_size': 128}
Using cpu device
Logging to results/ppo
[PPO progress tables: 25 iterations, 51,200 total timesteps, ~62 fps on CPU; approx_kl ~0.014-0.028, clip_fraction ~0.09-0.28, entropy_loss drifting from -41.2 to -42.1]
day: 3352, episode: 10
begin_total_asset: 988584.72
end_total_asset: 3416710.65
total_reward: 2428125.94
total_cost: 420148.99
total_trades: 89136
Sharpe: 0.598
=================================
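The block above is standard Stable-Baselines3 progress output for PPO. For reference, a minimal sketch of how such a run is typically launched through FinRL's DRLAgent wrapper; env_train (the SB3-wrapped training StockTradingEnv) is assumed to already exist, and the variable names are illustrative:

from finrl.agents.stablebaselines3.models import DRLAgent

# Hyperparameters matching the dict printed at the top of the PPO log above.
PPO_PARAMS = {
    "n_steps": 2048,
    "ent_coef": 0.01,
    "learning_rate": 0.00025,
    "batch_size": 128,
}

agent = DRLAgent(env=env_train)  # env_train: SB3-wrapped training environment (assumed)
model_ppo = agent.get_model("ppo", model_kwargs=PPO_PARAMS)
trained_ppo = agent.train_model(
    model=model_ppo,
    tb_log_name="ppo",      # produces the "Logging to results/ppo" line above
    total_timesteps=50000,  # SB3 rounds up to 25 iterations * 2048 steps = 51,200
)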
{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device
Logging to results/sac
[SAC progress tables: 12 episodes, 40,236 total timesteps, ~19 fps on CPU]
day: 3352, episode: 10
begin_total_asset: 1005927.23
end_total_asset: 5294689.46
total_reward: 4288762.22
total_cost: 37988.65
total_trades: 61507
Sharpe: 0.700
=================================
{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cpu device
Logging to results/td3
[TD3 progress tables: 12 episodes, 40,236 total timesteps, ~21 fps on CPU]
day: 3352, episode: 10
begin_total_asset: 1012427.98
end_total_asset: 5866237.13
total_reward: 4853809.15
total_cost: 1011.41
total_trades: 53632
Sharpe: 0.831
=================================
hit end!
hit end!
hit end!
hit end!
hit end!
[*********************100%***********************]  1 of 1 completed
Shape of DataFrame:  (22, 8)
i: 3
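The five "hit end!" lines mark the agents exhausting the current data window; the fresh one-ticker Yahoo download and "i: 3" indicate the loop has advanced to the next rolling window, where a new train/trade split is cut from the processed data before the agents are retrained. A minimal sketch of that split, assuming processed_full is the feature-engineered DataFrame and using illustrative date boundaries (the actual loop derives them per window via calc_train_trade_starts_ends_if_rolling):

from finrl.meta.preprocessor.preprocessors import data_split

# Illustrative window boundaries; the real ones are computed per window index i.
TRAIN_START, TRAIN_END = "2009-01-01", "2021-07-01"
TRADE_START, TRADE_END = "2021-07-01", "2021-08-01"

train = data_split(processed_full, TRAIN_START, TRAIN_END)  # rows with TRAIN_START <= date < TRAIN_END
trade = data_split(processed_full, TRADE_START, TRADE_END)
print(f"train rows: {len(train)}, trade rows: {len(trade)}")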
{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}
Using cpu device
Logging to results/a2c
[A2C progress tables: 10,000 iterations, 50,000 total timesteps, ~47-59 fps on CPU]
day: 3330, episode: 10
begin_total_asset: 952508.66
end_total_asset: 4088694.53
total_reward: 3136185.87
total_cost: 3157.22
total_trades: 58734
Sharpe: 0.733
=================================
{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cpu device
Logging to results/ddpg
[DDPG progress tables: 16 episodes, 53,296 total timesteps, ~21 fps on CPU]
day: 3330, episode: 10
begin_total_asset: 965326.95
end_total_asset: 3940368.63
total_reward: 2975041.68
total_cost: 964.36
total_trades: 53280
Sharpe: 0.657
=================================
{'n_steps': 2048, 'ent_coef': 0.01, 'learning_rate': 0.00025, 'batch_size': 128}
Using cpu device
Logging to results/ppo
[PPO progress tables: 25 iterations, 51,200 total timesteps, ~64 fps on CPU]
----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 7 | | time_elapsed | 221 | | total_timesteps | 14336 | | train/ | | | approx_kl | 0.015349641 | | clip_fraction | 0.182 | | clip_range | 0.2 | | entropy_loss | -41.5 | | explained_variance | 0.00714 | | learning_rate | 0.00025 | | loss | 7.18 | | n_updates | 60 | | policy_gradient_loss | -0.0222 | | reward | -1.0227845 | | std | 1.01 | | value_loss | 12.5 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 8 | | time_elapsed | 254 | | total_timesteps | 16384 | | train/ | | | approx_kl | 0.020761559 | | clip_fraction | 0.231 | | clip_range | 0.2 | | entropy_loss | -41.5 | | explained_variance | -0.00857 | | learning_rate | 0.00025 | | loss | 25.2 | | n_updates | 70 | | policy_gradient_loss | -0.0199 | | reward | 0.80425155 | | std | 1.01 | | value_loss | 57.8 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 9 | | time_elapsed | 283 | | total_timesteps | 18432 | | train/ | | | approx_kl | 0.018122694 | | clip_fraction | 0.236 | | clip_range | 0.2 | | entropy_loss | -41.5 | | explained_variance | 0.00296 | | learning_rate | 0.00025 | | loss | 28.1 | | n_updates | 80 | | policy_gradient_loss | -0.0166 | | reward | -1.42386 | | std | 1.01 | | value_loss | 57.8 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 10 | | time_elapsed | 318 | | total_timesteps | 20480 | | train/ | | | approx_kl | 0.022673171 | | clip_fraction | 0.205 | | clip_range | 0.2 | | entropy_loss | -41.6 | | explained_variance | -0.013 | | learning_rate | 0.00025 | | loss | 17.3 | | n_updates | 90 | | policy_gradient_loss | -0.0191 | | reward | 0.8197509 | | std | 1.02 | | value_loss | 44.3 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 11 | | time_elapsed | 352 | | total_timesteps | 22528 | | train/ | | | approx_kl | 0.020850785 | | clip_fraction | 0.214 | | clip_range | 0.2 | | entropy_loss | -41.6 | | explained_variance | -0.00669 | | learning_rate | 0.00025 | | loss | 48.4 | | n_updates | 100 | | policy_gradient_loss | -0.0161 | | reward | 1.2033767 | | std | 1.02 | | value_loss | 99.1 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 12 | | time_elapsed | 384 | | total_timesteps | 24576 | | train/ | | | approx_kl | 0.024814304 | | clip_fraction | 0.251 | | clip_range | 0.2 | | entropy_loss | -41.6 | | explained_variance | -0.0225 | | learning_rate | 0.00025 | | loss | 10.8 | | n_updates | 110 | | policy_gradient_loss | -0.018 | | reward | 1.610058 | | std | 1.02 | | value_loss | 22.5 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 13 | | time_elapsed | 416 | | total_timesteps | 26624 | | train/ | | | approx_kl | 0.017855735 | | clip_fraction | 0.173 | | clip_range | 0.2 | | entropy_loss | -41.7 | | explained_variance | 0.00501 | | learning_rate | 0.00025 | | loss | 34.5 | | n_updates | 120 | | policy_gradient_loss | -0.0189 | | reward | 7.162905 | | std | 1.02 | | value_loss | 112 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 14 | | time_elapsed | 446 | | 
total_timesteps | 28672 | | train/ | | | approx_kl | 0.018644353 | | clip_fraction | 0.153 | | clip_range | 0.2 | | entropy_loss | -41.7 | | explained_variance | 0.0117 | | learning_rate | 0.00025 | | loss | 16.7 | | n_updates | 130 | | policy_gradient_loss | -0.0172 | | reward | 2.0473788 | | std | 1.02 | | value_loss | 53.3 | ----------------------------------------- day: 3330, episode: 10 begin_total_asset: 994554.41 end_total_asset: 4699503.39 total_reward: 3704948.98 total_cost: 439274.68 total_trades: 90096 Sharpe: 0.806 ================================= ----------------------------------------- | time/ | | | fps | 63 | | iterations | 15 | | time_elapsed | 480 | | total_timesteps | 30720 | | train/ | | | approx_kl | 0.02508668 | | clip_fraction | 0.25 | | clip_range | 0.2 | | entropy_loss | -41.8 | | explained_variance | -0.0505 | | learning_rate | 0.00025 | | loss | 4.42 | | n_updates | 140 | | policy_gradient_loss | -0.0173 | | reward | -0.36127353 | | std | 1.02 | | value_loss | 14.8 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 16 | | time_elapsed | 510 | | total_timesteps | 32768 | | train/ | | | approx_kl | 0.021448491 | | clip_fraction | 0.211 | | clip_range | 0.2 | | entropy_loss | -41.8 | | explained_variance | 0.00132 | | learning_rate | 0.00025 | | loss | 38 | | n_updates | 150 | | policy_gradient_loss | -0.00894 | | reward | -2.4289682 | | std | 1.02 | | value_loss | 88 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 17 | | time_elapsed | 542 | | total_timesteps | 34816 | | train/ | | | approx_kl | 0.02103462 | | clip_fraction | 0.208 | | clip_range | 0.2 | | entropy_loss | -41.8 | | explained_variance | -0.0246 | | learning_rate | 0.00025 | | loss | 35.3 | | n_updates | 160 | | policy_gradient_loss | -0.0134 | | reward | -0.71985894 | | std | 1.02 | | value_loss | 54.5 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 18 | | time_elapsed | 577 | | total_timesteps | 36864 | | train/ | | | approx_kl | 0.022089712 | | clip_fraction | 0.213 | | clip_range | 0.2 | | entropy_loss | -41.9 | | explained_variance | -0.0028 | | learning_rate | 0.00025 | | loss | 27.9 | | n_updates | 170 | | policy_gradient_loss | -0.0207 | | reward | 0.11034006 | | std | 1.03 | | value_loss | 39.3 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 19 | | time_elapsed | 609 | | total_timesteps | 38912 | | train/ | | | approx_kl | 0.014264661 | | clip_fraction | 0.126 | | clip_range | 0.2 | | entropy_loss | -41.9 | | explained_variance | -0.00283 | | learning_rate | 0.00025 | | loss | 58.6 | | n_updates | 180 | | policy_gradient_loss | -0.0135 | | reward | 6.176509 | | std | 1.03 | | value_loss | 119 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 63 | | iterations | 20 | | time_elapsed | 642 | | total_timesteps | 40960 | | train/ | | | approx_kl | 0.027180977 | | clip_fraction | 0.292 | | clip_range | 0.2 | | entropy_loss | -42 | | explained_variance | 0.0421 | | learning_rate | 0.00025 | | loss | 8.9 | | n_updates | 190 | | policy_gradient_loss | -0.0156 | | reward | 0.20096779 | | std | 1.03 | | value_loss | 19.8 | ----------------------------------------- ----------------------------------------- | time/ | | | fps | 64 | | iterations | 
Next, SAC is trained with the printed hyperparameters {'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}. Using cpu device. Logging to results/sac. One row per logged checkpoint; learning_rate is constant at 0.0001 and omitted:

| episodes | total_timesteps | fps | time_elapsed | actor_loss | critic_loss | ent_coef | ent_coef_loss | n_updates | reward |
|---|---|---|---|---|---|---|---|---|---|
| 4 | 13324 | 18 | 704 | 1.1e+03 | 642 | 0.169 | -83.1 | 13223 | -4.2128644 |
| 8 | 26648 | 18 | 1433 | 451 | 27.5 | 0.046 | -109 | 26547 | -4.2404695 |
| 12 | 39972 | 18 | 2152 | 216 | 38.6 | 0.0127 | -102 | 39871 | -3.931381 |

Episode summary emitted between episodes 8 and 12 (day: 3330, episode: 10): begin_total_asset: 953106.81, end_total_asset: 7458866.64, total_reward: 6505759.83, total_cost: 8648.15, total_trades: 59083, Sharpe: 0.842.
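The Sharpe figures in these episode summaries are annualized Sharpe ratios of the episode's daily account-value returns, assuming a zero risk-free rate and 252 trading days per year; StockTradingEnv prints this summary at the end of each episode. A self-contained illustration of that calculation (the account_value numbers below are made up):

import pandas as pd

# Made-up daily account values; in the environment this series comes from the recorded asset memory.
account_value = pd.Series([1_000_000.0, 1_004_200.0, 998_700.0, 1_006_300.0, 1_011_000.0])
daily_return = account_value.pct_change(1).dropna()
# Annualized Sharpe with zero risk-free rate, matching the "Sharpe:" lines printed above.
sharpe = (252 ** 0.5) * daily_return.mean() / daily_return.std()
print(f"Sharpe: {sharpe:.3f}")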
Finally, TD3 is trained with the printed hyperparameters {'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}. Using cpu device. Logging to results/td3. learning_rate is constant at 0.001 and omitted:

| episodes | total_timesteps | fps | time_elapsed | actor_loss | critic_loss | n_updates | reward |
|---|---|---|---|---|---|---|---|
| 4 | 13324 | 25 | 526 | 91.6 | 1.45e+03 | 9993 | -3.5290053 |
| 8 | 26648 | 22 | 1191 | 43.8 | 317 | 23317 | -3.5290053 |
| 12 | 39972 | 21 | 1862 | 34.3 | 54.4 | 36641 | -3.5290053 |

Episode summary emitted between episodes 8 and 12 (day: 3330, episode: 10): begin_total_asset: 972865.93, end_total_asset: 3563567.55, total_reward: 2590701.62, total_cost: 971.89, total_trades: 46620, Sharpe: 0.648.
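On the training window, SAC reaches the highest episode-10 Sharpe (0.842), followed by PPO (0.806) and TD3 (0.648). In-sample Sharpe says little about out-of-sample performance, though; the trained agents still have to be evaluated on the held-out trade period. A sketch of that evaluation with the backtest_stats and get_baseline helpers imported earlier; df_account_value and the date strings are assumptions standing in for the prediction output and the actual trade window:

from finrl.plot import backtest_stats, get_baseline

# Assumed: df_account_value has 'date' and 'account_value' columns from the prediction step.
perf_stats = backtest_stats(account_value=df_account_value, value_col_name="account_value")

# Baseline for comparison: buy-and-hold DJIA over the same window (placeholder dates).
baseline_df = get_baseline(ticker="^DJI", start="2020-07-01", end="2021-10-29")
baseline_stats = backtest_stats(baseline_df, value_col_name="close")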