Path: blob/master/examples/Stock_NeurIPS2018_call_func_SB3.ipynb
Deep Reinforcement Learning for Stock Trading from Scratch: Multiple Stock Trading
PyTorch Version
Content
We train a DRL agent for stock trading. The task is modeled as a Markov Decision Process (MDP), and the objective is to maximize the (expected) cumulative return.
We specify the state-action-reward as follows:
State s: The state space represents an agent's perception of the market environment. Just like a human trader analyzing various information, here our agent passively observes many features and learns by interacting with the market environment (usually by replaying historical data).
Action a: The action space is the set of allowed actions an agent can take at each state. For example, a ∈ {−1, 0, 1}, where −1, 0, 1 represent selling, holding, and buying one share. When an action can trade multiple shares, a ∈ {−k, ..., −1, 0, 1, ..., k}; e.g., "Buy 10 shares of AAPL" and "Sell 10 shares of AAPL" correspond to a = 10 and a = −10, respectively.
Reward function r(s, a, s′): The reward is the incentive signal from which an agent learns a better policy. For example, it can be the change in portfolio value when taking action a at state s and arriving at the new state s′, i.e., r(s, a, s′) = v′ − v, where v′ and v are the portfolio values at states s′ and s, respectively.
Market environment: the 30 constituent stocks of the Dow Jones Industrial Average (DJIA) index, as constituted at the starting date of the testing period.
The data for this case study is obtained from the Yahoo Finance API and contains open-high-low-close (OHLC) prices and volume.
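The state/action/reward definitions above can be illustrated with a toy single-stock environment. This is a hypothetical sketch (names like `SingleStockTradingEnv` and `k_max` are invented for illustration), not the FinRL multi-stock environment used in this notebook:

```python
import numpy as np

class SingleStockTradingEnv:
    """Toy single-stock environment illustrating the MDP above.
    Hypothetical sketch, not the FinRL implementation."""

    def __init__(self, prices, initial_cash=1_000_000, k_max=10):
        self.prices = np.asarray(prices, dtype=float)
        self.initial_cash = initial_cash
        self.k_max = k_max  # actions are integers in [-k_max, k_max]
        self.reset()

    def reset(self):
        self.t = 0
        self.cash = self.initial_cash
        self.shares = 0
        return self._state()

    def _state(self):
        # state s: current price, remaining cash, and share holdings
        return np.array([self.prices[self.t], self.cash, self.shares])

    def _portfolio_value(self):
        return self.cash + self.shares * self.prices[self.t]

    def step(self, action):
        # action a in {-k_max, ..., k_max}: buy (a > 0) or sell (a < 0) |a| shares
        a = int(np.clip(action, -self.k_max, self.k_max))
        a = max(a, -self.shares)                 # cannot sell more shares than held
        price = self.prices[self.t]
        if a > 0:
            a = min(a, int(self.cash // price))  # cannot buy more than cash allows
        v = self._portfolio_value()
        self.cash -= a * price
        self.shares += a
        self.t += 1
        done = self.t == len(self.prices) - 1
        # reward r(s, a, s') = v' - v: change in portfolio value
        reward = self._portfolio_value() - v
        return self._state(), reward, done
```

Because a trade at the current price is value-neutral, the reward reduces to the holdings' gain or loss from the price move between t and t+1, matching r(s, a, s′) = v′ − v.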
(Note: the streamed output was truncated; only the last 5000 lines are shown below.)
| std | 1.02 |
| value_loss | 52.5 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 10 |
| time_elapsed | 316 |
| total_timesteps | 20480 |
| train/ | |
| approx_kl | 0.018726377 |
| clip_fraction | 0.228 |
| clip_range | 0.2 |
| entropy_loss | -41.6 |
| explained_variance | -0.00599 |
| learning_rate | 0.00025 |
| loss | 10 |
| n_updates | 90 |
| policy_gradient_loss | -0.0229 |
| reward | 1.8604985 |
| std | 1.02 |
| value_loss | 34 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 11 |
| time_elapsed | 350 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.017771121 |
| clip_fraction | 0.201 |
| clip_range | 0.2 |
| entropy_loss | -41.7 |
| explained_variance | -0.00452 |
| learning_rate | 0.00025 |
| loss | 102 |
| n_updates | 100 |
| policy_gradient_loss | -0.0176 |
| reward | 2.4363315 |
| std | 1.02 |
| value_loss | 257 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 12 |
| time_elapsed | 380 |
| total_timesteps | 24576 |
| train/ | |
| approx_kl | 0.021592125 |
| clip_fraction | 0.24 |
| clip_range | 0.2 |
| entropy_loss | -41.7 |
| explained_variance | -0.00462 |
| learning_rate | 0.00025 |
| loss | 13.1 |
| n_updates | 110 |
| policy_gradient_loss | -0.0218 |
| reward | -0.36686477 |
| std | 1.02 |
| value_loss | 27.8 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 13 |
| time_elapsed | 417 |
| total_timesteps | 26624 |
| train/ | |
| approx_kl | 0.016095877 |
| clip_fraction | 0.171 |
| clip_range | 0.2 |
| entropy_loss | -41.8 |
| explained_variance | 0.00607 |
| learning_rate | 0.00025 |
| loss | 63.8 |
| n_updates | 120 |
| policy_gradient_loss | -0.0175 |
| reward | -6.2590113 |
| std | 1.02 |
| value_loss | 161 |
-----------------------------------------
----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 14 |
| time_elapsed | 447 |
| total_timesteps | 28672 |
| train/ | |
| approx_kl | 0.02099569 |
| clip_fraction | 0.204 |
| clip_range | 0.2 |
| entropy_loss | -41.9 |
| explained_variance | 0.00587 |
| learning_rate | 0.00025 |
| loss | 18.1 |
| n_updates | 130 |
| policy_gradient_loss | -0.0176 |
| reward | -1.5635415 |
| std | 1.03 |
| value_loss | 76.1 |
----------------------------------------
day: 3374, episode: 10
begin_total_asset: 1017321.61
end_total_asset: 4690150.25
total_reward: 3672828.63
total_cost: 440655.06
total_trades: 91574
Sharpe: 0.777
=================================
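The Sharpe value in episode summaries like the one above is conventionally computed as the annualized ratio of mean daily return to its standard deviation. A minimal sketch, assuming 252 trading days per year and a zero risk-free rate (not necessarily FinRL's exact code):

```python
import numpy as np

def sharpe_ratio(asset_values, periods_per_year=252):
    """Annualized Sharpe ratio of a portfolio value series.
    Assumes a zero risk-free rate; illustrative sketch."""
    values = np.asarray(asset_values, dtype=float)
    daily_returns = values[1:] / values[:-1] - 1.0
    return np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std()
```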
------------------------------------------
| time/ | |
| fps | 63 |
| iterations | 15 |
| time_elapsed | 480 |
| total_timesteps | 30720 |
| train/ | |
| approx_kl | 0.01574407 |
| clip_fraction | 0.252 |
| clip_range | 0.2 |
| entropy_loss | -41.9 |
| explained_variance | 0.045 |
| learning_rate | 0.00025 |
| loss | 8.21 |
| n_updates | 140 |
| policy_gradient_loss | -0.0207 |
| reward | -0.058135245 |
| std | 1.03 |
| value_loss | 20 |
------------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 16 |
| time_elapsed | 511 |
| total_timesteps | 32768 |
| train/ | |
| approx_kl | 0.018864237 |
| clip_fraction | 0.19 |
| clip_range | 0.2 |
| entropy_loss | -42 |
| explained_variance | -0.0334 |
| learning_rate | 0.00025 |
| loss | 40.5 |
| n_updates | 150 |
| policy_gradient_loss | -0.0158 |
| reward | 2.1892703 |
| std | 1.03 |
| value_loss | 80.4 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 17 |
| time_elapsed | 542 |
| total_timesteps | 34816 |
| train/ | |
| approx_kl | 0.025924759 |
| clip_fraction | 0.183 |
| clip_range | 0.2 |
| entropy_loss | -42 |
| explained_variance | -0.0494 |
| learning_rate | 0.00025 |
| loss | 8.64 |
| n_updates | 160 |
| policy_gradient_loss | -0.0154 |
| reward | -1.6194284 |
| std | 1.03 |
| value_loss | 19.1 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 18 |
| time_elapsed | 576 |
| total_timesteps | 36864 |
| train/ | |
| approx_kl | 0.023486339 |
| clip_fraction | 0.227 |
| clip_range | 0.2 |
| entropy_loss | -42 |
| explained_variance | -0.00164 |
| learning_rate | 0.00025 |
| loss | 71 |
| n_updates | 170 |
| policy_gradient_loss | -0.0128 |
| reward | -6.5787015 |
| std | 1.03 |
| value_loss | 175 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 19 |
| time_elapsed | 609 |
| total_timesteps | 38912 |
| train/ | |
| approx_kl | 0.047546946 |
| clip_fraction | 0.278 |
| clip_range | 0.2 |
| entropy_loss | -42 |
| explained_variance | 0.0083 |
| learning_rate | 0.00025 |
| loss | 22.2 |
| n_updates | 180 |
| policy_gradient_loss | -0.00743 |
| reward | 3.6853487 |
| std | 1.03 |
| value_loss | 88.3 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 20 |
| time_elapsed | 643 |
| total_timesteps | 40960 |
| train/ | |
| approx_kl | 0.028585846 |
| clip_fraction | 0.238 |
| clip_range | 0.2 |
| entropy_loss | -42.1 |
| explained_variance | -0.018 |
| learning_rate | 0.00025 |
| loss | 12.6 |
| n_updates | 190 |
| policy_gradient_loss | -0.0166 |
| reward | 2.84366 |
| std | 1.03 |
| value_loss | 35.7 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 21 |
| time_elapsed | 672 |
| total_timesteps | 43008 |
| train/ | |
| approx_kl | 0.021615773 |
| clip_fraction | 0.283 |
| clip_range | 0.2 |
| entropy_loss | -42.1 |
| explained_variance | 0.0164 |
| learning_rate | 0.00025 |
| loss | 39.1 |
| n_updates | 200 |
| policy_gradient_loss | -0.0119 |
| reward | 7.260352 |
| std | 1.04 |
| value_loss | 85.5 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 22 |
| time_elapsed | 703 |
| total_timesteps | 45056 |
| train/ | |
| approx_kl | 0.023984132 |
| clip_fraction | 0.174 |
| clip_range | 0.2 |
| entropy_loss | -42.2 |
| explained_variance | -0.0214 |
| learning_rate | 0.00025 |
| loss | 10.8 |
| n_updates | 210 |
| policy_gradient_loss | -0.015 |
| reward | 0.7453349 |
| std | 1.04 |
| value_loss | 27.4 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 23 |
| time_elapsed | 736 |
| total_timesteps | 47104 |
| train/ | |
| approx_kl | 0.026311198 |
| clip_fraction | 0.239 |
| clip_range | 0.2 |
| entropy_loss | -42.2 |
| explained_variance | 0.0117 |
| learning_rate | 0.00025 |
| loss | 53.5 |
| n_updates | 220 |
| policy_gradient_loss | -0.0147 |
| reward | -3.601917 |
| std | 1.04 |
| value_loss | 109 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 24 |
| time_elapsed | 765 |
| total_timesteps | 49152 |
| train/ | |
| approx_kl | 0.021329464 |
| clip_fraction | 0.228 |
| clip_range | 0.2 |
| entropy_loss | -42.2 |
| explained_variance | 0.0287 |
| learning_rate | 0.00025 |
| loss | 35.5 |
| n_updates | 230 |
| policy_gradient_loss | -0.0174 |
| reward | -1.4932549 |
| std | 1.04 |
| value_loss | 69.7 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 25 |
| time_elapsed | 799 |
| total_timesteps | 51200 |
| train/ | |
| approx_kl | 0.033834375 |
| clip_fraction | 0.347 |
| clip_range | 0.2 |
| entropy_loss | -42.2 |
| explained_variance | -0.0439 |
| learning_rate | 0.00025 |
| loss | 11.2 |
| n_updates | 240 |
| policy_gradient_loss | -0.0175 |
| reward | -0.13293022 |
| std | 1.04 |
| value_loss | 31.1 |
-----------------------------------------
{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device
Logging to results/sac
-----------------------------------
| time/ | |
| episodes | 4 |
| fps | 19 |
| time_elapsed | 693 |
| total_timesteps | 13500 |
| train/ | |
| actor_loss | 1.23e+03 |
| critic_loss | 941 |
| ent_coef | 0.175 |
| ent_coef_loss | -80.9 |
| learning_rate | 0.0001 |
| n_updates | 13399 |
| reward | -4.1185117 |
-----------------------------------
-----------------------------------
| time/ | |
| episodes | 8 |
| fps | 19 |
| time_elapsed | 1407 |
| total_timesteps | 27000 |
| train/ | |
| actor_loss | 486 |
| critic_loss | 378 |
| ent_coef | 0.047 |
| ent_coef_loss | -97.4 |
| learning_rate | 0.0001 |
| n_updates | 26899 |
| reward | -6.0287046 |
-----------------------------------
day: 3374, episode: 10
begin_total_asset: 1039580.61
end_total_asset: 4449383.64
total_reward: 3409803.03
total_cost: 3171.06
total_trades: 48397
Sharpe: 0.687
=================================
-----------------------------------
| time/ | |
| episodes | 12 |
| fps | 19 |
| time_elapsed | 2123 |
| total_timesteps | 40500 |
| train/ | |
| actor_loss | 201 |
| critic_loss | 10.8 |
| ent_coef | 0.0131 |
| ent_coef_loss | -63.2 |
| learning_rate | 0.0001 |
| n_updates | 40399 |
| reward | -5.6925883 |
-----------------------------------
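The SAC run above was configured with the hyperparameter dictionary printed at its start; in Stable-Baselines3, such a dictionary is typically unpacked into the algorithm's constructor, and `'auto_0.1'` requests automatic entropy-coefficient tuning with an initial value of 0.1. A hedged sketch (the commented lines assume a Stable-Baselines3 installation and an `env` variable holding the trading environment):

```python
SAC_PARAMS = {
    "batch_size": 128,        # minibatch size per gradient update
    "buffer_size": 100000,    # replay buffer capacity
    "learning_rate": 0.0001,
    "learning_starts": 100,   # steps collected before updates begin
    "ent_coef": "auto_0.1",   # auto-tuned entropy coefficient, initial value 0.1
}

# Requires stable_baselines3 and a Gym-style trading environment `env`:
# from stable_baselines3 import SAC
# model = SAC("MlpPolicy", env, **SAC_PARAMS)
# model.learn(total_timesteps=50000)
```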
{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cpu device
Logging to results/td3
-----------------------------------
| time/ | |
| episodes | 4 |
| fps | 24 |
| time_elapsed | 545 |
| total_timesteps | 13500 |
| train/ | |
| actor_loss | 16.7 |
| critic_loss | 341 |
| learning_rate | 0.001 |
| n_updates | 10125 |
| reward | -5.7216434 |
-----------------------------------
-----------------------------------
| time/ | |
| episodes | 8 |
| fps | 21 |
| time_elapsed | 1228 |
| total_timesteps | 27000 |
| train/ | |
| actor_loss | 18.7 |
| critic_loss | 21.4 |
| learning_rate | 0.001 |
| n_updates | 23625 |
| reward | -5.7216434 |
-----------------------------------
day: 3374, episode: 10
begin_total_asset: 1043903.24
end_total_asset: 5291054.90
total_reward: 4247151.66
total_cost: 1042.86
total_trades: 64106
Sharpe: 0.723
=================================
-----------------------------------
| time/ | |
| episodes | 12 |
| fps | 21 |
| time_elapsed | 1923 |
| total_timesteps | 40500 |
| train/ | |
| actor_loss | 20.6 |
| critic_loss | 15.9 |
| learning_rate | 0.001 |
| n_updates | 37125 |
| reward | -5.7216434 |
-----------------------------------
hit end!
hit end!
hit end!
hit end!
hit end!
[*********************100%***********************] 1 of 1 completed
Shape of DataFrame: (22, 8)
i: 2
{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}
Using cpu device
Logging to results/a2c
---------------------------------------
| time/ | |
| fps | 55 |
| iterations | 100 |
| time_elapsed | 9 |
| total_timesteps | 500 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0.0552 |
| learning_rate | 0.0007 |
| n_updates | 99 |
| policy_loss | -125 |
| reward | -0.19224237 |
| std | 0.997 |
| value_loss | 10.9 |
---------------------------------------
------------------------------------
| time/ | |
| fps | 65 |
| iterations | 200 |
| time_elapsed | 15 |
| total_timesteps | 1000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 199 |
| policy_loss | -65.7 |
| reward | 2.47076 |
| std | 0.998 |
| value_loss | 3.14 |
------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 300 |
| time_elapsed | 24 |
| total_timesteps | 1500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 299 |
| policy_loss | 221 |
| reward | -0.668967 |
| std | 0.999 |
| value_loss | 38 |
-------------------------------------
------------------------------------
| time/ | |
| fps | 61 |
| iterations | 400 |
| time_elapsed | 32 |
| total_timesteps | 2000 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 399 |
| policy_loss | 2.8 |
| reward | 2.104001 |
| std | 0.997 |
| value_loss | 2.71 |
------------------------------------
-------------------------------------
| time/ | |
| fps | 64 |
| iterations | 500 |
| time_elapsed | 39 |
| total_timesteps | 2500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 499 |
| policy_loss | 239 |
| reward | 3.0126274 |
| std | 0.999 |
| value_loss | 39.6 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 600 |
| time_elapsed | 48 |
| total_timesteps | 3000 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 599 |
| policy_loss | -524 |
| reward | 3.9946847 |
| std | 1 |
| value_loss | 293 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 62 |
| iterations | 700 |
| time_elapsed | 56 |
| total_timesteps | 3500 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 0.108 |
| learning_rate | 0.0007 |
| n_updates | 699 |
| policy_loss | -37.1 |
| reward | 1.4987615 |
| std | 1 |
| value_loss | 1.8 |
-------------------------------------
------------------------------------
| time/ | |
| fps | 63 |
| iterations | 800 |
| time_elapsed | 62 |
| total_timesteps | 4000 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 799 |
| policy_loss | -337 |
| reward | 2.046587 |
| std | 1 |
| value_loss | 77.1 |
------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 900 |
| time_elapsed | 72 |
| total_timesteps | 4500 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 899 |
| policy_loss | 26.2 |
| reward | 1.0195923 |
| std | 1 |
| value_loss | 4.92 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 62 |
| iterations | 1000 |
| time_elapsed | 79 |
| total_timesteps | 5000 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 999 |
| policy_loss | 80.3 |
| reward | -3.5495179 |
| std | 1 |
| value_loss | 9.56 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 63 |
| iterations | 1100 |
| time_elapsed | 86 |
| total_timesteps | 5500 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | -2.38e-07 |
| learning_rate | 0.0007 |
| n_updates | 1099 |
| policy_loss | -258 |
| reward | 1.6695346 |
| std | 1 |
| value_loss | 44.9 |
-------------------------------------
------------------------------------
| time/ | |
| fps | 62 |
| iterations | 1200 |
| time_elapsed | 96 |
| total_timesteps | 6000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | -0.00397 |
| learning_rate | 0.0007 |
| n_updates | 1199 |
| policy_loss | 185 |
| reward | 2.245284 |
| std | 1 |
| value_loss | 26.3 |
------------------------------------
-------------------------------------
| time/ | |
| fps | 60 |
| iterations | 1300 |
| time_elapsed | 106 |
| total_timesteps | 6500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 1299 |
| policy_loss | 150 |
| reward | 1.491629 |
| std | 0.998 |
| value_loss | 21.5 |
-------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 1400 |
| time_elapsed | 117 |
| total_timesteps | 7000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 1399 |
| policy_loss | -107 |
| reward | -0.81370664 |
| std | 0.998 |
| value_loss | 8.34 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 60 |
| iterations | 1500 |
| time_elapsed | 124 |
| total_timesteps | 7500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | -0.0459 |
| learning_rate | 0.0007 |
| n_updates | 1499 |
| policy_loss | -10 |
| reward | 2.3688922 |
| std | 0.997 |
| value_loss | 0.922 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 60 |
| iterations | 1600 |
| time_elapsed | 131 |
| total_timesteps | 8000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 1599 |
| policy_loss | 128 |
| reward | 0.56861943 |
| std | 0.998 |
| value_loss | 19.5 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 1700 |
| time_elapsed | 141 |
| total_timesteps | 8500 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0.203 |
| learning_rate | 0.0007 |
| n_updates | 1699 |
| policy_loss | 37 |
| reward | 1.2727017 |
| std | 0.996 |
| value_loss | 3.2 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 60 |
| iterations | 1800 |
| time_elapsed | 148 |
| total_timesteps | 9000 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 1799 |
| policy_loss | 80.9 |
| reward | 3.9352329 |
| std | 0.996 |
| value_loss | 9.56 |
-------------------------------------
------------------------------------
| time/ | |
| fps | 60 |
| iterations | 1900 |
| time_elapsed | 156 |
| total_timesteps | 9500 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 1899 |
| policy_loss | 507 |
| reward | 6.328624 |
| std | 0.997 |
| value_loss | 187 |
------------------------------------
--------------------------------------
| time/ | |
| fps | 60 |
| iterations | 2000 |
| time_elapsed | 166 |
| total_timesteps | 10000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 1999 |
| policy_loss | -126 |
| reward | -2.8187668 |
| std | 1 |
| value_loss | 8.37 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 60 |
| iterations | 2100 |
| time_elapsed | 172 |
| total_timesteps | 10500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 2099 |
| policy_loss | -58.5 |
| reward | 0.2734109 |
| std | 0.999 |
| value_loss | 3 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 60 |
| iterations | 2200 |
| time_elapsed | 180 |
| total_timesteps | 11000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 2199 |
| policy_loss | 157 |
| reward | 0.68144214 |
| std | 0.997 |
| value_loss | 19.9 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 60 |
| iterations | 2300 |
| time_elapsed | 190 |
| total_timesteps | 11500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 2299 |
| policy_loss | -67.3 |
| reward | -1.8721669 |
| std | 0.999 |
| value_loss | 2.49 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 2400 |
| time_elapsed | 196 |
| total_timesteps | 12000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | -0.0105 |
| learning_rate | 0.0007 |
| n_updates | 2399 |
| policy_loss | 144 |
| reward | 0.47134838 |
| std | 0.997 |
| value_loss | 26.3 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 60 |
| iterations | 2500 |
| time_elapsed | 204 |
| total_timesteps | 12500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 2499 |
| policy_loss | 589 |
| reward | -1.9081986 |
| std | 0.997 |
| value_loss | 221 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 60 |
| iterations | 2600 |
| time_elapsed | 213 |
| total_timesteps | 13000 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 2599 |
| policy_loss | 352 |
| reward | 6.1386447 |
| std | 0.996 |
| value_loss | 148 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 2700 |
| time_elapsed | 219 |
| total_timesteps | 13500 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | -0.0134 |
| learning_rate | 0.0007 |
| n_updates | 2699 |
| policy_loss | -131 |
| reward | 0.6143146 |
| std | 0.995 |
| value_loss | 12.3 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 2800 |
| time_elapsed | 229 |
| total_timesteps | 14000 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 2799 |
| policy_loss | 132 |
| reward | 1.7656372 |
| std | 0.995 |
| value_loss | 13.2 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 2900 |
| time_elapsed | 237 |
| total_timesteps | 14500 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | -0.0245 |
| learning_rate | 0.0007 |
| n_updates | 2899 |
| policy_loss | 17.1 |
| reward | 0.08867768 |
| std | 0.995 |
| value_loss | 3.2 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3000 |
| time_elapsed | 243 |
| total_timesteps | 15000 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 2999 |
| policy_loss | 48.4 |
| reward | -3.6771903 |
| std | 0.995 |
| value_loss | 2.24 |
--------------------------------------
------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3100 |
| time_elapsed | 253 |
| total_timesteps | 15500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 3099 |
| policy_loss | -0.565 |
| reward | 0.679106 |
| std | 0.998 |
| value_loss | 1.98 |
------------------------------------
----------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3200 |
| time_elapsed | 260 |
| total_timesteps | 16000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3199 |
| policy_loss | 26.6 |
| reward | 0.0013427841 |
| std | 0.998 |
| value_loss | 11.3 |
----------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3300 |
| time_elapsed | 267 |
| total_timesteps | 16500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3299 |
| policy_loss | 54.6 |
| reward | 1.6012005 |
| std | 0.998 |
| value_loss | 7.47 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3400 |
| time_elapsed | 277 |
| total_timesteps | 17000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | -0.0366 |
| learning_rate | 0.0007 |
| n_updates | 3399 |
| policy_loss | -33.7 |
| reward | 0.8685799 |
| std | 1 |
| value_loss | 3.79 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3500 |
| time_elapsed | 284 |
| total_timesteps | 17500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3499 |
| policy_loss | 21 |
| reward | 0.14613488 |
| std | 1 |
| value_loss | 1.32 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3600 |
| time_elapsed | 291 |
| total_timesteps | 18000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 3599 |
| policy_loss | -173 |
| reward | -1.2669375 |
| std | 1 |
| value_loss | 20.9 |
--------------------------------------
-----------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3700 |
| time_elapsed | 301 |
| total_timesteps | 18500 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3699 |
| policy_loss | -53.5 |
| reward | -0.0045535835 |
| std | 1 |
| value_loss | 8.58 |
-----------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3800 |
| time_elapsed | 308 |
| total_timesteps | 19000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3799 |
| policy_loss | 98.2 |
| reward | 0.6932831 |
| std | 1 |
| value_loss | 7.95 |
-------------------------------------
------------------------------------
| time/ | |
| fps | 61 |
| iterations | 3900 |
| time_elapsed | 316 |
| total_timesteps | 19500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3899 |
| policy_loss | 628 |
| reward | 8.699067 |
| std | 1 |
| value_loss | 335 |
------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 4000 |
| time_elapsed | 325 |
| total_timesteps | 20000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3999 |
| policy_loss | 605 |
| reward | 17.186195 |
| std | 1 |
| value_loss | 266 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 4100 |
| time_elapsed | 331 |
| total_timesteps | 20500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0.0757 |
| learning_rate | 0.0007 |
| n_updates | 4099 |
| policy_loss | 187 |
| reward | 1.1687305 |
| std | 1 |
| value_loss | 22.2 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 4200 |
| time_elapsed | 341 |
| total_timesteps | 21000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4199 |
| policy_loss | -7.42 |
| reward | -0.5722446 |
| std | 1 |
| value_loss | 0.248 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 61 |
| iterations | 4300 |
| time_elapsed | 351 |
| total_timesteps | 21500 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4299 |
| policy_loss | 44.7 |
| reward | -0.39718863 |
| std | 1 |
| value_loss | 2.09 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 61 |
| iterations | 4400 |
| time_elapsed | 359 |
| total_timesteps | 22000 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4399 |
| policy_loss | -29.9 |
| reward | -0.07916131 |
| std | 1 |
| value_loss | 0.947 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 60 |
| iterations | 4500 |
| time_elapsed | 369 |
| total_timesteps | 22500 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 4499 |
| policy_loss | -28.5 |
| reward | 1.8631558 |
| std | 1 |
| value_loss | 1.83 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 4600 |
| time_elapsed | 375 |
| total_timesteps | 23000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4599 |
| policy_loss | 58.4 |
| reward | 1.4081724 |
| std | 0.999 |
| value_loss | 77 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 4700 |
| time_elapsed | 383 |
| total_timesteps | 23500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4699 |
| policy_loss | 79.9 |
| reward | -2.0728068 |
| std | 0.998 |
| value_loss | 4.66 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 61 |
| iterations | 4800 |
| time_elapsed | 392 |
| total_timesteps | 24000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0.0901 |
| learning_rate | 0.0007 |
| n_updates | 4799 |
| policy_loss | -15.9 |
| reward | -0.09403604 |
| std | 0.998 |
| value_loss | 0.165 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 4900 |
| time_elapsed | 399 |
| total_timesteps | 24500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4899 |
| policy_loss | -110 |
| reward | 2.0542228 |
| std | 1 |
| value_loss | 10.8 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5000 |
| time_elapsed | 407 |
| total_timesteps | 25000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4999 |
| policy_loss | -186 |
| reward | 2.1355224 |
| std | 0.999 |
| value_loss | 27.6 |
-------------------------------------
---------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5100 |
| time_elapsed | 416 |
| total_timesteps | 25500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5099 |
| policy_loss | -13.1 |
| reward | -0.08651471 |
| std | 0.999 |
| value_loss | 1.59 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5200 |
| time_elapsed | 423 |
| total_timesteps | 26000 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5199 |
| policy_loss | 99 |
| reward | 2.3819537 |
| std | 0.997 |
| value_loss | 29.5 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5300 |
| time_elapsed | 432 |
| total_timesteps | 26500 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 5299 |
| policy_loss | 372 |
| reward | -22.664398 |
| std | 0.997 |
| value_loss | 196 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5400 |
| time_elapsed | 440 |
| total_timesteps | 27000 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5399 |
| policy_loss | 153 |
| reward | 1.1176498 |
| std | 0.995 |
| value_loss | 15.5 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5500 |
| time_elapsed | 446 |
| total_timesteps | 27500 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 5499 |
| policy_loss | -92.6 |
| reward | -5.1304746 |
| std | 0.996 |
| value_loss | 14.5 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5600 |
| time_elapsed | 456 |
| total_timesteps | 28000 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0.0186 |
| learning_rate | 0.0007 |
| n_updates | 5599 |
| policy_loss | 62.9 |
| reward | 1.1683302 |
| std | 0.996 |
| value_loss | 9.46 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5700 |
| time_elapsed | 464 |
| total_timesteps | 28500 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 5699 |
| policy_loss | 34.4 |
| reward | -3.4618378 |
| std | 0.995 |
| value_loss | 12 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5800 |
| time_elapsed | 471 |
| total_timesteps | 29000 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | -0.247 |
| learning_rate | 0.0007 |
| n_updates | 5799 |
| policy_loss | -171 |
| reward | 5.8363895 |
| std | 0.996 |
| value_loss | 20.7 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 5900 |
| time_elapsed | 481 |
| total_timesteps | 29500 |
| train/ | |
| entropy_loss | -41 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5899 |
| policy_loss | 250 |
| reward | -4.7651134 |
| std | 0.996 |
| value_loss | 83.8 |
--------------------------------------
------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6000 |
| time_elapsed | 487 |
| total_timesteps | 30000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5999 |
| policy_loss | -211 |
| reward | 3.167639 |
| std | 0.999 |
| value_loss | 95.9 |
------------------------------------
day: 3352, episode: 10
begin_total_asset: 1005653.74
end_total_asset: 8785262.38
total_reward: 7779608.65
total_cost: 31168.83
total_trades: 45054
Sharpe: 0.834
=================================
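The episode summary above reports `total_reward` as the change in portfolio value and an annualized Sharpe ratio. A minimal sketch of how FinRL-style trading environments typically derive these two figures from the series of daily account values (assumed convention: Sharpe = sqrt(252) x mean/std of daily returns; the toy series below is illustrative, not the real 3353-day episode):

```python
import numpy as np

def episode_stats(account_values):
    """Summarize an episode from a series of daily total asset values.

    Assumes the common FinRL convention: total_reward is the change in
    portfolio value, and Sharpe is annualized over 252 trading days.
    """
    account_values = np.asarray(account_values, dtype=float)
    daily_returns = account_values[1:] / account_values[:-1] - 1.0
    total_reward = account_values[-1] - account_values[0]
    sharpe = np.sqrt(252) * daily_returns.mean() / daily_returns.std()
    return total_reward, sharpe

# Toy series standing in for the real episode (day index 0..3352 in the log).
values = [1_000_000, 1_010_000, 1_005_000, 1_020_000]
reward, sharpe = episode_stats(values)
print(round(reward, 2))  # 20000.0
```

Note that in the log above, `end_total_asset - begin_total_asset` matches `total_reward` up to rounding, consistent with this convention.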
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6100 |
| time_elapsed | 495 |
| total_timesteps | 30500 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | -0.113 |
| learning_rate | 0.0007 |
| n_updates | 6099 |
| policy_loss | 21.6 |
| reward | 0.12259659 |
| std | 1 |
| value_loss | 0.918 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6200 |
| time_elapsed | 505 |
| total_timesteps | 31000 |
| train/ | |
| entropy_loss | -41.1 |
| explained_variance | -0.00301 |
| learning_rate | 0.0007 |
| n_updates | 6199 |
| policy_loss | -126 |
| reward | 1.5182142 |
| std | 1 |
| value_loss | 16.8 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6300 |
| time_elapsed | 511 |
| total_timesteps | 31500 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | -1.74 |
| learning_rate | 0.0007 |
| n_updates | 6299 |
| policy_loss | -25.1 |
| reward | 0.5284405 |
| std | 1 |
| value_loss | 1.56 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6400 |
| time_elapsed | 519 |
| total_timesteps | 32000 |
| train/ | |
| entropy_loss | -41.3 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 6399 |
| policy_loss | 123 |
| reward | -3.4704874 |
| std | 1.01 |
| value_loss | 14.2 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6500 |
| time_elapsed | 528 |
| total_timesteps | 32500 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 6499 |
| policy_loss | -102 |
| reward | 3.8022645 |
| std | 1.01 |
| value_loss | 11 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6600 |
| time_elapsed | 535 |
| total_timesteps | 33000 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 6599 |
| policy_loss | 372 |
| reward | 24.101572 |
| std | 1.01 |
| value_loss | 136 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6700 |
| time_elapsed | 543 |
| total_timesteps | 33500 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 6699 |
| policy_loss | -923 |
| reward | -11.455252 |
| std | 1.01 |
| value_loss | 474 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6800 |
| time_elapsed | 552 |
| total_timesteps | 34000 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | -0.553 |
| learning_rate | 0.0007 |
| n_updates | 6799 |
| policy_loss | -111 |
| reward | 0.10300914 |
| std | 1.01 |
| value_loss | 9.3 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 61 |
| iterations | 6900 |
| time_elapsed | 558 |
| total_timesteps | 34500 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 6899 |
| policy_loss | 133 |
| reward | -0.71322364 |
| std | 1.01 |
| value_loss | 15.3 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7000 |
| time_elapsed | 567 |
| total_timesteps | 35000 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 6999 |
| policy_loss | -84.3 |
| reward | 2.3256698 |
| std | 1.01 |
| value_loss | 4.58 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7100 |
| time_elapsed | 575 |
| total_timesteps | 35500 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 7099 |
| policy_loss | -2.44 |
| reward | 0.9263134 |
| std | 1.01 |
| value_loss | 1.21 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7200 |
| time_elapsed | 586 |
| total_timesteps | 36000 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 7199 |
| policy_loss | -228 |
| reward | -1.9283981 |
| std | 1.01 |
| value_loss | 42 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7300 |
| time_elapsed | 596 |
| total_timesteps | 36500 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 7299 |
| policy_loss | 81 |
| reward | -6.168546 |
| std | 1.01 |
| value_loss | 8.16 |
-------------------------------------
---------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7400 |
| time_elapsed | 602 |
| total_timesteps | 37000 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 7399 |
| policy_loss | -136 |
| reward | -0.66517484 |
| std | 1.02 |
| value_loss | 12.4 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7500 |
| time_elapsed | 610 |
| total_timesteps | 37500 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | 1.44e-05 |
| learning_rate | 0.0007 |
| n_updates | 7499 |
| policy_loss | -368 |
| reward | 0.28679553 |
| std | 1.02 |
| value_loss | 81.7 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7600 |
| time_elapsed | 620 |
| total_timesteps | 38000 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 7599 |
| policy_loss | -80.4 |
| reward | -0.02342434 |
| std | 1.02 |
| value_loss | 4.71 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7700 |
| time_elapsed | 626 |
| total_timesteps | 38500 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | -3.11 |
| learning_rate | 0.0007 |
| n_updates | 7699 |
| policy_loss | -91.7 |
| reward | 2.6142132 |
| std | 1.02 |
| value_loss | 6.61 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7800 |
| time_elapsed | 635 |
| total_timesteps | 39000 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | 0.125 |
| learning_rate | 0.0007 |
| n_updates | 7799 |
| policy_loss | 4.76 |
| reward | -1.2840562 |
| std | 1.02 |
| value_loss | 0.762 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 61 |
| iterations | 7900 |
| time_elapsed | 644 |
| total_timesteps | 39500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0.0476 |
| learning_rate | 0.0007 |
| n_updates | 7899 |
| policy_loss | 273 |
| reward | -0.55217224 |
| std | 1.02 |
| value_loss | 46.2 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8000 |
| time_elapsed | 650 |
| total_timesteps | 40000 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 7999 |
| policy_loss | -769 |
| reward | 1.6622137 |
| std | 1.02 |
| value_loss | 367 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8100 |
| time_elapsed | 660 |
| total_timesteps | 40500 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | -0.131 |
| learning_rate | 0.0007 |
| n_updates | 8099 |
| policy_loss | 38.2 |
| reward | 0.38162667 |
| std | 1.02 |
| value_loss | 1.09 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8200 |
| time_elapsed | 668 |
| total_timesteps | 41000 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | -0.306 |
| learning_rate | 0.0007 |
| n_updates | 8199 |
| policy_loss | 40.8 |
| reward | 0.8386523 |
| std | 1.02 |
| value_loss | 3.26 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8300 |
| time_elapsed | 674 |
| total_timesteps | 41500 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 8299 |
| policy_loss | 41.7 |
| reward | 1.5822707 |
| std | 1.02 |
| value_loss | 4.75 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8400 |
| time_elapsed | 684 |
| total_timesteps | 42000 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 8399 |
| policy_loss | 133 |
| reward | 0.1792632 |
| std | 1.02 |
| value_loss | 15.6 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8500 |
| time_elapsed | 691 |
| total_timesteps | 42500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 8499 |
| policy_loss | 99.2 |
| reward | 1.6896911 |
| std | 1.02 |
| value_loss | 33.2 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8600 |
| time_elapsed | 699 |
| total_timesteps | 43000 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | -0.0163 |
| learning_rate | 0.0007 |
| n_updates | 8599 |
| policy_loss | -836 |
| reward | 30.580954 |
| std | 1.02 |
| value_loss | 436 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8700 |
| time_elapsed | 709 |
| total_timesteps | 43500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0.0669 |
| learning_rate | 0.0007 |
| n_updates | 8699 |
| policy_loss | -430 |
| reward | -9.169519 |
| std | 1.02 |
| value_loss | 186 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8800 |
| time_elapsed | 715 |
| total_timesteps | 44000 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0.178 |
| learning_rate | 0.0007 |
| n_updates | 8799 |
| policy_loss | -2.39 |
| reward | -0.505542 |
| std | 1.02 |
| value_loss | 0.0762 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 8900 |
| time_elapsed | 723 |
| total_timesteps | 44500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0.0519 |
| learning_rate | 0.0007 |
| n_updates | 8899 |
| policy_loss | -29 |
| reward | 1.4009765 |
| std | 1.02 |
| value_loss | 0.617 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9000 |
| time_elapsed | 733 |
| total_timesteps | 45000 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | 0.0464 |
| learning_rate | 0.0007 |
| n_updates | 8999 |
| policy_loss | -142 |
| reward | -1.6482956 |
| std | 1.02 |
| value_loss | 12.1 |
--------------------------------------
----------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9100 |
| time_elapsed | 739 |
| total_timesteps | 45500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0.128 |
| learning_rate | 0.0007 |
| n_updates | 9099 |
| policy_loss | -18.8 |
| reward | -0.022230674 |
| std | 1.02 |
| value_loss | 1.08 |
----------------------------------------
---------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9200 |
| time_elapsed | 748 |
| total_timesteps | 46000 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 9199 |
| policy_loss | 35.8 |
| reward | -0.16132466 |
| std | 1.02 |
| value_loss | 6.5 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9300 |
| time_elapsed | 757 |
| total_timesteps | 46500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0.00416 |
| learning_rate | 0.0007 |
| n_updates | 9299 |
| policy_loss | -192 |
| reward | -3.3674068 |
| std | 1.02 |
| value_loss | 87.7 |
--------------------------------------
------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9400 |
| time_elapsed | 763 |
| total_timesteps | 47000 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | -0.0393 |
| learning_rate | 0.0007 |
| n_updates | 9399 |
| policy_loss | -37.6 |
| reward | 1.150722 |
| std | 1.02 |
| value_loss | 3.37 |
------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9500 |
| time_elapsed | 772 |
| total_timesteps | 47500 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 9499 |
| policy_loss | -25.2 |
| reward | 0.41208658 |
| std | 1.02 |
| value_loss | 0.698 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9600 |
| time_elapsed | 780 |
| total_timesteps | 48000 |
| train/ | |
| entropy_loss | -41.9 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 9599 |
| policy_loss | 9.9 |
| reward | 0.5765088 |
| std | 1.03 |
| value_loss | 2.06 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9700 |
| time_elapsed | 787 |
| total_timesteps | 48500 |
| train/ | |
| entropy_loss | -41.9 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 9699 |
| policy_loss | 315 |
| reward | -0.2841707 |
| std | 1.03 |
| value_loss | 49.4 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9800 |
| time_elapsed | 797 |
| total_timesteps | 49000 |
| train/ | |
| entropy_loss | -41.9 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 9799 |
| policy_loss | 125 |
| reward | 0.6355639 |
| std | 1.03 |
| value_loss | 9.9 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 61 |
| iterations | 9900 |
| time_elapsed | 804 |
| total_timesteps | 49500 |
| train/ | |
| entropy_loss | -41.9 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 9899 |
| policy_loss | 155 |
| reward | -4.6037025 |
| std | 1.03 |
| value_loss | 16.2 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 61 |
| iterations | 10000 |
| time_elapsed | 811 |
| total_timesteps | 50000 |
| train/ | |
| entropy_loss | -41.9 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 9999 |
| policy_loss | 104 |
| reward | -3.306132 |
| std | 1.03 |
| value_loss | 9.5 |
-------------------------------------
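The A2C tables above obey a simple bookkeeping: with `n_steps=5` and a single environment, each logged iteration advances 5 timesteps, and `n_updates` trails the iteration count by one (the update for the current rollout is logged before the counter catches up). A small check of the relations visible in the log, assuming one env and `n_steps=5` as printed in the hyperparameters:

```python
# Sanity-check the A2C logger columns above (assumption: n_envs=1, n_steps=5).
N_STEPS, N_ENVS = 5, 1

def a2c_log_relations(iterations):
    total_timesteps = iterations * N_STEPS * N_ENVS  # one rollout per iteration
    n_updates = iterations - 1                       # as observed in these logs
    return total_timesteps, n_updates

# Values taken from the tables above.
assert a2c_log_relations(5000) == (25000, 4999)
assert a2c_log_relations(10000) == (50000, 9999)
```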
{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cpu device
Logging to results/ddpg
-----------------------------------
| time/ | |
| episodes | 4 |
| fps | 24 |
| time_elapsed | 547 |
| total_timesteps | 13412 |
| train/ | |
| actor_loss | 12.5 |
| critic_loss | 295 |
| learning_rate | 0.001 |
| n_updates | 10059 |
| reward | -6.3763723 |
-----------------------------------
-----------------------------------
| time/ | |
| episodes | 8 |
| fps | 21 |
| time_elapsed | 1237 |
| total_timesteps | 26824 |
| train/ | |
| actor_loss | -5.51 |
| critic_loss | 14.7 |
| learning_rate | 0.001 |
| n_updates | 23471 |
| reward | -6.3763723 |
-----------------------------------
day: 3352, episode: 10
begin_total_asset: 1011382.29
end_total_asset: 6058882.63
total_reward: 5047500.34
total_cost: 1010.37
total_trades: 46928
Sharpe: 0.807
=================================
-----------------------------------
| time/ | |
| episodes | 12 |
| fps | 20 |
| time_elapsed | 1942 |
| total_timesteps | 40236 |
| train/ | |
| actor_loss | -8.67 |
| critic_loss | 7.57 |
| learning_rate | 0.001 |
| n_updates | 36883 |
| reward | -6.3763723 |
-----------------------------------
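The off-policy logs (DDPG here, and SAC/TD3 below) count whole episodes over the same training window. The episode summaries print `day: 3352`, i.e. day indices 0..3352, so each episode contributes 3353 steps; that assumption reproduces the logged timestep counts exactly:

```python
# Episode/timestep bookkeeping for the off-policy agents above
# (assumption: 3353 steps per episode, from the "day: 3352" summaries).
EPISODE_LEN = 3353

for episodes, logged_timesteps in [(4, 13412), (8, 26824), (12, 40236)]:
    assert episodes * EPISODE_LEN == logged_timesteps
print("episode/timestep bookkeeping consistent")
```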
{'n_steps': 2048, 'ent_coef': 0.01, 'learning_rate': 0.00025, 'batch_size': 128}
Using cpu device
Logging to results/ppo
------------------------------------
| time/ | |
| fps | 60 |
| iterations | 1 |
| time_elapsed | 33 |
| total_timesteps | 2048 |
| train/ | |
| reward | -0.20400214 |
------------------------------------
-----------------------------------------
| time/ | |
| fps | 61 |
| iterations | 2 |
| time_elapsed | 66 |
| total_timesteps | 4096 |
| train/ | |
| approx_kl | 0.019446293 |
| clip_fraction | 0.218 |
| clip_range | 0.2 |
| entropy_loss | -41.2 |
| explained_variance | -0.0153 |
| learning_rate | 0.00025 |
| loss | 8.32 |
| n_updates | 10 |
| policy_gradient_loss | -0.0239 |
| reward | 0.964798 |
| std | 1 |
| value_loss | 12.1 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 60 |
| iterations | 3 |
| time_elapsed | 100 |
| total_timesteps | 6144 |
| train/ | |
| approx_kl | 0.017402954 |
| clip_fraction | 0.182 |
| clip_range | 0.2 |
| entropy_loss | -41.2 |
| explained_variance | 0.000782 |
| learning_rate | 0.00025 |
| loss | 28.3 |
| n_updates | 20 |
| policy_gradient_loss | -0.0164 |
| reward | 7.5384216 |
| std | 1 |
| value_loss | 51.3 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 4 |
| time_elapsed | 131 |
| total_timesteps | 8192 |
| train/ | |
| approx_kl | 0.015737543 |
| clip_fraction | 0.162 |
| clip_range | 0.2 |
| entropy_loss | -41.3 |
| explained_variance | -0.0037 |
| learning_rate | 0.00025 |
| loss | 22.7 |
| n_updates | 30 |
| policy_gradient_loss | -0.0225 |
| reward | 2.27421 |
| std | 1.01 |
| value_loss | 38.2 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 5 |
| time_elapsed | 165 |
| total_timesteps | 10240 |
| train/ | |
| approx_kl | 0.020310912 |
| clip_fraction | 0.184 |
| clip_range | 0.2 |
| entropy_loss | -41.3 |
| explained_variance | -0.00721 |
| learning_rate | 0.00025 |
| loss | 12 |
| n_updates | 40 |
| policy_gradient_loss | -0.0202 |
| reward | 0.7753585 |
| std | 1.01 |
| value_loss | 21.3 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 6 |
| time_elapsed | 196 |
| total_timesteps | 12288 |
| train/ | |
| approx_kl | 0.014960194 |
| clip_fraction | 0.143 |
| clip_range | 0.2 |
| entropy_loss | -41.4 |
| explained_variance | -0.0299 |
| learning_rate | 0.00025 |
| loss | 30.5 |
| n_updates | 50 |
| policy_gradient_loss | -0.0179 |
| reward | 2.62347 |
| std | 1.01 |
| value_loss | 46.2 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 7 |
| time_elapsed | 228 |
| total_timesteps | 14336 |
| train/ | |
| approx_kl | 0.023127541 |
| clip_fraction | 0.193 |
| clip_range | 0.2 |
| entropy_loss | -41.4 |
| explained_variance | -0.023 |
| learning_rate | 0.00025 |
| loss | 6.36 |
| n_updates | 60 |
| policy_gradient_loss | -0.0221 |
| reward | 1.0379714 |
| std | 1.01 |
| value_loss | 14.1 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 8 |
| time_elapsed | 260 |
| total_timesteps | 16384 |
| train/ | |
| approx_kl | 0.018745095 |
| clip_fraction | 0.201 |
| clip_range | 0.2 |
| entropy_loss | -41.5 |
| explained_variance | 0.00254 |
| learning_rate | 0.00025 |
| loss | 18.3 |
| n_updates | 70 |
| policy_gradient_loss | -0.019 |
| reward | -0.21705139 |
| std | 1.01 |
| value_loss | 59.5 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 9 |
| time_elapsed | 294 |
| total_timesteps | 18432 |
| train/ | |
| approx_kl | 0.018167643 |
| clip_fraction | 0.154 |
| clip_range | 0.2 |
| entropy_loss | -41.5 |
| explained_variance | -0.000664 |
| learning_rate | 0.00025 |
| loss | 21.5 |
| n_updates | 80 |
| policy_gradient_loss | -0.0144 |
| reward | -0.31962025 |
| std | 1.01 |
| value_loss | 36.6 |
-----------------------------------------
----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 10 |
| time_elapsed | 328 |
| total_timesteps | 20480 |
| train/ | |
| approx_kl | 0.02108417 |
| clip_fraction | 0.244 |
| clip_range | 0.2 |
| entropy_loss | -41.6 |
| explained_variance | 0.0203 |
| learning_rate | 0.00025 |
| loss | 7.36 |
| n_updates | 90 |
| policy_gradient_loss | -0.0191 |
| reward | 0.07936729 |
| std | 1.02 |
| value_loss | 23.3 |
----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 11 |
| time_elapsed | 357 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.014700897 |
| clip_fraction | 0.166 |
| clip_range | 0.2 |
| entropy_loss | -41.6 |
| explained_variance | 0.00383 |
| learning_rate | 0.00025 |
| loss | 29.1 |
| n_updates | 100 |
| policy_gradient_loss | -0.0156 |
| reward | 1.4870173 |
| std | 1.02 |
| value_loss | 93.3 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 12 |
| time_elapsed | 391 |
| total_timesteps | 24576 |
| train/ | |
| approx_kl | 0.017688308 |
| clip_fraction | 0.194 |
| clip_range | 0.2 |
| entropy_loss | -41.7 |
| explained_variance | -0.0104 |
| learning_rate | 0.00025 |
| loss | 6.58 |
| n_updates | 110 |
| policy_gradient_loss | -0.0161 |
| reward | -0.7623598 |
| std | 1.02 |
| value_loss | 17.5 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 13 |
| time_elapsed | 422 |
| total_timesteps | 26624 |
| train/ | |
| approx_kl | 0.023069832 |
| clip_fraction | 0.24 |
| clip_range | 0.2 |
| entropy_loss | -41.7 |
| explained_variance | 0.0101 |
| learning_rate | 0.00025 |
| loss | 38.8 |
| n_updates | 120 |
| policy_gradient_loss | -0.0147 |
| reward | 3.4454083 |
| std | 1.02 |
| value_loss | 64.9 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 14 |
| time_elapsed | 454 |
| total_timesteps | 28672 |
| train/ | |
| approx_kl | 0.017561657 |
| clip_fraction | 0.204 |
| clip_range | 0.2 |
| entropy_loss | -41.7 |
| explained_variance | -0.0172 |
| learning_rate | 0.00025 |
| loss | 21.4 |
| n_updates | 130 |
| policy_gradient_loss | -0.02 |
| reward | 0.9586051 |
| std | 1.02 |
| value_loss | 52.4 |
-----------------------------------------
day: 3352, episode: 10
begin_total_asset: 988584.72
end_total_asset: 3416710.65
total_reward: 2428125.94
total_cost: 420148.99
total_trades: 89136
Sharpe: 0.598
=================================
----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 15 |
| time_elapsed | 487 |
| total_timesteps | 30720 |
| train/ | |
| approx_kl | 0.02006042 |
| clip_fraction | 0.219 |
| clip_range | 0.2 |
| entropy_loss | -41.7 |
| explained_variance | -0.0279 |
| learning_rate | 0.00025 |
| loss | 13.3 |
| n_updates | 140 |
| policy_gradient_loss | -0.0185 |
| reward | -0.3580386 |
| std | 1.02 |
| value_loss | 23.2 |
----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 16 |
| time_elapsed | 523 |
| total_timesteps | 32768 |
| train/ | |
| approx_kl | 0.025233287 |
| clip_fraction | 0.243 |
| clip_range | 0.2 |
| entropy_loss | -41.8 |
| explained_variance | -0.00552 |
| learning_rate | 0.00025 |
| loss | 22.4 |
| n_updates | 150 |
| policy_gradient_loss | -0.0176 |
| reward | -0.5090524 |
| std | 1.02 |
| value_loss | 69.5 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 17 |
| time_elapsed | 556 |
| total_timesteps | 34816 |
| train/ | |
| approx_kl | 0.022021335 |
| clip_fraction | 0.216 |
| clip_range | 0.2 |
| entropy_loss | -41.8 |
| explained_variance | 0.0188 |
| learning_rate | 0.00025 |
| loss | 8.75 |
| n_updates | 160 |
| policy_gradient_loss | -0.0188 |
| reward | 1.8985721 |
| std | 1.03 |
| value_loss | 23.2 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 18 |
| time_elapsed | 586 |
| total_timesteps | 36864 |
| train/ | |
| approx_kl | 0.019396901 |
| clip_fraction | 0.229 |
| clip_range | 0.2 |
| entropy_loss | -41.9 |
| explained_variance | 0.00194 |
| learning_rate | 0.00025 |
| loss | 14.6 |
| n_updates | 170 |
| policy_gradient_loss | -0.0195 |
| reward | -0.31956208 |
| std | 1.03 |
| value_loss | 39.6 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 19 |
| time_elapsed | 622 |
| total_timesteps | 38912 |
| train/ | |
| approx_kl | 0.020318478 |
| clip_fraction | 0.225 |
| clip_range | 0.2 |
| entropy_loss | -41.9 |
| explained_variance | 0.0132 |
| learning_rate | 0.00025 |
| loss | 22.3 |
| n_updates | 180 |
| policy_gradient_loss | -0.0128 |
| reward | 0.33881456 |
| std | 1.03 |
| value_loss | 55.7 |
-----------------------------------------
----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 20 |
| time_elapsed | 652 |
| total_timesteps | 40960 |
| train/ | |
| approx_kl | 0.02080874 |
| clip_fraction | 0.179 |
| clip_range | 0.2 |
| entropy_loss | -42 |
| explained_variance | 0.0334 |
| learning_rate | 0.00025 |
| loss | 5.55 |
| n_updates | 190 |
| policy_gradient_loss | -0.0186 |
| reward | 0.15585361 |
| std | 1.03 |
| value_loss | 19.1 |
----------------------------------------
----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 21 |
| time_elapsed | 686 |
| total_timesteps | 43008 |
| train/ | |
| approx_kl | 0.01973752 |
| clip_fraction | 0.227 |
| clip_range | 0.2 |
| entropy_loss | -42 |
| explained_variance | 0.00997 |
| learning_rate | 0.00025 |
| loss | 19 |
| n_updates | 200 |
| policy_gradient_loss | -0.0153 |
| reward | -14.07267 |
| std | 1.03 |
| value_loss | 75.2 |
----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 22 |
| time_elapsed | 717 |
| total_timesteps | 45056 |
| train/ | |
| approx_kl | 0.013898542 |
| clip_fraction | 0.0931 |
| clip_range | 0.2 |
| entropy_loss | -42 |
| explained_variance | -0.000876 |
| learning_rate | 0.00025 |
| loss | 12.2 |
| n_updates | 210 |
| policy_gradient_loss | -0.0138 |
| reward | -5.085373 |
| std | 1.03 |
| value_loss | 27 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 23 |
| time_elapsed | 750 |
| total_timesteps | 47104 |
| train/ | |
| approx_kl | 0.01667095 |
| clip_fraction | 0.185 |
| clip_range | 0.2 |
| entropy_loss | -42.1 |
| explained_variance | 0.00379 |
| learning_rate | 0.00025 |
| loss | 8.83 |
| n_updates | 220 |
| policy_gradient_loss | -0.0139 |
| reward | -0.11939671 |
| std | 1.03 |
| value_loss | 30.8 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 24 |
| time_elapsed | 785 |
| total_timesteps | 49152 |
| train/ | |
| approx_kl | 0.027711859 |
| clip_fraction | 0.253 |
| clip_range | 0.2 |
| entropy_loss | -42.1 |
| explained_variance | 0.0238 |
| learning_rate | 0.00025 |
| loss | 31.9 |
| n_updates | 230 |
| policy_gradient_loss | -0.00308 |
| reward | -1.080327 |
| std | 1.03 |
| value_loss | 75 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 62 |
| iterations | 25 |
| time_elapsed | 817 |
| total_timesteps | 51200 |
| train/ | |
| approx_kl | 0.025901645 |
| clip_fraction | 0.278 |
| clip_range | 0.2 |
| entropy_loss | -42.1 |
| explained_variance | 0.0481 |
| learning_rate | 0.00025 |
| loss | 5.34 |
| n_updates | 240 |
| policy_gradient_loss | -0.0164 |
| reward | 0.08477563 |
| std | 1.04 |
| value_loss | 13 |
-----------------------------------------
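The PPO tables follow the same kind of bookkeeping: `total_timesteps = iterations * n_steps` with the printed `n_steps=2048`, and `n_updates` grows by `n_epochs` per finished rollout (assuming SB3's default `n_epochs=10` and a single environment). Checking against the values logged above:

```python
# Relations visible in the PPO log above (assumptions: one env,
# SB3 default n_epochs=10; n_steps=2048 from the printed hyperparameters).
N_STEPS, N_EPOCHS = 2048, 10

def ppo_log_relations(iterations):
    total_timesteps = iterations * N_STEPS
    n_updates = (iterations - 1) * N_EPOCHS  # epochs over each finished rollout
    return total_timesteps, n_updates

assert ppo_log_relations(2) == (4096, 10)
assert ppo_log_relations(25) == (51200, 240)
```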
{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device
Logging to results/sac
-----------------------------------
| time/ | |
| episodes | 4 |
| fps | 19 |
| time_elapsed | 703 |
| total_timesteps | 13412 |
| train/ | |
| actor_loss | 1.68e+03 |
| critic_loss | 1e+04 |
| ent_coef | 0.309 |
| ent_coef_loss | -0.516 |
| learning_rate | 0.0001 |
| n_updates | 13311 |
| reward | -11.183781 |
-----------------------------------
-----------------------------------
| time/ | |
| episodes | 8 |
| fps | 19 |
| time_elapsed | 1410 |
| total_timesteps | 26824 |
| train/ | |
| actor_loss | 677 |
| critic_loss | 66.2 |
| ent_coef | 0.0855 |
| ent_coef_loss | -112 |
| learning_rate | 0.0001 |
| n_updates | 26723 |
| reward | -10.753805 |
-----------------------------------
day: 3352, episode: 10
begin_total_asset: 1005927.23
end_total_asset: 5294689.46
total_reward: 4288762.22
total_cost: 37988.65
total_trades: 61507
Sharpe: 0.700
=================================
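The `Sharpe` figure in these episode summaries is an annualized Sharpe ratio over the episode's daily account values (FinRL's trading env computes it from daily returns with no risk-free rate). A minimal sketch of that computation, assuming 252 trading days per year and daily steps:

```python
import numpy as np

def episode_sharpe(account_values, periods_per_year=252):
    """Annualized Sharpe ratio over an episode's daily account values
    (no risk-free rate subtracted), in the form the env prints."""
    v = np.asarray(account_values, dtype=float)
    daily_returns = v[1:] / v[:-1] - 1.0
    return np.sqrt(periods_per_year) * daily_returns.mean() / daily_returns.std()

# Illustration on synthetic account values (seeded for reproducibility).
rng = np.random.default_rng(42)
values = 1_000_000 * np.cumprod(1 + rng.normal(0.0005, 0.01, size=252))
print(round(episode_sharpe(values), 3))
```

Note the exact value also depends on whether a sample (`ddof=1`) or population (`ddof=0`) standard deviation is used; the ratio's scale is the same either way.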
----------------------------------
| time/ | |
| episodes | 12 |
| fps | 18 |
| time_elapsed | 2126 |
| total_timesteps | 40236 |
| train/ | |
| actor_loss | 304 |
| critic_loss | 20.1 |
| ent_coef | 0.0227 |
| ent_coef_loss | -145 |
| learning_rate | 0.0001 |
| n_updates | 40135 |
| reward | -9.593834 |
----------------------------------
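The dict printed at the start of each run (e.g. `{'batch_size': 128, 'buffer_size': 100000, ...}` for SAC above) is the `model_kwargs` that FinRL forwards to the Stable Baselines 3 constructor. A sketch of how such a dict is wired up (the env construction and training calls, shown in comments, follow FinRL's `DRLAgent` interface and are elided here):

```python
# Hyperparameters as printed in the SAC log above.
SAC_PARAMS = {
    "batch_size": 128,
    "buffer_size": 100_000,
    "learning_rate": 0.0001,
    "learning_starts": 100,
    "ent_coef": "auto_0.1",
}

# In the notebook these are forwarded to SB3 roughly as:
#   agent = DRLAgent(env=env_train)
#   model_sac = agent.get_model("sac", model_kwargs=SAC_PARAMS)
#   trained_sac = agent.train_model(model_sac, tb_log_name="sac",
#                                   total_timesteps=...)
print(sorted(SAC_PARAMS))
```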
{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cpu device
Logging to results/td3
-----------------------------------
| time/ | |
| episodes | 4 |
| fps | 24 |
| time_elapsed | 544 |
| total_timesteps | 13412 |
| train/ | |
| actor_loss | 132 |
| critic_loss | 6.19e+03 |
| learning_rate | 0.001 |
| n_updates | 10059 |
| reward | -2.3487854 |
-----------------------------------
-----------------------------------
| time/ | |
| episodes | 8 |
| fps | 21 |
| time_elapsed | 1242 |
| total_timesteps | 26824 |
| train/ | |
| actor_loss | 49 |
| critic_loss | 584 |
| learning_rate | 0.001 |
| n_updates | 23471 |
| reward | -2.3487854 |
-----------------------------------
day: 3352, episode: 10
begin_total_asset: 1012427.98
end_total_asset: 5866237.13
total_reward: 4853809.15
total_cost: 1011.41
total_trades: 53632
Sharpe: 0.831
=================================
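The episode summary fields are internally consistent: `total_reward` is just the change in portfolio value over the episode, i.e. `end_total_asset - begin_total_asset` (up to print rounding). Checking the TD3 episode above:

```python
# Values copied from the TD3 episode summary above.
begin_total_asset = 1_012_427.98
end_total_asset = 5_866_237.13
total_reward = 4_853_809.15

# total_reward is the episode's change in account value.
assert abs((end_total_asset - begin_total_asset) - total_reward) < 0.01
print("consistent")
```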
-----------------------------------
| time/ | |
| episodes | 12 |
| fps | 20 |
| time_elapsed | 1943 |
| total_timesteps | 40236 |
| train/ | |
| actor_loss | 37.3 |
| critic_loss | 101 |
| learning_rate | 0.001 |
| n_updates | 36883 |
| reward | -2.3487854 |
-----------------------------------
hit end!
hit end!
hit end!
hit end!
hit end!
[*********************100%***********************] 1 of 1 completed
Shape of DataFrame: (22, 8)
i: 3
{'n_steps': 5, 'ent_coef': 0.01, 'learning_rate': 0.0007}
Using cpu device
Logging to results/a2c
-------------------------------------
| time/ | |
| fps | 46 |
| iterations | 100 |
| time_elapsed | 10 |
| total_timesteps | 500 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | -0.471 |
| learning_rate | 0.0007 |
| n_updates | 99 |
| policy_loss | 86.2 |
| reward | 1.3343517 |
| std | 1 |
| value_loss | 5.99 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 47 |
| iterations | 200 |
| time_elapsed | 20 |
| total_timesteps | 1000 |
| train/ | |
| entropy_loss | -41.3 |
| explained_variance | -0.271 |
| learning_rate | 0.0007 |
| n_updates | 199 |
| policy_loss | 44.4 |
| reward | -1.4969016 |
| std | 1 |
| value_loss | 1.22 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 47 |
| iterations | 300 |
| time_elapsed | 31 |
| total_timesteps | 1500 |
| train/ | |
| entropy_loss | -41.3 |
| explained_variance | 0.0667 |
| learning_rate | 0.0007 |
| n_updates | 299 |
| policy_loss | 141 |
| reward | -4.3429856 |
| std | 1 |
| value_loss | 15.6 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 51 |
| iterations | 400 |
| time_elapsed | 39 |
| total_timesteps | 2000 |
| train/ | |
| entropy_loss | -41.3 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 399 |
| policy_loss | 28.9 |
| reward | -2.9280229 |
| std | 1.01 |
| value_loss | 0.941 |
--------------------------------------
------------------------------------
| time/ | |
| fps | 54 |
| iterations | 500 |
| time_elapsed | 46 |
| total_timesteps | 2500 |
| train/ | |
| entropy_loss | -41.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 499 |
| policy_loss | -356 |
| reward | 2.440834 |
| std | 1 |
| value_loss | 95.7 |
------------------------------------
-------------------------------------
| time/ | |
| fps | 52 |
| iterations | 600 |
| time_elapsed | 56 |
| total_timesteps | 3000 |
| train/ | |
| entropy_loss | -41.3 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 599 |
| policy_loss | -187 |
| reward | 7.7011724 |
| std | 1.01 |
| value_loss | 30.6 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 55 |
| iterations | 700 |
| time_elapsed | 63 |
| total_timesteps | 3500 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | -0.042 |
| learning_rate | 0.0007 |
| n_updates | 699 |
| policy_loss | 23.3 |
| reward | -1.0782235 |
| std | 1.01 |
| value_loss | 0.963 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 56 |
| iterations | 800 |
| time_elapsed | 71 |
| total_timesteps | 4000 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 799 |
| policy_loss | -258 |
| reward | -0.20911986 |
| std | 1.01 |
| value_loss | 50.4 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 55 |
| iterations | 900 |
| time_elapsed | 81 |
| total_timesteps | 4500 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0.118 |
| learning_rate | 0.0007 |
| n_updates | 899 |
| policy_loss | -66.9 |
| reward | 0.8433642 |
| std | 1.01 |
| value_loss | 2.9 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 57 |
| iterations | 1000 |
| time_elapsed | 87 |
| total_timesteps | 5000 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 999 |
| policy_loss | 5.19 |
| reward | -1.4874439 |
| std | 1.01 |
| value_loss | 4.01 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 57 |
| iterations | 1100 |
| time_elapsed | 96 |
| total_timesteps | 5500 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | -0.555 |
| learning_rate | 0.0007 |
| n_updates | 1099 |
| policy_loss | -77.7 |
| reward | 1.8939301 |
| std | 1.01 |
| value_loss | 3.97 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 57 |
| iterations | 1200 |
| time_elapsed | 105 |
| total_timesteps | 6000 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 1199 |
| policy_loss | 187 |
| reward | 2.3026025 |
| std | 1.01 |
| value_loss | 33.4 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 58 |
| iterations | 1300 |
| time_elapsed | 111 |
| total_timesteps | 6500 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 1299 |
| policy_loss | 47.1 |
| reward | 1.9173757 |
| std | 1.01 |
| value_loss | 5.89 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 54 |
| iterations | 1400 |
| time_elapsed | 127 |
| total_timesteps | 7000 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | -0.0644 |
| learning_rate | 0.0007 |
| n_updates | 1399 |
| policy_loss | 32.9 |
| reward | -3.0739012 |
| std | 1.01 |
| value_loss | 1.06 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 55 |
| iterations | 1500 |
| time_elapsed | 134 |
| total_timesteps | 7500 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 1499 |
| policy_loss | 305 |
| reward | 2.5744946 |
| std | 1.01 |
| value_loss | 50.5 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 55 |
| iterations | 1600 |
| time_elapsed | 144 |
| total_timesteps | 8000 |
| train/ | |
| entropy_loss | -41.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 1599 |
| policy_loss | -4.88 |
| reward | -2.1707737 |
| std | 1.01 |
| value_loss | 1.28 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 55 |
| iterations | 1700 |
| time_elapsed | 151 |
| total_timesteps | 8500 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 1699 |
| policy_loss | 160 |
| reward | -0.81266195 |
| std | 1.01 |
| value_loss | 16 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 56 |
| iterations | 1800 |
| time_elapsed | 158 |
| total_timesteps | 9000 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | -0.208 |
| learning_rate | 0.0007 |
| n_updates | 1799 |
| policy_loss | 38.1 |
| reward | -1.4904355 |
| std | 1.01 |
| value_loss | 5.15 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 56 |
| iterations | 1900 |
| time_elapsed | 169 |
| total_timesteps | 9500 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | 1.79e-07 |
| learning_rate | 0.0007 |
| n_updates | 1899 |
| policy_loss | 282 |
| reward | -0.36043915 |
| std | 1.01 |
| value_loss | 56 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 56 |
| iterations | 2000 |
| time_elapsed | 175 |
| total_timesteps | 10000 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | -0.0747 |
| learning_rate | 0.0007 |
| n_updates | 1999 |
| policy_loss | -471 |
| reward | -0.37017918 |
| std | 1.01 |
| value_loss | 171 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 57 |
| iterations | 2100 |
| time_elapsed | 183 |
| total_timesteps | 10500 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | 0.163 |
| learning_rate | 0.0007 |
| n_updates | 2099 |
| policy_loss | 1.74 |
| reward | -1.2063048 |
| std | 1.01 |
| value_loss | 0.28 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 56 |
| iterations | 2200 |
| time_elapsed | 193 |
| total_timesteps | 11000 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | -0.326 |
| learning_rate | 0.0007 |
| n_updates | 2199 |
| policy_loss | -94 |
| reward | 1.8247845 |
| std | 1.01 |
| value_loss | 5.61 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 57 |
| iterations | 2300 |
| time_elapsed | 199 |
| total_timesteps | 11500 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | 0.0682 |
| learning_rate | 0.0007 |
| n_updates | 2299 |
| policy_loss | -128 |
| reward | 0.4869665 |
| std | 1.01 |
| value_loss | 15.7 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 57 |
| iterations | 2400 |
| time_elapsed | 208 |
| total_timesteps | 12000 |
| train/ | |
| entropy_loss | -41.5 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 2399 |
| policy_loss | -173 |
| reward | -0.9164407 |
| std | 1.01 |
| value_loss | 23.1 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 57 |
| iterations | 2500 |
| time_elapsed | 217 |
| total_timesteps | 12500 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 2499 |
| policy_loss | 86.3 |
| reward | 3.2540042 |
| std | 1.02 |
| value_loss | 5.33 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 58 |
| iterations | 2600 |
| time_elapsed | 223 |
| total_timesteps | 13000 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 2599 |
| policy_loss | 428 |
| reward | 2.4169402 |
| std | 1.02 |
| value_loss | 112 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 57 |
| iterations | 2700 |
| time_elapsed | 233 |
| total_timesteps | 13500 |
| train/ | |
| entropy_loss | -41.6 |
| explained_variance | 0.0601 |
| learning_rate | 0.0007 |
| n_updates | 2699 |
| policy_loss | 1.73 |
| reward | -1.3785244 |
| std | 1.02 |
| value_loss | 0.378 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 58 |
| iterations | 2800 |
| time_elapsed | 241 |
| total_timesteps | 14000 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 2799 |
| policy_loss | 45.3 |
| reward | -1.8347946 |
| std | 1.02 |
| value_loss | 4.25 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 58 |
| iterations | 2900 |
| time_elapsed | 247 |
| total_timesteps | 14500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 2899 |
| policy_loss | -49.6 |
| reward | 0.13086061 |
| std | 1.02 |
| value_loss | 1.75 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3000 |
| time_elapsed | 258 |
| total_timesteps | 15000 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | -0.104 |
| learning_rate | 0.0007 |
| n_updates | 2999 |
| policy_loss | -51.1 |
| reward | -2.9340496 |
| std | 1.02 |
| value_loss | 1.92 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3100 |
| time_elapsed | 266 |
| total_timesteps | 15500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 3099 |
| policy_loss | -96 |
| reward | 5.6104155 |
| std | 1.02 |
| value_loss | 7.44 |
-------------------------------------
------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3200 |
| time_elapsed | 273 |
| total_timesteps | 16000 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3199 |
| policy_loss | 288 |
| reward | 4.10712 |
| std | 1.02 |
| value_loss | 56 |
------------------------------------
--------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3300 |
| time_elapsed | 283 |
| total_timesteps | 16500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 3299 |
| policy_loss | 29.9 |
| reward | 0.10846165 |
| std | 1.02 |
| value_loss | 6.75 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3400 |
| time_elapsed | 290 |
| total_timesteps | 17000 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3399 |
| policy_loss | -128 |
| reward | -0.26822066 |
| std | 1.02 |
| value_loss | 12.3 |
---------------------------------------
---------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3500 |
| time_elapsed | 298 |
| total_timesteps | 17500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3499 |
| policy_loss | 23.3 |
| reward | 0.012110213 |
| std | 1.02 |
| value_loss | 0.832 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3600 |
| time_elapsed | 308 |
| total_timesteps | 18000 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3599 |
| policy_loss | -94.5 |
| reward | -0.6443226 |
| std | 1.02 |
| value_loss | 11.1 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3700 |
| time_elapsed | 314 |
| total_timesteps | 18500 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 3699 |
| policy_loss | -16.7 |
| reward | 1.8698422 |
| std | 1.02 |
| value_loss | 0.374 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3800 |
| time_elapsed | 323 |
| total_timesteps | 19000 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 3799 |
| policy_loss | 166 |
| reward | -1.3664656 |
| std | 1.02 |
| value_loss | 19.5 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 58 |
| iterations | 3900 |
| time_elapsed | 332 |
| total_timesteps | 19500 |
| train/ | |
| entropy_loss | -41.7 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3899 |
| policy_loss | 43.9 |
| reward | -1.1592114 |
| std | 1.02 |
| value_loss | 2.46 |
--------------------------------------
------------------------------------
| time/ | |
| fps | 58 |
| iterations | 4000 |
| time_elapsed | 339 |
| total_timesteps | 20000 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 3999 |
| policy_loss | -31.6 |
| reward | 1.018338 |
| std | 1.02 |
| value_loss | 0.683 |
------------------------------------
--------------------------------------
| time/ | |
| fps | 58 |
| iterations | 4100 |
| time_elapsed | 348 |
| total_timesteps | 20500 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4099 |
| policy_loss | -21.4 |
| reward | 0.26098472 |
| std | 1.02 |
| value_loss | 0.295 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 58 |
| iterations | 4200 |
| time_elapsed | 356 |
| total_timesteps | 21000 |
| train/ | |
| entropy_loss | -41.8 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 4199 |
| policy_loss | 37.3 |
| reward | 2.0496662 |
| std | 1.02 |
| value_loss | 1.24 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 4300 |
| time_elapsed | 362 |
| total_timesteps | 21500 |
| train/ | |
| entropy_loss | -41.9 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 4299 |
| policy_loss | 21.7 |
| reward | 0.5919729 |
| std | 1.03 |
| value_loss | 0.614 |
-------------------------------------
---------------------------------------
| time/ | |
| fps | 58 |
| iterations | 4400 |
| time_elapsed | 373 |
| total_timesteps | 22000 |
| train/ | |
| entropy_loss | -41.9 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4399 |
| policy_loss | -59.5 |
| reward | -0.44648832 |
| std | 1.03 |
| value_loss | 2.35 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 4500 |
| time_elapsed | 380 |
| total_timesteps | 22500 |
| train/ | |
| entropy_loss | -41.9 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 4499 |
| policy_loss | 75.7 |
| reward | -1.7295737 |
| std | 1.03 |
| value_loss | 5.7 |
--------------------------------------
------------------------------------
| time/ | |
| fps | 59 |
| iterations | 4600 |
| time_elapsed | 387 |
| total_timesteps | 23000 |
| train/ | |
| entropy_loss | -41.9 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4599 |
| policy_loss | -194 |
| reward | -2.21535 |
| std | 1.03 |
| value_loss | 37.3 |
------------------------------------
--------------------------------------
| time/ | |
| fps | 58 |
| iterations | 4700 |
| time_elapsed | 398 |
| total_timesteps | 23500 |
| train/ | |
| entropy_loss | -42 |
| explained_variance | -0.0141 |
| learning_rate | 0.0007 |
| n_updates | 4699 |
| policy_loss | -32.7 |
| reward | 0.16243774 |
| std | 1.03 |
| value_loss | 1.75 |
--------------------------------------
------------------------------------
| time/ | |
| fps | 59 |
| iterations | 4800 |
| time_elapsed | 404 |
| total_timesteps | 24000 |
| train/ | |
| entropy_loss | -42 |
| explained_variance | 0.168 |
| learning_rate | 0.0007 |
| n_updates | 4799 |
| policy_loss | -61.7 |
| reward | 0.961177 |
| std | 1.03 |
| value_loss | 2.67 |
------------------------------------
------------------------------------
| time/ | |
| fps | 59 |
| iterations | 4900 |
| time_elapsed | 412 |
| total_timesteps | 24500 |
| train/ | |
| entropy_loss | -42 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4899 |
| policy_loss | -54.1 |
| reward | 3.000443 |
| std | 1.03 |
| value_loss | 2.52 |
------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5000 |
| time_elapsed | 422 |
| total_timesteps | 25000 |
| train/ | |
| entropy_loss | -42 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 4999 |
| policy_loss | 75.7 |
| reward | 0.7883484 |
| std | 1.03 |
| value_loss | 6.61 |
-------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5100 |
| time_elapsed | 428 |
| total_timesteps | 25500 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5099 |
| policy_loss | 237 |
| reward | -0.49083808 |
| std | 1.03 |
| value_loss | 39.1 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5200 |
| time_elapsed | 437 |
| total_timesteps | 26000 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5199 |
| policy_loss | 152 |
| reward | 2.7196112 |
| std | 1.03 |
| value_loss | 16.1 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5300 |
| time_elapsed | 445 |
| total_timesteps | 26500 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 5299 |
| policy_loss | -317 |
| reward | 0.59174556 |
| std | 1.03 |
| value_loss | 63.7 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5400 |
| time_elapsed | 452 |
| total_timesteps | 27000 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5399 |
| policy_loss | -126 |
| reward | 0.06384493 |
| std | 1.03 |
| value_loss | 9.43 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5500 |
| time_elapsed | 461 |
| total_timesteps | 27500 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 5499 |
| policy_loss | -11.3 |
| reward | -1.1629822 |
| std | 1.03 |
| value_loss | 0.213 |
--------------------------------------
------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5600 |
| time_elapsed | 469 |
| total_timesteps | 28000 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5599 |
| policy_loss | 91 |
| reward | 1.35537 |
| std | 1.03 |
| value_loss | 5.83 |
------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5700 |
| time_elapsed | 476 |
| total_timesteps | 28500 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 5699 |
| policy_loss | -18.6 |
| reward | -2.177703 |
| std | 1.03 |
| value_loss | 0.358 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5800 |
| time_elapsed | 487 |
| total_timesteps | 29000 |
| train/ | |
| entropy_loss | -42 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 5799 |
| policy_loss | -36.6 |
| reward | -2.1937134 |
| std | 1.03 |
| value_loss | 2.54 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 5900 |
| time_elapsed | 497 |
| total_timesteps | 29500 |
| train/ | |
| entropy_loss | -42 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 5899 |
| policy_loss | -94.3 |
| reward | -1.7350562 |
| std | 1.03 |
| value_loss | 7.48 |
--------------------------------------
day: 3330, episode: 10
begin_total_asset: 952508.66
end_total_asset: 4088694.53
total_reward: 3136185.87
total_cost: 3157.22
total_trades: 58734
Sharpe: 0.733
=================================
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6000 |
| time_elapsed | 507 |
| total_timesteps | 30000 |
| train/ | |
| entropy_loss | -42 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 5999 |
| policy_loss | 9.33 |
| reward | 1.7072018 |
| std | 1.03 |
| value_loss | 0.168 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6100 |
| time_elapsed | 515 |
| total_timesteps | 30500 |
| train/ | |
| entropy_loss | -42 |
| explained_variance | 0.137 |
| learning_rate | 0.0007 |
| n_updates | 6099 |
| policy_loss | 86.1 |
| reward | 0.23781453 |
| std | 1.03 |
| value_loss | 5.84 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6200 |
| time_elapsed | 522 |
| total_timesteps | 31000 |
| train/ | |
| entropy_loss | -42 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 6199 |
| policy_loss | 81.6 |
| reward | -0.55448675 |
| std | 1.03 |
| value_loss | 4.51 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6300 |
| time_elapsed | 532 |
| total_timesteps | 31500 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 6299 |
| policy_loss | -123 |
| reward | 0.53070265 |
| std | 1.03 |
| value_loss | 10.1 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6400 |
| time_elapsed | 539 |
| total_timesteps | 32000 |
| train/ | |
| entropy_loss | -42.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 6399 |
| policy_loss | -35.1 |
| reward | -0.7190698 |
| std | 1.04 |
| value_loss | 0.746 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6500 |
| time_elapsed | 547 |
| total_timesteps | 32500 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 6499 |
| policy_loss | -195 |
| reward | -0.20805828 |
| std | 1.04 |
| value_loss | 24.3 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6600 |
| time_elapsed | 557 |
| total_timesteps | 33000 |
| train/ | |
| entropy_loss | -42.1 |
| explained_variance | 0.0285 |
| learning_rate | 0.0007 |
| n_updates | 6599 |
| policy_loss | -113 |
| reward | -2.668644 |
| std | 1.04 |
| value_loss | 13.9 |
-------------------------------------
----------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6700 |
| time_elapsed | 563 |
| total_timesteps | 33500 |
| train/ | |
| entropy_loss | -42.2 |
| explained_variance | -0.603 |
| learning_rate | 0.0007 |
| n_updates | 6699 |
| policy_loss | -39.6 |
| reward | -0.083356254 |
| std | 1.04 |
| value_loss | 0.818 |
----------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6800 |
| time_elapsed | 572 |
| total_timesteps | 34000 |
| train/ | |
| entropy_loss | -42.2 |
| explained_variance | 0.0184 |
| learning_rate | 0.0007 |
| n_updates | 6799 |
| policy_loss | 86.5 |
| reward | 0.6618178 |
| std | 1.04 |
| value_loss | 5.35 |
-------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 6900 |
| time_elapsed | 581 |
| total_timesteps | 34500 |
| train/ | |
| entropy_loss | -42.2 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 6899 |
| policy_loss | 56.7 |
| reward | 0.052872755 |
| std | 1.04 |
| value_loss | 2.85 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7000 |
| time_elapsed | 587 |
| total_timesteps | 35000 |
| train/ | |
| entropy_loss | -42.3 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 6999 |
| policy_loss | 197 |
| reward | 1.6442178 |
| std | 1.04 |
| value_loss | 26.3 |
-------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7100 |
| time_elapsed | 597 |
| total_timesteps | 35500 |
| train/ | |
| entropy_loss | -42.3 |
| explained_variance | -0.0238 |
| learning_rate | 0.0007 |
| n_updates | 7099 |
| policy_loss | 39.7 |
| reward | -0.16224274 |
| std | 1.04 |
| value_loss | 1.43 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7200 |
| time_elapsed | 605 |
| total_timesteps | 36000 |
| train/ | |
| entropy_loss | -42.3 |
| explained_variance | 0.03 |
| learning_rate | 0.0007 |
| n_updates | 7199 |
| policy_loss | 139 |
| reward | -0.1674491 |
| std | 1.04 |
| value_loss | 11.7 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7300 |
| time_elapsed | 611 |
| total_timesteps | 36500 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | -0.0288 |
| learning_rate | 0.0007 |
| n_updates | 7299 |
| policy_loss | -406 |
| reward | 2.2645469 |
| std | 1.04 |
| value_loss | 134 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7400 |
| time_elapsed | 622 |
| total_timesteps | 37000 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | 0.0351 |
| learning_rate | 0.0007 |
| n_updates | 7399 |
| policy_loss | 73.6 |
| reward | 0.30078474 |
| std | 1.04 |
| value_loss | 3.6 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7500 |
| time_elapsed | 629 |
| total_timesteps | 37500 |
| train/ | |
| entropy_loss | -42.3 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 7499 |
| policy_loss | -8.08 |
| reward | -0.3665664 |
| std | 1.04 |
| value_loss | 0.214 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7600 |
| time_elapsed | 636 |
| total_timesteps | 38000 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 7599 |
| policy_loss | 42.9 |
| reward | -0.79383886 |
| std | 1.04 |
| value_loss | 1.9 |
---------------------------------------
----------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7700 |
| time_elapsed | 647 |
| total_timesteps | 38500 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 7699 |
| policy_loss | -104 |
| reward | -0.073217735 |
| std | 1.05 |
| value_loss | 10.3 |
----------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7800 |
| time_elapsed | 653 |
| total_timesteps | 39000 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 7799 |
| policy_loss | -152 |
| reward | -1.8329335 |
| std | 1.05 |
| value_loss | 16.7 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 7900 |
| time_elapsed | 661 |
| total_timesteps | 39500 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 7899 |
| policy_loss | 144 |
| reward | -0.8008484 |
| std | 1.05 |
| value_loss | 15.8 |
--------------------------------------
----------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8000 |
| time_elapsed | 671 |
| total_timesteps | 40000 |
| train/ | |
| entropy_loss | -42.3 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 7999 |
| policy_loss | -8.53 |
| reward | -0.031915538 |
| std | 1.04 |
| value_loss | 0.0835 |
----------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8100 |
| time_elapsed | 677 |
| total_timesteps | 40500 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 8099 |
| policy_loss | -69.3 |
| reward | 0.8095603 |
| std | 1.04 |
| value_loss | 3.08 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8200 |
| time_elapsed | 686 |
| total_timesteps | 41000 |
| train/ | |
| entropy_loss | -42.3 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 8199 |
| policy_loss | -5.2 |
| reward | -0.5655167 |
| std | 1.04 |
| value_loss | 0.69 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8300 |
| time_elapsed | 695 |
| total_timesteps | 41500 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 8299 |
| policy_loss | -29.8 |
| reward | -1.5929188 |
| std | 1.04 |
| value_loss | 0.672 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8400 |
| time_elapsed | 701 |
| total_timesteps | 42000 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 8399 |
| policy_loss | -54.2 |
| reward | -0.53150016 |
| std | 1.05 |
| value_loss | 9.5 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8500 |
| time_elapsed | 711 |
| total_timesteps | 42500 |
| train/ | |
| entropy_loss | -42.5 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 8499 |
| policy_loss | 237 |
| reward | 2.7706447 |
| std | 1.05 |
| value_loss | 42.1 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8600 |
| time_elapsed | 719 |
| total_timesteps | 43000 |
| train/ | |
| entropy_loss | -42.5 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 8599 |
| policy_loss | -188 |
| reward | 1.1153419 |
| std | 1.05 |
| value_loss | 21.7 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8700 |
| time_elapsed | 730 |
| total_timesteps | 43500 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 8699 |
| policy_loss | -17.4 |
| reward | -0.5148427 |
| std | 1.05 |
| value_loss | 0.297 |
--------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8800 |
| time_elapsed | 740 |
| total_timesteps | 44000 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 8799 |
| policy_loss | 41.4 |
| reward | 0.32814896 |
| std | 1.05 |
| value_loss | 1.6 |
--------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 8900 |
| time_elapsed | 746 |
| total_timesteps | 44500 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 8899 |
| policy_loss | -47.2 |
| reward | -0.17413093 |
| std | 1.05 |
| value_loss | 1.75 |
---------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9000 |
| time_elapsed | 755 |
| total_timesteps | 45000 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 8999 |
| policy_loss | 65.1 |
| reward | 0.38266626 |
| std | 1.05 |
| value_loss | 6.64 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9100 |
| time_elapsed | 764 |
| total_timesteps | 45500 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 9099 |
| policy_loss | 31.2 |
| reward | 1.3317974 |
| std | 1.05 |
| value_loss | 0.927 |
-------------------------------------
---------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9200 |
| time_elapsed | 770 |
| total_timesteps | 46000 |
| train/ | |
| entropy_loss | -42.4 |
| explained_variance | -0.0927 |
| learning_rate | 0.0007 |
| n_updates | 9199 |
| policy_loss | 181 |
| reward | -0.49035767 |
| std | 1.05 |
| value_loss | 21.3 |
---------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9300 |
| time_elapsed | 780 |
| total_timesteps | 46500 |
| train/ | |
| entropy_loss | -42.5 |
| explained_variance | 1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 9299 |
| policy_loss | 148 |
| reward | -8.756936 |
| std | 1.05 |
| value_loss | 30.5 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9400 |
| time_elapsed | 788 |
| total_timesteps | 47000 |
| train/ | |
| entropy_loss | -42.5 |
| explained_variance | -0.066 |
| learning_rate | 0.0007 |
| n_updates | 9399 |
| policy_loss | 40.5 |
| reward | 0.5117786 |
| std | 1.05 |
| value_loss | 1.51 |
-------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9500 |
| time_elapsed | 794 |
| total_timesteps | 47500 |
| train/ | |
| entropy_loss | -42.5 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 9499 |
| policy_loss | 46.4 |
| reward | 1.5631902 |
| std | 1.05 |
| value_loss | 1.37 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9600 |
| time_elapsed | 804 |
| total_timesteps | 48000 |
| train/ | |
| entropy_loss | -42.5 |
| explained_variance | 5.96e-08 |
| learning_rate | 0.0007 |
| n_updates | 9599 |
| policy_loss | 4.73 |
| reward | -0.8106855 |
| std | 1.05 |
| value_loss | 0.346 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9700 |
| time_elapsed | 811 |
| total_timesteps | 48500 |
| train/ | |
| entropy_loss | -42.5 |
| explained_variance | -1.19e-07 |
| learning_rate | 0.0007 |
| n_updates | 9699 |
| policy_loss | 60.8 |
| reward | 1.219504 |
| std | 1.05 |
| value_loss | 3.44 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9800 |
| time_elapsed | 818 |
| total_timesteps | 49000 |
| train/ | |
| entropy_loss | -42.5 |
| explained_variance | 0.00147 |
| learning_rate | 0.0007 |
| n_updates | 9799 |
| policy_loss | -19 |
| reward | 0.36547118 |
| std | 1.05 |
| value_loss | 6.68 |
--------------------------------------
-------------------------------------
| time/ | |
| fps | 59 |
| iterations | 9900 |
| time_elapsed | 829 |
| total_timesteps | 49500 |
| train/ | |
| entropy_loss | -42.5 |
| explained_variance | -0.0611 |
| learning_rate | 0.0007 |
| n_updates | 9899 |
| policy_loss | -14 |
| reward | 1.2229353 |
| std | 1.05 |
| value_loss | 2.29 |
-------------------------------------
--------------------------------------
| time/ | |
| fps | 59 |
| iterations | 10000 |
| time_elapsed | 835 |
| total_timesteps | 50000 |
| train/ | |
| entropy_loss | -42.6 |
| explained_variance | 0 |
| learning_rate | 0.0007 |
| n_updates | 9999 |
| policy_loss | -15.6 |
| reward | 0.31784078 |
| std | 1.05 |
| value_loss | 0.296 |
--------------------------------------
{'batch_size': 128, 'buffer_size': 50000, 'learning_rate': 0.001}
Using cpu device
Logging to results/ddpg
-----------------------------------
| time/ | |
| episodes | 4 |
| fps | 23 |
| time_elapsed | 556 |
| total_timesteps | 13324 |
| train/ | |
| actor_loss | 20.3 |
| critic_loss | 66.8 |
| learning_rate | 0.001 |
| n_updates | 9993 |
| reward | -4.5011277 |
-----------------------------------
-----------------------------------
| time/ | |
| episodes | 8 |
| fps | 21 |
| time_elapsed | 1250 |
| total_timesteps | 26648 |
| train/ | |
| actor_loss | 2.62 |
| critic_loss | 9.79 |
| learning_rate | 0.001 |
| n_updates | 23317 |
| reward | -4.5011277 |
-----------------------------------
day: 3330, episode: 10
begin_total_asset: 965326.95
end_total_asset: 3940368.63
total_reward: 2975041.68
total_cost: 964.36
total_trades: 53280
Sharpe: 0.657
=================================
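As a sanity check on the episode summary above: the `total_reward` printed by the environment is simply the change in portfolio value over the episode, mirroring the reward definition r(s, a, s′) = v′ − v from the introduction. The arithmetic below just re-derives the DDPG log's figure and is purely illustrative.

```python
# Episode-summary figures copied from the DDPG training log above.
begin_total_asset = 965326.95
end_total_asset = 3940368.63

# total_reward is the cumulative change in portfolio value: v_end - v_begin.
total_reward = round(end_total_asset - begin_total_asset, 2)
print(total_reward)  # 2975041.68, matching the "total_reward" line in the log
```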
-----------------------------------
| time/ | |
| episodes | 12 |
| fps | 20 |
| time_elapsed | 1944 |
| total_timesteps | 39972 |
| train/ | |
| actor_loss | -3.63 |
| critic_loss | 2.39 |
| learning_rate | 0.001 |
| n_updates | 36641 |
| reward | -4.5011277 |
-----------------------------------
-----------------------------------
| time/ | |
| episodes | 16 |
| fps | 20 |
| time_elapsed | 2656 |
| total_timesteps | 53296 |
| train/ | |
| actor_loss | -6.92 |
| critic_loss | 1.51 |
| learning_rate | 0.001 |
| n_updates | 49965 |
| reward | -4.5011277 |
-----------------------------------
{'n_steps': 2048, 'ent_coef': 0.01, 'learning_rate': 0.00025, 'batch_size': 128}
Using cpu device
Logging to results/ppo
-----------------------------------
| time/ | |
| fps | 70 |
| iterations | 1 |
| time_elapsed | 29 |
| total_timesteps | 2048 |
| train/ | |
| reward | -0.3290882 |
-----------------------------------
-----------------------------------------
| time/ | |
| fps | 67 |
| iterations | 2 |
| time_elapsed | 60 |
| total_timesteps | 4096 |
| train/ | |
| approx_kl | 0.019916927 |
| clip_fraction | 0.207 |
| clip_range | 0.2 |
| entropy_loss | -41.2 |
| explained_variance | -0.00611 |
| learning_rate | 0.00025 |
| loss | 6.42 |
| n_updates | 10 |
| policy_gradient_loss | -0.0268 |
| reward | 0.84259444 |
| std | 1 |
| value_loss | 15 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 65 |
| iterations | 3 |
| time_elapsed | 93 |
| total_timesteps | 6144 |
| train/ | |
| approx_kl | 0.016416349 |
| clip_fraction | 0.211 |
| clip_range | 0.2 |
| entropy_loss | -41.3 |
| explained_variance | 0.00243 |
| learning_rate | 0.00025 |
| loss | 71.4 |
| n_updates | 20 |
| policy_gradient_loss | -0.0189 |
| reward | -22.102169 |
| std | 1.01 |
| value_loss | 95.2 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 65 |
| iterations | 4 |
| time_elapsed | 125 |
| total_timesteps | 8192 |
| train/ | |
| approx_kl | 0.016711425 |
| clip_fraction | 0.152 |
| clip_range | 0.2 |
| entropy_loss | -41.3 |
| explained_variance | -0.0235 |
| learning_rate | 0.00025 |
| loss | 19.2 |
| n_updates | 30 |
| policy_gradient_loss | -0.0181 |
| reward | 0.8641611 |
| std | 1.01 |
| value_loss | 51 |
-----------------------------------------
----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 5 |
| time_elapsed | 158 |
| total_timesteps | 10240 |
| train/ | |
| approx_kl | 0.02179965 |
| clip_fraction | 0.258 |
| clip_range | 0.2 |
| entropy_loss | -41.3 |
| explained_variance | -0.00376 |
| learning_rate | 0.00025 |
| loss | 24.8 |
| n_updates | 40 |
| policy_gradient_loss | -0.0161 |
| reward | 0.7124557 |
| std | 1.01 |
| value_loss | 37.7 |
----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 6 |
| time_elapsed | 189 |
| total_timesteps | 12288 |
| train/ | |
| approx_kl | 0.020254686 |
| clip_fraction | 0.206 |
| clip_range | 0.2 |
| entropy_loss | -41.4 |
| explained_variance | -0.02 |
| learning_rate | 0.00025 |
| loss | 15.9 |
| n_updates | 50 |
| policy_gradient_loss | -0.0192 |
| reward | 2.9676142 |
| std | 1.01 |
| value_loss | 56 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 7 |
| time_elapsed | 221 |
| total_timesteps | 14336 |
| train/ | |
| approx_kl | 0.015349641 |
| clip_fraction | 0.182 |
| clip_range | 0.2 |
| entropy_loss | -41.5 |
| explained_variance | 0.00714 |
| learning_rate | 0.00025 |
| loss | 7.18 |
| n_updates | 60 |
| policy_gradient_loss | -0.0222 |
| reward | -1.0227845 |
| std | 1.01 |
| value_loss | 12.5 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 8 |
| time_elapsed | 254 |
| total_timesteps | 16384 |
| train/ | |
| approx_kl | 0.020761559 |
| clip_fraction | 0.231 |
| clip_range | 0.2 |
| entropy_loss | -41.5 |
| explained_variance | -0.00857 |
| learning_rate | 0.00025 |
| loss | 25.2 |
| n_updates | 70 |
| policy_gradient_loss | -0.0199 |
| reward | 0.80425155 |
| std | 1.01 |
| value_loss | 57.8 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 9 |
| time_elapsed | 283 |
| total_timesteps | 18432 |
| train/ | |
| approx_kl | 0.018122694 |
| clip_fraction | 0.236 |
| clip_range | 0.2 |
| entropy_loss | -41.5 |
| explained_variance | 0.00296 |
| learning_rate | 0.00025 |
| loss | 28.1 |
| n_updates | 80 |
| policy_gradient_loss | -0.0166 |
| reward | -1.42386 |
| std | 1.01 |
| value_loss | 57.8 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 10 |
| time_elapsed | 318 |
| total_timesteps | 20480 |
| train/ | |
| approx_kl | 0.022673171 |
| clip_fraction | 0.205 |
| clip_range | 0.2 |
| entropy_loss | -41.6 |
| explained_variance | -0.013 |
| learning_rate | 0.00025 |
| loss | 17.3 |
| n_updates | 90 |
| policy_gradient_loss | -0.0191 |
| reward | 0.8197509 |
| std | 1.02 |
| value_loss | 44.3 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 11 |
| time_elapsed | 352 |
| total_timesteps | 22528 |
| train/ | |
| approx_kl | 0.020850785 |
| clip_fraction | 0.214 |
| clip_range | 0.2 |
| entropy_loss | -41.6 |
| explained_variance | -0.00669 |
| learning_rate | 0.00025 |
| loss | 48.4 |
| n_updates | 100 |
| policy_gradient_loss | -0.0161 |
| reward | 1.2033767 |
| std | 1.02 |
| value_loss | 99.1 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 12 |
| time_elapsed | 384 |
| total_timesteps | 24576 |
| train/ | |
| approx_kl | 0.024814304 |
| clip_fraction | 0.251 |
| clip_range | 0.2 |
| entropy_loss | -41.6 |
| explained_variance | -0.0225 |
| learning_rate | 0.00025 |
| loss | 10.8 |
| n_updates | 110 |
| policy_gradient_loss | -0.018 |
| reward | 1.610058 |
| std | 1.02 |
| value_loss | 22.5 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 13 |
| time_elapsed | 416 |
| total_timesteps | 26624 |
| train/ | |
| approx_kl | 0.017855735 |
| clip_fraction | 0.173 |
| clip_range | 0.2 |
| entropy_loss | -41.7 |
| explained_variance | 0.00501 |
| learning_rate | 0.00025 |
| loss | 34.5 |
| n_updates | 120 |
| policy_gradient_loss | -0.0189 |
| reward | 7.162905 |
| std | 1.02 |
| value_loss | 112 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 14 |
| time_elapsed | 446 |
| total_timesteps | 28672 |
| train/ | |
| approx_kl | 0.018644353 |
| clip_fraction | 0.153 |
| clip_range | 0.2 |
| entropy_loss | -41.7 |
| explained_variance | 0.0117 |
| learning_rate | 0.00025 |
| loss | 16.7 |
| n_updates | 130 |
| policy_gradient_loss | -0.0172 |
| reward | 2.0473788 |
| std | 1.02 |
| value_loss | 53.3 |
-----------------------------------------
day: 3330, episode: 10
begin_total_asset: 994554.41
end_total_asset: 4699503.39
total_reward: 3704948.98
total_cost: 439274.68
total_trades: 90096
Sharpe: 0.806
=================================
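The `explained_variance` entries hovering near zero in the PPO tables above indicate the value head is barely predicting the empirical returns. Stable-Baselines3 reports 1 − Var(returns − values) / Var(returns); here is a pure-Python sketch of that metric (the helper name is ours, not SB3's):

```python
import statistics

def explained_variance(y_pred, y_true):
    """1 - Var(y_true - y_pred) / Var(y_true): 1 is a perfect value fit,
    0 means the predictions explain nothing, negative is worse than that."""
    var_y = statistics.pvariance(y_true)
    residual_var = statistics.pvariance([t - p for t, p in zip(y_true, y_pred)])
    return 1.0 - residual_var / var_y

returns = [1.0, 2.0, 3.0, 4.0]
print(explained_variance(returns, returns))    # 1.0 -- perfect predictions
print(explained_variance([2.5] * 4, returns))  # 0.0 -- constant predictor
```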
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 15 |
| time_elapsed | 480 |
| total_timesteps | 30720 |
| train/ | |
| approx_kl | 0.02508668 |
| clip_fraction | 0.25 |
| clip_range | 0.2 |
| entropy_loss | -41.8 |
| explained_variance | -0.0505 |
| learning_rate | 0.00025 |
| loss | 4.42 |
| n_updates | 140 |
| policy_gradient_loss | -0.0173 |
| reward | -0.36127353 |
| std | 1.02 |
| value_loss | 14.8 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 16 |
| time_elapsed | 510 |
| total_timesteps | 32768 |
| train/ | |
| approx_kl | 0.021448491 |
| clip_fraction | 0.211 |
| clip_range | 0.2 |
| entropy_loss | -41.8 |
| explained_variance | 0.00132 |
| learning_rate | 0.00025 |
| loss | 38 |
| n_updates | 150 |
| policy_gradient_loss | -0.00894 |
| reward | -2.4289682 |
| std | 1.02 |
| value_loss | 88 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 17 |
| time_elapsed | 542 |
| total_timesteps | 34816 |
| train/ | |
| approx_kl | 0.02103462 |
| clip_fraction | 0.208 |
| clip_range | 0.2 |
| entropy_loss | -41.8 |
| explained_variance | -0.0246 |
| learning_rate | 0.00025 |
| loss | 35.3 |
| n_updates | 160 |
| policy_gradient_loss | -0.0134 |
| reward | -0.71985894 |
| std | 1.02 |
| value_loss | 54.5 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 18 |
| time_elapsed | 577 |
| total_timesteps | 36864 |
| train/ | |
| approx_kl | 0.022089712 |
| clip_fraction | 0.213 |
| clip_range | 0.2 |
| entropy_loss | -41.9 |
| explained_variance | -0.0028 |
| learning_rate | 0.00025 |
| loss | 27.9 |
| n_updates | 170 |
| policy_gradient_loss | -0.0207 |
| reward | 0.11034006 |
| std | 1.03 |
| value_loss | 39.3 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 19 |
| time_elapsed | 609 |
| total_timesteps | 38912 |
| train/ | |
| approx_kl | 0.014264661 |
| clip_fraction | 0.126 |
| clip_range | 0.2 |
| entropy_loss | -41.9 |
| explained_variance | -0.00283 |
| learning_rate | 0.00025 |
| loss | 58.6 |
| n_updates | 180 |
| policy_gradient_loss | -0.0135 |
| reward | 6.176509 |
| std | 1.03 |
| value_loss | 119 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 20 |
| time_elapsed | 642 |
| total_timesteps | 40960 |
| train/ | |
| approx_kl | 0.027180977 |
| clip_fraction | 0.292 |
| clip_range | 0.2 |
| entropy_loss | -42 |
| explained_variance | 0.0421 |
| learning_rate | 0.00025 |
| loss | 8.9 |
| n_updates | 190 |
| policy_gradient_loss | -0.0156 |
| reward | 0.20096779 |
| std | 1.03 |
| value_loss | 19.8 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 21 |
| time_elapsed | 671 |
| total_timesteps | 43008 |
| train/ | |
| approx_kl | 0.021884244 |
| clip_fraction | 0.205 |
| clip_range | 0.2 |
| entropy_loss | -42 |
| explained_variance | -0.00219 |
| learning_rate | 0.00025 |
| loss | 53.4 |
| n_updates | 200 |
| policy_gradient_loss | -0.0145 |
| reward | -0.839949 |
| std | 1.03 |
| value_loss | 94.3 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 22 |
| time_elapsed | 706 |
| total_timesteps | 45056 |
| train/ | |
| approx_kl | 0.024635753 |
| clip_fraction | 0.235 |
| clip_range | 0.2 |
| entropy_loss | -42.1 |
| explained_variance | -0.00329 |
| learning_rate | 0.00025 |
| loss | 27.6 |
| n_updates | 210 |
| policy_gradient_loss | -0.0148 |
| reward | -0.21918707 |
| std | 1.04 |
| value_loss | 61.8 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 64 |
| iterations | 23 |
| time_elapsed | 735 |
| total_timesteps | 47104 |
| train/ | |
| approx_kl | 0.038902897 |
| clip_fraction | 0.28 |
| clip_range | 0.2 |
| entropy_loss | -42.2 |
| explained_variance | -0.0241 |
| learning_rate | 0.00025 |
| loss | 21.9 |
| n_updates | 220 |
| policy_gradient_loss | -0.0178 |
| reward | -0.12725857 |
| std | 1.04 |
| value_loss | 34.3 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 24 |
| time_elapsed | 768 |
| total_timesteps | 49152 |
| train/ | |
| approx_kl | 0.017998032 |
| clip_fraction | 0.174 |
| clip_range | 0.2 |
| entropy_loss | -42.2 |
| explained_variance | 0.0111 |
| learning_rate | 0.00025 |
| loss | 27.9 |
| n_updates | 230 |
| policy_gradient_loss | -0.0148 |
| reward | 1.7231001 |
| std | 1.04 |
| value_loss | 65.2 |
-----------------------------------------
-----------------------------------------
| time/ | |
| fps | 63 |
| iterations | 25 |
| time_elapsed | 804 |
| total_timesteps | 51200 |
| train/ | |
| approx_kl | 0.017844416 |
| clip_fraction | 0.186 |
| clip_range | 0.2 |
| entropy_loss | -42.3 |
| explained_variance | 0.0211 |
| learning_rate | 0.00025 |
| loss | 13.5 |
| n_updates | 240 |
| policy_gradient_loss | -0.0149 |
| reward | -1.0208522 |
| std | 1.04 |
| value_loss | 35.2 |
-----------------------------------------
{'batch_size': 128, 'buffer_size': 100000, 'learning_rate': 0.0001, 'learning_starts': 100, 'ent_coef': 'auto_0.1'}
Using cpu device
Logging to results/sac
-----------------------------------
| time/ | |
| episodes | 4 |
| fps | 18 |
| time_elapsed | 704 |
| total_timesteps | 13324 |
| train/ | |
| actor_loss | 1.1e+03 |
| critic_loss | 642 |
| ent_coef | 0.169 |
| ent_coef_loss | -83.1 |
| learning_rate | 0.0001 |
| n_updates | 13223 |
| reward | -4.2128644 |
-----------------------------------
-----------------------------------
| time/ | |
| episodes | 8 |
| fps | 18 |
| time_elapsed | 1433 |
| total_timesteps | 26648 |
| train/ | |
| actor_loss | 451 |
| critic_loss | 27.5 |
| ent_coef | 0.046 |
| ent_coef_loss | -109 |
| learning_rate | 0.0001 |
| n_updates | 26547 |
| reward | -4.2404695 |
-----------------------------------
day: 3330, episode: 10
begin_total_asset: 953106.81
end_total_asset: 7458866.64
total_reward: 6505759.83
total_cost: 8648.15
total_trades: 59083
Sharpe: 0.842
=================================
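The `Sharpe: 0.842` line above is an annualized Sharpe ratio over the episode's daily portfolio returns. Assuming the usual FinRL environment convention of sqrt(252) × mean / sample standard deviation of daily returns, it can be sketched as follows (the helper and the synthetic returns below are ours, for illustration only):

```python
import statistics

def annualized_sharpe(daily_returns):
    # sqrt(252 trading days) * mean / sample std of daily returns,
    # in the spirit of the "Sharpe" line printed at the end of each episode.
    return (252 ** 0.5) * statistics.mean(daily_returns) / statistics.stdev(daily_returns)

# Synthetic daily returns, for illustration only.
daily = [0.002, -0.001, 0.003, 0.0005, -0.002, 0.001]
print(round(annualized_sharpe(daily), 3))
```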
----------------------------------
| time/ | |
| episodes | 12 |
| fps | 18 |
| time_elapsed | 2152 |
| total_timesteps | 39972 |
| train/ | |
| actor_loss | 216 |
| critic_loss | 38.6 |
| ent_coef | 0.0127 |
| ent_coef_loss | -102 |
| learning_rate | 0.0001 |
| n_updates | 39871 |
| reward | -3.931381 |
----------------------------------
{'batch_size': 100, 'buffer_size': 1000000, 'learning_rate': 0.001}
Using cpu device
Logging to results/td3
-----------------------------------
| time/ | |
| episodes | 4 |
| fps | 25 |
| time_elapsed | 526 |
| total_timesteps | 13324 |
| train/ | |
| actor_loss | 91.6 |
| critic_loss | 1.45e+03 |
| learning_rate | 0.001 |
| n_updates | 9993 |
| reward | -3.5290053 |
-----------------------------------
-----------------------------------
| time/ | |
| episodes | 8 |
| fps | 22 |
| time_elapsed | 1191 |
| total_timesteps | 26648 |
| train/ | |
| actor_loss | 43.8 |
| critic_loss | 317 |
| learning_rate | 0.001 |
| n_updates | 23317 |
| reward | -3.5290053 |
-----------------------------------
day: 3330, episode: 10
begin_total_asset: 972865.93
end_total_asset: 3563567.55
total_reward: 2590701.62
total_cost: 971.89
total_trades: 46620
Sharpe: 0.648
=================================
-----------------------------------
| time/ | |
| episodes | 12 |
| fps | 21 |
| time_elapsed | 1862 |
| total_timesteps | 39972 |
| train/ | |
| actor_loss | 34.3 |
| critic_loss | 54.4 |
| learning_rate | 0.001 |
| n_updates | 36641 |
| reward | -3.5290053 |
-----------------------------------
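For quick reference, the in-sample Sharpe ratios printed at episode 10 of each run above can be collected side by side. Note these are training-period figures, not out-of-sample performance, and A2C's episode summary falls outside this excerpt:

```python
# In-sample Sharpe ratios from the episode-10 summaries in the logs above.
train_sharpe = {"ddpg": 0.657, "ppo": 0.806, "sac": 0.842, "td3": 0.648}

# Highest training-period Sharpe (says nothing about test performance).
best = max(train_sharpe, key=train_sharpe.get)
print(best, train_sharpe[best])  # sac 0.842
```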