
training issue #6

Open
philipshurpik opened this issue Mar 29, 2018 · 5 comments
@philipshurpik

Hi!
I have a question about training.

After 16 hours of training, I still get an average reward of 0.
I would be happy if you could explain what might be wrong.
Maybe it's a problem with the default setup parameters?

25%|███▊ | 6374997/25000000 [16:18:15<48:41:33, 106.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001807, avg_ep_r: 0.0000, max_ep_r: 0.0911, min_ep_r: -0.0984, # game: 5000
26%|███▊ | 6399993/25000000 [16:22:11<48:18:45, 106.94it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001469, avg_ep_r: 0.0000, max_ep_r: 0.0985, min_ep_r: -0.0641, # game: 5000
26%|███▊ | 6424989/25000000 [16:26:07<48:06:53, 107.24it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000138, avg_q: -0.001775, avg_ep_r: 0.0001, max_ep_r: 0.1445, min_ep_r: -0.0460, # game: 5000
26%|███▊ | 6449993/25000000 [16:30:03<48:41:25, 105.83it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000134, avg_q: -0.001525, avg_ep_r: -0.0000, max_ep_r: 0.0223, min_ep_r: -0.0371, # game: 5000
26%|███▉ | 6477033/25000000 [16:34:16<47:10:48, 109.06it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000138, avg_q: -0.002763, avg_ep_r: -0.0000, max_ep_r: 0.0302, min_ep_r: -0.0762, # game: 5000
26%|███▉ | 6499197/25000000 [16:37:41<47:10:50, 108.92it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000142, avg_q: -0.003163, avg_ep_r: 0.0000, max_ep_r: 0.0352, min_ep_r: -0.0225, # game: 5000
26%|███▉ | 6526765/25000000 [16:41:56<47:30:54, 108.00it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000135, avg_q: -0.003114, avg_ep_r: -0.0000, max_ep_r: 0.0253, min_ep_r: -0.1445, # game: 5000
26%|███▉ | 6551381/25000000 [16:45:43<47:47:03, 107.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000131, avg_q: -0.002506, avg_ep_r: 0.0000, max_ep_r: 0.0643, min_ep_r: -0.0199, # game: 5000
26%|███▉ | 6577145/25000000 [16:49:41<47:26:52, 107.85it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.001795, avg_ep_r: -0.0000, max_ep_r: 0.0300, min_ep_r: -0.1185, # game: 5000
26%|███▉ | 6599989/25000000 [16:53:14<46:38:00, 109.60it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.002334, avg_ep_r: -0.0000, max_ep_r: 0.0495, min_ep_r: -0.1122, # game: 5000
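
A rough sketch of one way to pull avg_r out of log lines like these and plot it against the training step (the regex assumes the exact format shown above; the log file path is just a placeholder):

```python
# Rough sketch: extract avg_r from tqdm/INFO log lines like the ones above
# and plot it against the training step. The regex and the log path are
# assumptions based on the format shown here.
import re
import matplotlib.pyplot as plt

pattern = re.compile(r"(\d+)/\d+ \[.*?avg_r:\s*(-?\d+\.\d+)")

steps, avg_r = [], []
with open("training.log") as f:  # placeholder path
    for line in f:
        match = pattern.search(line)
        if match:
            steps.append(int(match.group(1)))
            avg_r.append(float(match.group(2)))

plt.plot(steps, avg_r)
plt.xlabel("training step")
plt.ylabel("avg_r")
plt.show()
```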

@samre12
Owner

samre12 commented Mar 29, 2018

I believe you are using code from the main branch, where the model contains an earlier implementation of both the batch normalization and the dropout layers. There was an issue with that earlier implementation, which I corrected in the dev branch, and I have recently merged dev into master.
Please pull the latest code from the main branch and re-run the training process.
If you wish to remove dropout from the model, set keep_prob to 1.0 for each layer in the configuration file.
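
For example, a minimal sketch of forcing keep_prob to 1.0 everywhere before training (the config path, INI-style format, and key name are assumptions; check the repository's actual configuration file for the real section and key names):

```python
# Minimal sketch: force keep_prob to 1.0 for every layer that defines it.
# The config path, INI-style format, and key name are assumptions; adjust
# them to match the repository's actual configuration file.
import configparser

config = configparser.ConfigParser()
config.read("code/configuration.cfg")  # placeholder path

for section in config.sections():
    if "keep_prob" in config[section]:
        config[section]["keep_prob"] = "1.0"  # 1.0 means no units are dropped

with open("code/configuration.cfg", "w") as f:
    config.write(f)
```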

@samre12 samre12 added the bug label Mar 29, 2018
@philipshurpik
Author

Thanks, I just checked out the latest master branch and ran the code with the default config.
The problem still exists: after one day of training it learns nothing :(

 47%|██████▌       | 11724993/25000000 [22:58:13<26:00:25, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000079, avg_q: 0.027890, avg_ep_r: -0.0001, max_ep_r: 0.2953, min_ep_r: -0.2274, # game: 5000
 47%|██████▌       | 11749993/25000000 [23:01:09<25:57:28, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000077, avg_q: 0.027767, avg_ep_r: -0.0002, max_ep_r: 0.2545, min_ep_r: -0.2279, # game: 5000
 47%|██████▌       | 11774985/25000000 [23:04:05<25:54:32, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000088, avg_q: 0.026840, avg_ep_r: -0.0001, max_ep_r: 0.3711, min_ep_r: -0.6593, # game: 5000
 47%|██████▌       | 11799997/25000000 [23:07:01<25:51:34, 141.79it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000079, avg_q: 0.026633, avg_ep_r: 0.0000, max_ep_r: 0.2962, min_ep_r: -0.2784, # game: 5000

@samre12
Owner

samre12 commented Apr 5, 2018

Hi @philipshurpik, can you please share the TensorBoard log file if you have generated one? That will help me analyse the training process. I am also working on fixing this issue right now.
It is probably related to the capacity of the model. You could also try running the model without dropout by setting keep_prob to 1 for all the layers.
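
For reference, one way to read the logged scalars back out of a TensorBoard event file (the event-file path and the scalar tag below are placeholders, not necessarily the names this repo uses):

```python
# Sketch: dump the scalar curves stored in a TensorBoard event file.
# The event-file path and the tag name are placeholders.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("logs/events.out.tfevents.example")
acc.Reload()

print(acc.Tags()["scalars"])            # list every scalar tag that was logged
for event in acc.Scalars("average_q"):  # tag name is a placeholder
    print(event.step, event.value)
```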

@joseph-zhong

I haven't taken a deep look at the training scheme, but intuitively something seems off when I graph the loss and the Q-value below.

The loss minimization looks good, which perhaps implies that the networks are learning, but the fact that the Q-value is also being driven down seems alarming:

[screenshot: loss and Q-value training curves]

@samre12
Owner

samre12 commented May 8, 2018

@joseph-zhong I have been making corrections to the code; the latest changes are in the dev branch, where I have integrated this repo with my gym-cryptotrading environment for ease of use.
Along with those bug fixes, I am also trying to resolve this specific issue, but the btc_sim.average.q curve I am currently getting is quite different from the one you have shown (yours might come from an earlier commit); it still converges to 0 towards the end of training, though.
Any help on this matter is greatly appreciated.
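
For context, in a DQN-style agent a curve like btc_sim.average.q is typically just the mean of the maximum predicted Q-value over a batch of states, so with per-step rewards hovering near zero the Bellman targets also pull it toward zero (a generic sketch, not this repo's exact code):

```python
# Generic DQN diagnostic, not this repository's exact code: the "average Q"
# statistic is usually the mean of max_a Q(s, a) over a sampled batch of states.
import numpy as np

def average_q(q_network, states):
    """states: batch of observations; q_network maps them to per-action Q-values."""
    q_values = q_network(states)                  # shape: (batch_size, num_actions)
    return float(np.mean(np.max(q_values, axis=1)))

# With rewards close to zero, the target r + gamma * max_a Q(s', a) stays close
# to zero as well, so this statistic drifting to zero is consistent with the
# flat avg_r values reported above.
```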

@samre12 samre12 self-assigned this May 21, 2018