
training issue #6

Open
philipshurpik opened this issue Mar 29, 2018 · 5 comments
@philipshurpik

Hi!
I have a question about training.

After 16 hours of training, I still get an average reward of 0.
I would be happy if you could explain what might be wrong.
Maybe it's a problem with the default setup parameters?

25%|███▊ | 6374997/25000000 [16:18:15<48:41:33, 106.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001807, avg_ep_r: 0.0000, max_ep_r: 0.0911, min_ep_r: -0.0984, # game: 5000
26%|███▊ | 6399993/25000000 [16:22:11<48:18:45, 106.94it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000135, avg_q: -0.001469, avg_ep_r: 0.0000, max_ep_r: 0.0985, min_ep_r: -0.0641, # game: 5000
26%|███▊ | 6424989/25000000 [16:26:07<48:06:53, 107.24it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000138, avg_q: -0.001775, avg_ep_r: 0.0001, max_ep_r: 0.1445, min_ep_r: -0.0460, # game: 5000
26%|███▊ | 6449993/25000000 [16:30:03<48:41:25, 105.83it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000134, avg_q: -0.001525, avg_ep_r: -0.0000, max_ep_r: 0.0223, min_ep_r: -0.0371, # game: 5000
26%|███▉ | 6477033/25000000 [16:34:16<47:10:48, 109.06it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000138, avg_q: -0.002763, avg_ep_r: -0.0000, max_ep_r: 0.0302, min_ep_r: -0.0762, # game: 5000
26%|███▉ | 6499197/25000000 [16:37:41<47:10:50, 108.92it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000142, avg_q: -0.003163, avg_ep_r: 0.0000, max_ep_r: 0.0352, min_ep_r: -0.0225, # game: 5000
26%|███▉ | 6526765/25000000 [16:41:56<47:30:54, 108.00it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000135, avg_q: -0.003114, avg_ep_r: -0.0000, max_ep_r: 0.0253, min_ep_r: -0.1445, # game: 5000
26%|███▉ | 6551381/25000000 [16:45:43<47:47:03, 107.25it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000131, avg_q: -0.002506, avg_ep_r: 0.0000, max_ep_r: 0.0643, min_ep_r: -0.0199, # game: 5000
26%|███▉ | 6577145/25000000 [16:49:41<47:26:52, 107.85it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.001795, avg_ep_r: -0.0000, max_ep_r: 0.0300, min_ep_r: -0.1185, # game: 5000
26%|███▉ | 6599989/25000000 [16:53:14<46:38:00, 109.60it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000137, avg_q: -0.002334, avg_ep_r: -0.0000, max_ep_r: 0.0495, min_ep_r: -0.1122, # game: 5000
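
A rough sketch of one way to pull avg_r out of log lines like these and plot it against the training step (the regex assumes the exact format shown above; the log file path is just a placeholder):

```python
# Rough sketch: extract avg_r from tqdm/INFO log lines like the ones above
# and plot it against the training step. The regex and the log path are
# assumptions based on the format shown here.
import re
import matplotlib.pyplot as plt

pattern = re.compile(r"(\d+)/\d+ \[.*?avg_r:\s*(-?\d+\.\d+)")

steps, avg_r = [], []
with open("training.log") as f:  # placeholder path
    for line in f:
        match = pattern.search(line)
        if match:
            steps.append(int(match.group(1)))
            avg_r.append(float(match.group(2)))

plt.plot(steps, avg_r)
plt.xlabel("training step")
plt.ylabel("avg_r")
plt.show()
```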

@samre12
Owner

samre12 commented Mar 29, 2018

I believe you are using code from the main branch, where the model contains an earlier implementation of both the batch normalization and the dropout layers. There was an issue with that earlier implementation, which I corrected in the dev branch, and I have recently merged dev into master.
Please pull the latest code from the main branch and re-run the training process.
If you wish to remove dropout from the model, set keep_prob to 1.0 for each layer in the configuration file.
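
For example, a minimal sketch of forcing keep_prob to 1.0 everywhere before training (the config path, INI-style format, and key name are assumptions; check the repository's actual configuration file for the real section and key names):

```python
# Minimal sketch: force keep_prob to 1.0 for every layer that defines it.
# The config path, INI-style format, and key name are assumptions; adjust
# them to match the repository's actual configuration file.
import configparser

config = configparser.ConfigParser()
config.read("code/configuration.cfg")  # placeholder path

for section in config.sections():
    if "keep_prob" in config[section]:
        config[section]["keep_prob"] = "1.0"  # 1.0 means no units are dropped

with open("code/configuration.cfg", "w") as f:
    config.write(f)
```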

@samre12 samre12 added the bug label Mar 29, 2018
@philipshurpik
Author

Thanks, I just checked out the latest master branch and ran the code with the default config.
The problem still exists: after one day of training it learns nothing :(

 47%|██████▌       | 11724993/25000000 [22:58:13<26:00:25, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000079, avg_q: 0.027890, avg_ep_r: -0.0001, max_ep_r: 0.2953, min_ep_r: -0.2274, # game: 5000
 47%|██████▌       | 11749993/25000000 [23:01:09<25:57:28, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000077, avg_q: 0.027767, avg_ep_r: -0.0002, max_ep_r: 0.2545, min_ep_r: -0.2279, # game: 5000
 47%|██████▌       | 11774985/25000000 [23:04:05<25:54:32, 141.79it/s]INFO:deep_trading_agent:avg_r: -0.0000, avg_l: 0.000088, avg_q: 0.026840, avg_ep_r: -0.0001, max_ep_r: 0.3711, min_ep_r: -0.6593, # game: 5000
 47%|██████▌       | 11799997/25000000 [23:07:01<25:51:34, 141.79it/s]INFO:deep_trading_agent:avg_r: 0.0000, avg_l: 0.000079, avg_q: 0.026633, avg_ep_r: 0.0000, max_ep_r: 0.2962, min_ep_r: -0.2784, # game: 5000

@samre12
Owner

samre12 commented Apr 5, 2018

Hi @philipshurpik, can you please share the TensorBoard log file if you have generated one? That will help me analyse the training process. I am also working on fixing this issue right now.
It is probably related to the capacity of the model. You could also try running the model without dropout by setting keep_prob to 1 for all the layers.
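
For reference, one way to read the logged scalars back out of a TensorBoard event file (the event-file path and the scalar tag below are placeholders, not necessarily the names this repo uses):

```python
# Sketch: dump the scalar curves stored in a TensorBoard event file.
# The event-file path and the tag name are placeholders.
from tensorboard.backend.event_processing.event_accumulator import EventAccumulator

acc = EventAccumulator("logs/events.out.tfevents.example")
acc.Reload()

print(acc.Tags()["scalars"])            # list every scalar tag that was logged
for event in acc.Scalars("average_q"):  # tag name is a placeholder
    print(event.step, event.value)
```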

@joseph-zhong

I haven't taken a deep look at the training scheme, but intuitively something seems off when I graph the loss and the Q-value below.

The loss minimization looks good, which perhaps implies that the networks are learning, but the fact that the Q-value is also being driven down seems alarming:

[screenshot: loss and Q-value training curves]

@samre12
Owner

samre12 commented May 8, 2018

@joseph-zhong I have been making corrections to the code; the latest changes are in the dev branch, where I have integrated this repo with my gym-cryptotrading environment for ease of use.
Along with those bug fixes, I am also trying to resolve this specific issue, but the btc_sim.average.q curve I am currently getting is quite different from the one you have shown (yours might come from an earlier commit); it still converges to 0 towards the end of training, though.
Any help on this matter is greatly appreciated.
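
For context, in a DQN-style agent a curve like btc_sim.average.q is typically just the mean of the maximum predicted Q-value over a batch of states, so with per-step rewards hovering near zero the Bellman targets also pull it toward zero (a generic sketch, not this repo's exact code):

```python
# Generic DQN diagnostic, not this repository's exact code: the "average Q"
# statistic is usually the mean of max_a Q(s, a) over a sampled batch of states.
import numpy as np

def average_q(q_network, states):
    """states: batch of observations; q_network maps them to per-action Q-values."""
    q_values = q_network(states)                  # shape: (batch_size, num_actions)
    return float(np.mean(np.max(q_values, axis=1)))

# With rewards close to zero, the target r + gamma * max_a Q(s', a) stays close
# to zero as well, so this statistic drifting to zero is consistent with the
# flat avg_r values reported above.
```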

@samre12 samre12 self-assigned this May 21, 2018