
Error in tf_ping_pong_policyGradient.py #1

Open
iNomaD opened this issue Nov 1, 2017 · 2 comments

iNomaD commented Nov 1, 2017

Hello @AbhishekAshokDubey!
I am playing with your modification of the Policy Gradient algorithm. At first everything works well, but after a certain number of episodes something goes wrong.

updating weights of the network.
resetting env. episode171 reward -15.000000.
resetting env. episode172 reward -16.000000.
resetting env. episode173 reward -11.000000.
resetting env. episode174 reward -9.000000.
resetting env. episode175 reward -10.000000.
resetting env. episode176 reward -11.000000.
resetting env. episode177 reward -16.000000.
resetting env. episode178 reward -15.000000.
resetting env. episode179 reward -13.000000.
resetting env. episode180 reward -15.000000.
steps_per_second: 236. running mean: -13.100000. total time: 3433.
updating weights of the network.
C:/Projects/Atari/tf_pong_pg.py:120: RuntimeWarning: invalid value encountered in greater
  action = 2 if np.random.uniform() < aprob else 3 # roll the dice!
resetting env. episode181 reward -21.000000.
resetting env. episode182 reward -21.000000.
resetting env. episode183 reward -21.000000.
resetting env. episode184 reward -21.000000.
resetting env. episode185 reward -21.000000.
resetting env. episode186 reward -21.000000.
resetting env. episode187 reward -21.000000.
resetting env. episode188 reward -21.000000.
resetting env. episode189 reward -21.000000.
resetting env. episode190 reward -21.000000.
steps_per_second: 247. running mean: -21.000000. total time: 3474.

It seems that aprob, h = sess.run([nn.output_layer, nn.hidden_layer], feed_dict={nn.input_layer: np.reshape(x,(1,x.shape[0]))}) starts returning the wrong result after a certain weight update.
Do you have an idea what the problem is and how to fix it?
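
For what it's worth, this is how I read the warning above (just a toy sketch with a made-up NaN value, not the actual script):

```python
import numpy as np

# Tiny reproduction of the symptom (the NaN value is made up; aprob mirrors
# the variable name in tf_pong_pg.py).
aprob = np.float32("nan")  # what the forward pass seems to return after the bad update

# Same sampling line as in the script: any comparison against NaN is False,
# which is also what triggers the "invalid value encountered in greater"
# warning in the log above, so the agent picks action 3 on every frame.
action = 2 if np.random.uniform() < aprob else 3  # roll the dice!
print(action)  # -> 3, always; that would explain the flat -21 rewards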

@AbhishekAshokDubey
Owner

Hey @iNomaD,
Sorry to hear that. I could not reproduce it even after 300 episodes; can you check again and see if you get the same error, or help me reproduce it once?


iNomaD commented Nov 6, 2017

The issue occurs on both my local laptop (Win7) and Google Cloud (Ubuntu) with Python 3.5 and tensorflow 1.3.0. Indeed, it takes more than 3000 episodes, after which sess.run([nn.output_layer, nn.hidden_layer], feed_dict={nn.input_layer: np.reshape(x,(1,x.shape[0]))}) starts returning NaN. At first I decided the problem was the tf.log function taking 0 as input and tried simple clipping, clipped_output = tf.clip_by_value(self.output_layer, 1e-37, 1e+37), but it didn't help. I am a newbie in ML and tensorflow.
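
In case it helps narrow it down, here is a sketch of the two workarounds I have been reading about (a sketch only; the tensor names below are my guesses at how the repo wires things up, not the actual code):

```python
import tensorflow as tf

# Hypothetical stand-ins for the tensors in the repo.
logits = tf.placeholder(tf.float32, shape=[None, 1])   # pre-sigmoid network output
output_layer = tf.nn.sigmoid(logits)                    # probability of taking action 2

# (a) Clip the probability into (0, 1) before the log, instead of clipping
#     the raw output into [1e-37, 1e+37]:
eps = 1e-7
log_prob = tf.log(tf.clip_by_value(output_layer, eps, 1.0 - eps))

# (b) Or drop the explicit sigmoid + log and use the numerically stable
#     built-in on the logits instead:
fake_labels = tf.placeholder(tf.float32, shape=[None, 1])  # the sampled actions as 0/1 labels
stable_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=fake_labels, logits=logits)
```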

Can you check this model? It has been trained for ~3000 episodes and will crash after several (a random number of) further training batches.
