
Error in tf_ping_pong_policyGradient.py #1

Open
iNomaD opened this issue Nov 1, 2017 · 2 comments

iNomaD commented Nov 1, 2017

Hello @AbhishekAshokDubey!
I am playing with your modification of the Policy Gradient algorithm. At first everything works well, but after a certain number of episodes something goes wrong.

updating weights of the network.
resetting env. episode171 reward -15.000000.
resetting env. episode172 reward -16.000000.
resetting env. episode173 reward -11.000000.
resetting env. episode174 reward -9.000000.
resetting env. episode175 reward -10.000000.
resetting env. episode176 reward -11.000000.
resetting env. episode177 reward -16.000000.
resetting env. episode178 reward -15.000000.
resetting env. episode179 reward -13.000000.
resetting env. episode180 reward -15.000000.
steps_per_second: 236. running mean: -13.100000. total time: 3433.
updating weights of the network.
C:/Projects/Atari/tf_pong_pg.py:120: RuntimeWarning: invalid value encountered in greater
  action = 2 if np.random.uniform() < aprob else 3 # roll the dice!
resetting env. episode181 reward -21.000000.
resetting env. episode182 reward -21.000000.
resetting env. episode183 reward -21.000000.
resetting env. episode184 reward -21.000000.
resetting env. episode185 reward -21.000000.
resetting env. episode186 reward -21.000000.
resetting env. episode187 reward -21.000000.
resetting env. episode188 reward -21.000000.
resetting env. episode189 reward -21.000000.
resetting env. episode190 reward -21.000000.
steps_per_second: 247. running mean: -21.000000. total time: 3474.

It seems that aprob, h = sess.run([nn.output_layer, nn.hidden_layer], feed_dict={nn.input_layer: np.reshape(x,(1,x.shape[0]))}) starts returning the wrong result after a certain weight update.
Do you have an idea what the problem is and how to fix it?
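
For what it's worth, this is how I read the warning above (just a toy sketch with a made-up NaN value, not the actual script):

```python
import numpy as np

# Tiny reproduction of the symptom (the NaN value is made up; aprob mirrors
# the variable name in tf_pong_pg.py).
aprob = np.float32("nan")  # what the forward pass seems to return after the bad update

# Same sampling line as in the script: any comparison against NaN is False,
# which is also what triggers the "invalid value encountered in greater"
# warning in the log above, so the agent picks action 3 on every frame.
action = 2 if np.random.uniform() < aprob else 3  # roll the dice!
print(action)  # -> 3, always; that would explain the flat -21 rewards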

@AbhishekAshokDubey
Owner

Hey @iNomaD,
Sorry to hear that. I could not reproduce it even after 300 episodes; can you check again and see if you get the same error, or help me reproduce it once?


iNomaD commented Nov 6, 2017

The issue occurs on both my local laptop (Win7) and Google Cloud (Ubuntu) with Python 3.5 and tensorflow 1.3.0. Indeed, it takes more than 3000 episodes, after which sess.run([nn.output_layer, nn.hidden_layer], feed_dict={nn.input_layer: np.reshape(x,(1,x.shape[0]))}) starts returning NaN. At first I decided the problem was the tf.log function taking 0 as input and tried simple clipping, clipped_output = tf.clip_by_value(self.output_layer, 1e-37, 1e+37), but it didn't help. I am a newbie in ML and tensorflow.
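
In case it helps narrow it down, here is a sketch of the two workarounds I have been reading about (a sketch only; the tensor names below are my guesses at how the repo wires things up, not the actual code):

```python
import tensorflow as tf

# Hypothetical stand-ins for the tensors in the repo.
logits = tf.placeholder(tf.float32, shape=[None, 1])   # pre-sigmoid network output
output_layer = tf.nn.sigmoid(logits)                    # probability of taking action 2

# (a) Clip the probability into (0, 1) before the log, instead of clipping
#     the raw output into [1e-37, 1e+37]:
eps = 1e-7
log_prob = tf.log(tf.clip_by_value(output_layer, eps, 1.0 - eps))

# (b) Or drop the explicit sigmoid + log and use the numerically stable
#     built-in on the logits instead:
fake_labels = tf.placeholder(tf.float32, shape=[None, 1])  # the sampled actions as 0/1 labels
stable_loss = tf.nn.sigmoid_cross_entropy_with_logits(labels=fake_labels, logits=logits)
```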

Can you check this model? It has been trained for ~3000 episodes and will crash after several (a random number of) further training batches.
