Hello @AbhishekAshokDubey!
I am playing with your modification of the Policy Gradient algorithm. At first everything works well, but after some number of episodes something goes wrong.
It seems that aprob, h = sess.run([nn.output_layer, nn.hidden_layer], feed_dict={nn.input_layer: np.reshape(x,(1,x.shape[0]))}) begins to return wrong results after a certain weight update.
Do you have an idea about the problem and how to fix it?
Hey @iNomaD,
Sorry to hear that. I could not reproduce the problem even after 300 episodes. Can you check again whether you get the same error, or help me reproduce it?
The issue occurs on both my local laptop (Win7) and Google Cloud (Ubuntu) with Python 3.5 and tensorflow 1.3.0. Indeed, it takes more than 3000 episodes; after that, sess.run([nn.output_layer, nn.hidden_layer], feed_dict={nn.input_layer: np.reshape(x,(1,x.shape[0]))}) starts returning NaN. At first I thought the problem was the tf.log function receiving 0 as input, so I tried simple clipping, clipped_output = tf.clip_by_value(self.output_layer, 1e-37, 1e+37), but it didn't help. I am a newbie in ML and tensorflow.
Can you check this model? It has been trained for ~3000 episodes and will crash after a few more training batches (the exact number varies).
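For anyone following along, here is a small NumPy sketch of one possible reason the wide clipping range did not help. It is not taken from the repository: the thread does not show the loss, so the sketch assumes the output layer is a sigmoid probability p and that the loss uses both log(p) and log(1 - p). The function names and the eps value are made up for illustration.

```python
import numpy as np

def naive_log_likelihood(p, y):
    """Bernoulli log-likelihood with no protection against log(0).

    Assumed form of the loss term; y is 1 for the sampled action, else 0.
    """
    with np.errstate(divide="ignore", invalid="ignore"):
        return y * np.log(p) + (1.0 - y) * np.log(1.0 - p)

def clipped_log_likelihood(p, y, eps=1e-7):
    """Same quantity, with p kept strictly inside (0, 1) before the logs."""
    p = np.clip(p, eps, 1.0 - eps)  # eps is an illustrative choice
    return y * np.log(p) + (1.0 - y) * np.log(1.0 - p)

# A saturated sigmoid in float32 can output exactly 1.0 or 0.0.
p = np.array([1.0, 0.0, 0.5], dtype=np.float32)
y = np.array([1.0, 0.0, 1.0], dtype=np.float32)

# The bounds tried in the thread: the upper bound 1e+37 never triggers
# for a probability, so p == 1.0 passes through unchanged.
wide = np.clip(p, 1e-37, 1e+37)

print(naive_log_likelihood(p, y))     # NaN where 0 * log(0) appears
print(naive_log_likelihood(wide, y))  # first entry is still NaN
print(clipped_log_likelihood(p, y))   # all entries finite
```

Under these assumptions, the NaN comes from 0 * log(0) on the log(1 - p) side, which a clip to [1e-37, 1e+37] leaves untouched. Note that if the weights themselves have already become NaN, no clipping of the output can repair the result, so it may also be worth checking the weights directly with np.isnan after each update.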