Inactive Activation gradients #56

Open
Seabrand opened this issue Feb 1, 2021 · 1 comment
Seabrand commented Feb 1, 2021

Notably in chapter 8, the gradients used for backpropagation through the activation functions appear to be off: if you want the derivative of an activation function at a given input, σ'(x), shouldn't you evaluate it at that input rather than at the output y = σ(x)?
Example: if you calculate
layer_1 = relu(np.dot(layer_0,weights_0_1))
in the forward direction, then propagating backward would require
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(np.dot(layer_0,weights_0_1))
i.e. the derivative evaluated at the input to the activation function, and not, as suggested in the book,
layer_1_delta = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)
After all, applying relu2deriv(relu(x)) would yield ((x >= 0) * x) >= 0, which is true everywhere, i.e. an all-ones mask that does not change anything.
The effect on training is not huge, but it does impact overfitting, the loss values, and in fact some of the chapter's narrative.
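
For reference, here is a minimal, self-contained sketch comparing the two variants. It assumes relu and relu2deriv definitions in the style of the book (the exact definitions are not quoted above), and the layer sizes and values are made up purely for illustration:

import numpy as np

# Assumed definitions in the style of the book (treat these as assumptions,
# since the exact code is not quoted in this issue):
def relu(x):
    return (x > 0) * x        # zero out negative pre-activations

def relu2deriv(x):
    return x > 0              # 1 where x is positive, 0 elsewhere

np.random.seed(0)

# Hypothetical sizes: 1 sample, 3 inputs -> 4 hidden units -> 1 output
layer_0 = np.random.randn(1, 3)
weights_0_1 = np.random.randn(3, 4)
weights_1_2 = np.random.randn(4, 1)

# Forward pass
pre_activation = np.dot(layer_0, weights_0_1)
layer_1 = relu(pre_activation)
layer_2 = np.dot(layer_1, weights_1_2)

# Stand-in for the error signal coming back from the output layer
layer_2_delta = np.random.randn(1, 1)

# Variant proposed in this issue: derivative evaluated at the pre-activation
delta_from_input = layer_2_delta.dot(weights_1_2.T) * relu2deriv(pre_activation)

# Variant as written in the book: derivative evaluated at the activation output
delta_from_output = layer_2_delta.dot(weights_1_2.T) * relu2deriv(layer_1)

# With the strict comparison (x > 0) the two masks coincide, because
# relu(x) > 0 exactly when x > 0; with (x >= 0) they would differ wherever
# the pre-activation is negative.
print(np.allclose(delta_from_input, delta_from_output))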

@AshishPandagre

I have the same doubt. Andrew Ng also computes it the way we are expecting.
Below are the screenshots I took from Andrew Ng's course.

[Screenshot (97) and Screenshot (98): excerpts from Andrew Ng's course]
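
For context, and assuming the screenshots show the standard backpropagation equations from that course, the formulation there evaluates the activation derivative at the pre-activation Z rather than at the activation output A:

dZ[l] = dA[l] * g[l]'(Z[l])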
