
Intended behavior of grad(fxn) when fxn is subdifferentiable but not differentiable? #6206

Answered by jakevdp
maxentile asked this question in Q&A

I believe this comes from the fact that norm(x) involves a square root, which has an infinite gradient at x=0:

from jax import grad
import jax.numpy as np

grad(np.sqrt)(0.0)
# DeviceArray(inf, dtype=float32)

I believe inf is the only sensible derivative of sqrt at this value, so NaN is the only sensible gradient of norm at zero as implemented: norm(x) computes the square root of a sum of squares, and the chain rule multiplies sqrt's infinite derivative at 0 by the zero gradient of sum(x ** 2), giving inf * 0 = nan.
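Concretely, here is a minimal reproduction, assuming the usual sqrt-of-sum-of-squares definition of the 2-norm:

from jax import grad
import jax.numpy as np

def norm(x):
    # Euclidean norm: square root of the sum of squares
    return np.sqrt(np.sum(x ** 2))

grad(norm)(np.zeros(3))
# DeviceArray([nan, nan, nan], dtype=float32)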

On the other hand, the gradient of abs is currently defined as 1 at x=0, which is defensible in that it is the correct right-hand limit, although the left-hand limit is -1. It might also be sensible to return NaN for the gradient of abs at zero, but I'm not certain what implications that would have in other places where abs is used.
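The same one-line check for abs; the 1.0 reflects the behavior described above, at the time of this discussion:

from jax import grad
import jax.numpy as np

grad(np.abs)(0.0)
# DeviceArray(1., dtype=float32)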

In any case, the "correct" answer here is ambiguous, and if …
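Not part of the quoted answer, but if the NaN is unwanted, a common workaround is the "double-where" trick, which keeps the non-differentiable branch out of the backward pass entirely and effectively chooses 0 as the subgradient at zero. A minimal sketch (safe_norm is a hypothetical name):

from jax import grad
import jax.numpy as np

def safe_norm(x):
    sq = np.sum(x ** 2)
    # Evaluate sqrt only on a safe value so its infinite derivative
    # at 0 never enters the backward pass; at x = 0 the outer where
    # selects the constant branch, so the gradient there is 0.
    safe_sq = np.where(sq > 0, sq, 1.0)
    return np.where(sq > 0, np.sqrt(safe_sq), 0.0)

grad(safe_norm)(np.zeros(3))
# DeviceArray([0., 0., 0.], dtype=float32)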

Answer selected by maxentile