Hi! First, I want to say that I've greatly benefited from being able to use JAX. I have a question about intended behavior in certain corner cases: when the subdifferential of a function contains more than one "right answer," what is permissible behavior for JAX? I've noticed a small inconsistency in JAX's behavior for cases like this:

```python
from jax import numpy as np
from jax import grad

x, y = 0.0, 0.0

# jax breaks "ties" for grad(abs)(0.0)
def f(x, y):
    return np.abs(x - y)

# but not for grad(norm)(something_with_zero_norm)
def g(x, y):
    return np.linalg.norm(x - y)

# (+1, -1), (nan, nan)
grad(f, argnums=(0, 1))(x, y), grad(g, argnums=(0, 1))(x, y)
```

In the first case, it appears permissible to return an arbitrary point in the subdifferential, and in the second case it appears impermissible. I imagine the intended behavior might be to return some fixed element of the subdifferential in both cases. Alternately, it might be intended for JAX to return an arbitrary value in the interval [-1, +1] for the first case, and an arbitrary value in the unit ball for the second case, but I don't know how safe that is.
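To make the inconsistency concrete: for scalar arguments, `f` and `g` compute the same value, since the norm of a scalar is just its absolute value, so only the gradient rules differ. A minimal check (assuming a recent JAX):

```python
from jax import numpy as np
from jax import grad

def f(x, y):
    return np.abs(x - y)

def g(x, y):
    return np.linalg.norm(x - y)

# Identical primal values at the tie point...
print(f(0.0, 0.0), g(0.0, 0.0))  # both 0.0
# ...but different gradients with respect to x.
print(grad(f)(0.0, 0.0))  # 1.0
print(grad(g)(0.0, 0.0))  # nan
```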
I believe this comes from the fact that `norm(x)` involves a square root, which has an infinite gradient at `x=0`:

```python
grad(np.sqrt)(0.0)
# DeviceArray(inf, dtype=float32)
```

I believe this is the only sensible derivative of `sqrt` for this value, so `NaN` is the only sensible derivative of `norm` at zero as implemented.

On the other hand, the gradient of `abs` is currently defined as `1` at `x=0`, which is sensible in the sense that it is the correct right-limit, although the left limit is `-1`. It may also be sensible to return `NaN` for the gradient of `abs` at zero, but I'm not certain what implications that might have in other places where `abs` is used.

In any case, the "correct" answer here is ambiguous, and if your code depends on a particular convention, the best option would be to define a custom gradient for the function in question; see https://jax.readthedocs.io/en/latest/notebooks/Custom_derivative_rules_for_Python_code.html.
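For example, here's one way that could look, using `jax.custom_jvp`; the name `safe_norm` and the choice of the zero element of the subdifferential at the origin are my own conventions for illustration, not anything JAX prescribes:

```python
import jax
from jax import numpy as np

@jax.custom_jvp
def safe_norm(x):
    return np.linalg.norm(x)

@safe_norm.defjvp
def safe_norm_jvp(primals, tangents):
    x, = primals
    t, = tangents
    y = np.linalg.norm(x)
    # At the origin, choose the zero element of the subdifferential
    # instead of letting the sqrt produce inf * 0 = nan.
    denom = np.where(y == 0, 1.0, y)  # avoid 0/0 in the untaken branch
    grad_x = np.where(y == 0, np.zeros_like(x), x / denom)
    return y, np.sum(grad_x * t)
```

With this, `grad(safe_norm)` at the origin returns zeros rather than NaNs, while gradients away from the origin are unchanged.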