fix: broadcast vectors for grad calculation #1535

polvalente · 2024-09-15T05:24:45Z

nx/lib/nx/defn/grad.ex

josevalim · 2024-09-16T10:19:03Z

nx/lib/nx/defn/grad.ex

-    Expr.constant(%T{shape: shape, type: {:f, 32}, names: names}, float, [])
+    case shape do
+      %T{vectorized_axes: [_ | _]} = t ->
+        Expr.tensor(Nx.fill(t, float, type: :f32))


We should probably get rid of the names here too.

I also wonder if we should move the check for vectorized_axes to constant. Today, if someone passes vectorized_axes, Expr.constant is broken. So maybe we should create a tensor whenever vectorized axes are given?
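
For context, here is a minimal sketch of the `Nx.fill/3` fallback used in the diff above, run on a concrete vectorized tensor. The `Mix.install` version requirement and the variable names `t` and `ones` are assumptions for illustration and are not part of the PR.

```elixir
# Assumption: Nx 0.9 or later is available through Mix.install.
Mix.install([{:nx, "~> 0.9"}])

# A tensor with one vectorized axis (:x, size 2) and inner shape {3}.
t = Nx.vectorize(Nx.iota({2, 3}), :x)

# Nx.fill/3 keeps the vectorized axes of its input and replaces every
# entry with the given value, cast to f32 as in the diff.
ones = Nx.fill(t, 1.0, type: :f32)

IO.inspect(ones.vectorized_axes, label: "vectorized_axes")
IO.inspect(ones.shape, label: "inner shape")
IO.inspect(Nx.type(ones), label: "type")
```

Building a concrete tensor this way sidesteps `Expr.constant`, which per the comment above does not handle vectorized axes.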

josevalim · 2024-09-16T10:20:07Z

nx/lib/nx/defn/grad.ex

@@ -338,6 +333,8 @@ defmodule Nx.Defn.Grad do
  @verify_grad Application.compile_env(:nx, :verify_grad, false)

  defp update_grads(op, args, ans, g, _to_grad_ids, grads) do
+    args = revectorize_args(args, ans)


I would prefer not to revectorize everything on every operation. Is there any chance we could do it in broadcast only?

[unbroadcast(x, Nx.multiply(g, y), ans), unbroadcast(y, Nx.multiply(g, x), ans)]

Lines like this one make it so that g is vectorized and y is unvectorized but has axes with the same name, so things break there.

josevalim · 2024-09-16T20:08:51Z

nx/lib/nx/defn/expr.ex

@@ -1394,6 +1394,11 @@ defmodule Nx.Defn.Expr do

  ## Constant helpers and related optimizations

+  defp constant(%{vectorized_axes: [_ | _]} = out, number) do
+    out = %{out | names: Enum.map(out.names, fn _ -> nil end)}


I don't think this part should be done here, we should preserve the names. Sorry for the confusion.

josevalim · 2024-09-16T20:15:18Z

nx/lib/nx/defn/grad.ex

@@ -1343,9 +1334,77 @@ defmodule Nx.Defn.Grad do

  ## General helpers

-  defp unbroadcast(%{shape: shape} = x, res, %{shape: shape}), do: {x, res}
+  defp revectorize_args(args, ans) do


Let's only apply this if args has more than one element and there are vectorized axes.

Also please test x * sin(y) where y is vectorized.
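
A rough sketch of what such a check could look like, assuming Nx 0.9 or later (which should include this fix). The module name `VectorizedGradSketch`, the axis name `:v`, and the choice to differentiate with respect to `y` are assumptions for illustration; this is not the test that was added to the PR.

```elixir
# Assumption: Nx 0.9 or later is available through Mix.install.
Mix.install([{:nx, "~> 0.9"}])

defmodule VectorizedGradSketch do
  import Nx.Defn

  # Gradient of x * sin(y) with respect to y; analytically x * cos(y).
  defn grad_wrt_y(x, y) do
    grad(y, fn y -> x * Nx.sin(y) end)
  end
end

# x is a plain scalar; y carries a vectorized axis :v of size 3.
x = Nx.tensor(2.0)
y = Nx.vectorize(Nx.tensor([0.0, 0.5, 1.0]), :v)

grad_y = VectorizedGradSketch.grad_wrt_y(x, y)
expected = Nx.multiply(x, Nx.cos(y))

IO.inspect(grad_y, label: "grad wrt y")
IO.inspect(expected, label: "x * cos(y)")
```

The comparison against `x * cos(y)` relies only on the analytic derivative, so it does not depend on how the gradient is computed internally.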

josevalim · 2024-09-24T18:12:10Z

nx/lib/nx.ex

@@ -4906,12 +4906,19 @@ defmodule Nx do

  def devectorize(%T{shape: shape, names: names, vectorized_axes: vectorized_axes} = tensor, opts)
      when vectorized_axes != [] do
-    opts = keyword!(opts, keep_names: true)
+    opts = keyword!(opts, keep_names: true, drop_inner_names: false)


josevalim · 2024-09-24T18:12:27Z

nx/lib/nx/defn/expr.ex


      t when is_tuple(t) ->
        context = elem(t, 0).data.context

        tuple(
-          expr(tuple_out(tuple_size(t)), context, :metadata, [Nx.devectorize(expr), metadata]),
+          expr(tuple_out(tuple_size(t)), context, :metadata, [expr, metadata]),


Revert. devectorize with keep_names.

josevalim · 2024-09-24T18:12:51Z

nx/lib/nx/defn/expr.ex

+    axes =
+      Keyword.values(vectorized_axes) ++ Tuple.to_list(shape)
+
+    brackets = Enum.map(axes, &[?[, Integer.to_string(&1), ?]])


Should we revert? 🤔

josevalim · 2024-09-24T18:14:20Z

nx/lib/nx/defn/grad.ex

+      %{} ->
+        parent_vectorized_axes = compute_arg_vectorized_axes(t, vectorized_axes)
+
+        nodes = Map.put(nodes, id, {Nx.devectorize(t, keep_names: true), parent_vectorized_axes})


We should not have anything vectorized here.

josevalim · 2024-09-24T18:15:50Z

nx/lib/nx/defn/grad.ex

-        recur_parents_tree(arg, {parents, nodes})
+
+        recur_parents_tree(
+          Nx.devectorize(arg, keep_names: true),


We should not need this either.

josevalim · 2024-09-24T18:15:59Z

nx/lib/nx/defn/grad.ex

+        {parents, nodes}
+
+      %{} ->
+        parent_vectorized_axes = compute_arg_vectorized_axes(t, vectorized_axes)


This may not be necessary.

I've changed the compute_arg_vectorized_axes function into a function that returns only the names (it's less assertive, but a tad cheaper), and that allows us to not need this remapping here.

It was needed for cases where t has [x: 1] and vectorized_axes has [x: 2], for instance -- implicit broadcast situations.
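
To illustrate the implicit broadcast situation described above, here is a small sketch with two tensors that share the vectorized axis name `:x` but have sizes 1 and 2. The tensors `a` and `b` are hypothetical, and the version requirement is an assumption.

```elixir
# Assumption: Nx 0.9 or later is available through Mix.install.
Mix.install([{:nx, "~> 0.9"}])

# Both tensors vectorize over :x, with sizes 1 and 2 respectively.
a = Nx.vectorize(Nx.tensor([10.0]), :x)
b = Nx.vectorize(Nx.tensor([1.0, 2.0]), :x)

# Binary operations broadcast the size-1 vectorized axis implicitly.
sum = Nx.add(a, b)

# The same broadcast can be requested explicitly.
[a_bc, b_bc] = Nx.broadcast_vectors([a, b])

IO.inspect(a.vectorized_axes, label: "a before")
IO.inspect(a_bc.vectorized_axes, label: "a after broadcast_vectors")
IO.inspect(sum.vectorized_axes, label: "sum")
```

Because the result's axes differ from those of `a`, any bookkeeping that records an argument's vectorized axes has to decide whether it stores the sizes or only the names, which is the trade-off the comment above refers to.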

josevalim · 2024-09-24T18:17:38Z

nx/lib/nx/defn/grad.ex

+  end
+
+  defp revectorize_node(node, vectorized_axes) do
+    vectorized_axes = compute_arg_vectorized_axes(node, vectorized_axes)


Maybe we could already read the computed values from nodes. Maybe.

josevalim · 2024-09-24T18:17:46Z

nx/lib/nx/defn/grad.ex

+    vectorized_axes = compute_arg_vectorized_axes(node, vectorized_axes)
+
+    node
+    |> Nx.devectorize(keep_names: false)


They should all be devectorized.

josevalim · 2024-09-24T18:18:01Z

nx/lib/nx/defn/grad.ex

@@ -1343,9 +1424,34 @@ defmodule Nx.Defn.Grad do

  ## General helpers

-  defp unbroadcast(%{shape: shape} = x, res, %{shape: shape}), do: {x, res}
+  defp unbroadcast(x, res, ans) do


Revert these changes.

nx/lib/nx/defn/grad.ex

josevalim

Beautiful!!!!! We can merge this and add the new ops later!

Co-authored-by: José Valim <jose.valim@dashbit.co>

fix: broadcast vectors for grad calculation

394a12d

polvalente self-assigned this Sep 15, 2024

polvalente commented Sep 15, 2024

polvalente added 7 commits September 15, 2024 02:45

fix attempt

414726b

test: make core tests pass

a08d0fd

fix: inspect vectorized axes as usual

2f7c5f1

chore: revert some changes

d87ffa1

chore: remove commented code

7fbdffd

chore: remove stray comments

db0b6f0

chore: remove more stray comments

20cc168

polvalente requested a review from josevalim September 16, 2024 09:46

josevalim reviewed Sep 16, 2024

refactor: support vectorized constant

22b9a24

josevalim reviewed Sep 16, 2024

josevalim approved these changes Sep 16, 2024

polvalente added 6 commits September 16, 2024 17:21

test: add x * sin(y) grad test

8f60a71

feat: revectorize args only when strictly necessary

a026c0e

fix: correctness of revectorize_args over possible kw_args functions

37408b3

refactor: simpler revectorization of nodes

380c330

refactor: revectorize in place

1487b17

fix: propagate vectorized axes throughout the recursion'

a832956

josevalim reviewed Sep 24, 2024

polvalente added 5 commits September 24, 2024 15:27

chore: revert some code due to code review

24b9ea5

chore: revert unbroadcast

d0f93c9

chore: remove some devectorization occurences

b89111d

chore: simplify vectorized axes calculation

9075ef0

chore: remove another superfluous devectorize

fcc4e10

polvalente requested a review from josevalim September 24, 2024 18:53

Merge branch 'main' into pv-fix/vectorized-grad

affdc90

josevalim reviewed Sep 24, 2024

josevalim approved these changes Sep 24, 2024

polvalente and others added 2 commits September 24, 2024 17:41

Update nx/lib/nx/defn/grad.ex

add0134

Co-authored-by: José Valim <jose.valim@dashbit.co>

chore: format

8d94bc0

polvalente merged commit 762d3ee into main Sep 24, 2024
7 of 8 checks passed

polvalente deleted the pv-fix/vectorized-grad branch September 24, 2024 20:55
