I'm having a bit of trouble understanding the semantics of broadcast_in_dim. #108
Replies: 2 comments 5 replies
-
The broadcast_dimensions attribute maps each axis of the operand `%operand:
tensor<1x3xi32>` to an axis of the result `%result: tensor<2x3x2xi32>`.
For example, axis 1 of the operand (size 3) is mapped to axis
broadcast_dimensions[1] (== 1) of the result. Similarly, axis 0 of the
operand (size 1) is mapped to axis broadcast_dimensions[0] (== 2) of the
result. Note, however, that the result has size 2 at axis 2. This is a
special case called degenerate broadcasting, where the size-1 dimension at
axis 0 of the operand is broadcast to size 2 at axis 2 of the result (
https://openxla.org/xla/broadcasting#broadcasting_similar-rank_arrays_with_degenerate_dimensions
). There can also be result axes that are not covered by the
broadcast_dimensions mapping (e.g. axis 0 of the result); these are the
extra dimensions added by the broadcast, and the operand values are
replicated along them.
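To make the mapping concrete, here is a small NumPy sketch that reproduces the example. This is my own emulation for illustration (the helper name broadcast_in_dim and its signature are mine, not the actual StableHLO implementation): each operand axis d is placed at result axis broadcast_dimensions[d], and the remaining size-1 axes are expanded by ordinary NumPy broadcasting.

import numpy as np

def broadcast_in_dim(operand, result_shape, broadcast_dimensions):
    # Build an intermediate shape that is 1 everywhere except at the result
    # axes targeted by broadcast_dimensions, which take the operand's sizes.
    intermediate = [1] * len(result_shape)
    for operand_axis, result_axis in enumerate(broadcast_dimensions):
        intermediate[result_axis] = operand.shape[operand_axis]
    # Reorder operand axes by their target result axis so the reshape below
    # lines the data up correctly (handles unsorted mappings like [2, 1]).
    order = np.argsort(broadcast_dimensions)
    reordered = np.transpose(operand, axes=order)
    # Ordinary broadcasting now expands the size-1 axes: the degenerate
    # operand axis and the result axes not listed in broadcast_dimensions.
    return np.broadcast_to(reordered.reshape(intermediate), result_shape)

operand = np.array([[1, 2, 3]], dtype=np.int32)        # tensor<1x3xi32>
result = broadcast_in_dim(operand, (2, 3, 2), [2, 1])  # tensor<2x3x2xi32>
print(result)  # two copies of [[1, 1], [2, 2], [3, 3]], matching %result above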
Let me know if this helps.
On Mon, Feb 19, 2024 at 2:55 PM, lonely eagle wrote:
The description of the problem is actually quite simple: I don't
understand the semantics of broadcast_in_dim. I've looked at the relevant
examples, but I still can't get my head around it. I'm a bit lost.
// %operand: [
// [1, 2, 3]
// ]
%result = "stablehlo.broadcast_in_dim"(%operand) {
broadcast_dimensions = array<i64: 2, 1>
} : (tensor<1x3xi32>) -> tensor<2x3x2xi32>
// %result: [
// [
// [1, 1],
// [2, 2],
// [3, 3]
// ],
// [
// [1, 1],
// [2, 2],
// [3, 3]
// ]
// ]
In the example above, what puzzled me at first is how a 2-dimensional
operand can become a 3-dimensional result, but that part is actually fine.
What I really can't understand is broadcast_dimensions. I've been thinking
about it for a long time and I don't have a clue. I hope someone can help me, thanks!
-
Overall, the semantics of the operation is that the operand shape is broadcast to the result shape. In the example, the operand shape is [1,3] and the result shape is [2,3,2]. Using the mapping broadcast_dimensions = [2, 1], operand axis 0 (size 1) is placed at result axis 2 and operand axis 1 (size 3) is placed at result axis 1; the size-1 axis and the unmapped result axis 0 are then broadcast up to the requested sizes. Please let me know if this helps.
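For completeness, the per-element reading of that mapping can also be checked directly with a short loop. This is again just an illustrative NumPy sketch of my own, not the actual implementation: every result index reads the operand at the coordinates selected by broadcast_dimensions, with 0 used for any size-1 operand axis (the degenerate case).

import itertools
import numpy as np

operand = np.array([[1, 2, 3]], dtype=np.int32)  # tensor<1x3xi32>
result_shape = (2, 3, 2)                         # tensor<2x3x2xi32>
broadcast_dimensions = [2, 1]

result = np.empty(result_shape, dtype=operand.dtype)
for result_index in itertools.product(*map(range, result_shape)):
    # operand_index[d] is 0 for a degenerate (size-1) operand axis, otherwise
    # the result coordinate at the axis that operand axis d was mapped to.
    operand_index = tuple(
        0 if operand.shape[d] == 1 else result_index[axis]
        for d, axis in enumerate(broadcast_dimensions))
    result[result_index] = operand[operand_index]
print(result)  # matches %result in the example above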
-
The description of the problem is actually quite simple: I don't understand the semantics of broadcast_in_dim. I've looked at the relevant examples, but I still can't get my head around it. I'm a bit lost.
In the example above, what puzzled me at first is how a 2-dimensional operand can become a 3-dimensional result, but that part is actually fine. What I really can't understand is broadcast_dimensions. I've been thinking about it for a long time and I don't have a clue. I hope someone can help me, thanks!