-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Access L2 Segment Allocation in Herd with Python Example #668
Conversation
@@ -48,7 +71,7 @@ def __init__( | |||
launch_operands=operands, | |||
sym_name=name, | |||
) | |||
operand_types = [s.type for s in sizes] * 2 + [o.type for o in operands] | |||
operand_types = [s.type for s in sizes] * 2 + get_region_operand_types(operands) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think considering not just Values but also Operations makes sense in the segment and herd since one could feasibly create the operation in the outer region. That is what these changes do.
However, I also applied these changes to the launch and I'm not sure if this makes sense. Is it true that a launch argument should only ever be a value and not an operation? If so, I should revert this section of the changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jgmelber Do you have an opinion?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think in the future there could be other operations before or after launch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, I will keep my changes then!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe I'm missing some subtlety about the python bindings but isn't the result of an Operation always one or more Values? Can you share a code snippet where this comes up? This isn't an issue in ordinary MLIR code or using C++ APIs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here is the context: #666
@@ -23,14 +23,18 @@ def build_module(): | |||
ChannelOp("ChanOut") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I attempted to change this file to use the new ability to send L2 allocations made in a segment to a herd. However, this example still does not work. I'm unsure of how I've rewritten it is reasonable or not, but I've made a note in the associated issue (#654)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The tensor_in_l2
seems to be deallocated before it was used in air.herd
, if the code was interpreted line by line. Maybe move the deallocation to after the herd
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed a bug (I accidentally was using L1 for both allocation types) and made the change you suggested, as well as simplifying the example a bit. The example is still not successful, although the error changed:
Running: builtin.module(air-to-aie{emit-while-loop=false row-offset=2 col-offset=0 device=npu1_4col})
Segmentation fault (core dumped)
@@ -47,51 +51,43 @@ def launch_body(a, b): | |||
@segment(name="seg") | |||
def segment_body(): | |||
|
|||
tensor_in_l2 = AllocOp(image_type_l2, [], []) | |||
ChannelGet("ChanIn", tensor_in_l2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After tensor_in_l2
is filled with data from getting ChanIn
, I would expect a put to some L2->L1 channel?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to do L2->L1 in the herd due to the purpose of this example (see my other comment).
If this not a valid operation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is valid operation. I didn't understand the intend the design but I get it now.
|
||
ChannelGet("ChanIn", tensor_in) | ||
ChannelPut("ToSelf", tensor_in_l2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. I'd suggest moving this channel put to outside herd.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is called worker to self because the purpose of the example is to demonstrate (if) it is possible to put data into a channel and then fetch it from the same channel from within the same worker/compute core in a herd. So to keep this example, we need both ChannelPut("ToSelf", some_mem)
and ChannelGet("ToSelf", some_other_mem)
within the herd.
The only reason I modified this example in this PR is because previously this example was blocked because I was unable to access L2 memory in the herd, and that capability is needed because channels must move data between different memory regions (in this case, L2 and L1).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. This scenario is not exercised before.
Previously, it was not possible to send an allocation made in a segment to a herd. Now it is possible to do so. I add a programming example that exercises this capability.
This addresses issue #666