Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access L2 Segment Allocation in Herd with Python Example #668

Merged
merged 6 commits into from
Jul 17, 2024

Conversation

hunhoffe
Copy link
Contributor

Previously, it was not possible to send an allocation made in a segment to a herd. Now it is possible to do so. I add a programming example that exercises this capability.

This addresses issue #666

@@ -48,7 +71,7 @@ def __init__(
launch_operands=operands,
sym_name=name,
)
operand_types = [s.type for s in sizes] * 2 + [o.type for o in operands]
operand_types = [s.type for s in sizes] * 2 + get_region_operand_types(operands)
Copy link
Contributor Author

@hunhoffe hunhoffe Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think considering not just Values but also Operations makes sense in the segment and herd since one could feasibly create the operation in the outer region. That is what these changes do.

However, I also applied these changes to the launch and I'm not sure if this makes sense. Is it true that a launch argument should only ever be a value and not an operation? If so, I should revert this section of the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jgmelber Do you have an opinion?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in the future there could be other operations before or after launch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I will keep my changes then!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm missing some subtlety about the python bindings but isn't the result of an Operation always one or more Values? Can you share a code snippet where this comes up? This isn't an issue in ordinary MLIR code or using C++ APIs.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is the context: #666

@@ -23,14 +23,18 @@ def build_module():
ChannelOp("ChanOut")
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I attempted to change this file to use the new ability to send L2 allocations made in a segment to a herd. However, this example still does not work. I'm unsure of how I've rewritten it is reasonable or not, but I've made a note in the associated issue (#654)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The tensor_in_l2 seems to be deallocated before it was used in air.herd, if the code was interpreted line by line. Maybe move the deallocation to after the herd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed a bug (I accidentally was using L1 for both allocation types) and made the change you suggested, as well as simplifying the example a bit. The example is still not successful, although the error changed:

Running: builtin.module(air-to-aie{emit-while-loop=false row-offset=2 col-offset=0 device=npu1_4col})
Segmentation fault (core dumped)

@hunhoffe hunhoffe mentioned this pull request Jul 16, 2024
11 tasks
@hunhoffe hunhoffe changed the title Access L2 Segment Allocation in Herd with Example Access L2 Segment Allocation in Herd with Python Example Jul 16, 2024
@hunhoffe hunhoffe requested a review from erwei-xilinx July 16, 2024 16:51
@hunhoffe hunhoffe marked this pull request as ready for review July 16, 2024 16:51
@@ -47,51 +51,43 @@ def launch_body(a, b):
@segment(name="seg")
def segment_body():

tensor_in_l2 = AllocOp(image_type_l2, [], [])
ChannelGet("ChanIn", tensor_in_l2)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After tensor_in_l2 is filled with data from getting ChanIn, I would expect a put to some L2->L1 channel?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to do L2->L1 in the herd due to the purpose of this example (see my other comment).

If this not a valid operation?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is valid operation. I didn't understand the intend the design but I get it now.


ChannelGet("ChanIn", tensor_in)
ChannelPut("ToSelf", tensor_in_l2)
Copy link
Collaborator

@erwei-xilinx erwei-xilinx Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. I'd suggest moving this channel put to outside herd.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This example is called worker to self because the purpose of the example is to demonstrate (if) it is possible to put data into a channel and then fetch it from the same channel from within the same worker/compute core in a herd. So to keep this example, we need both ChannelPut("ToSelf", some_mem) and ChannelGet("ToSelf", some_other_mem) within the herd.

The only reason I modified this example in this PR is because previously this example was blocked because I was unable to access L2 memory in the herd, and that capability is needed because channels must move data between different memory regions (in this case, L2 and L1).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. This scenario is not exercised before.

@hunhoffe hunhoffe merged commit fbdcad3 into main Jul 17, 2024
11 checks passed
@hunhoffe hunhoffe deleted the alloc-check-example branch July 17, 2024 01:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants