[LLVMGPU] Enable scf.forall distr. on vectorDistribute Pipeline #19420

Open
pashu123 wants to merge 3 commits into main from scfforallgpu
Conversation

@pashu123 (Contributor) commented Dec 9, 2024

Enables scf.forall distribution on the vector distribute pipeline.
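
For illustration only (this snippet is not taken from this PR or its tests; the tensor names, shapes, and thread counts are made up), the kind of IR this targets is an scf.forall whose iterations carry a GPU thread mapping:

// Hypothetical example: each (i, j) iteration is mapped to a thread via the
// mapping attribute, so distribution rewrites the loop into per-thread code.
// %src and %init are assumed to be tensor<4x8xf32> values defined earlier.
%0 = scf.forall (%i, %j) in (4, 8) shared_outs(%out = %init) -> (tensor<4x8xf32>) {
  %slice = tensor.extract_slice %src[%i, %j] [1, 1] [1, 1]
      : tensor<4x8xf32> to tensor<1x1xf32>
  scf.forall.in_parallel {
    tensor.parallel_insert_slice %slice into %out[%i, %j] [1, 1] [1, 1]
        : tensor<1x1xf32> into tensor<4x8xf32>
  }
} {mapping = [#gpu.thread<y>, #gpu.thread<x>]}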

@pashu123 force-pushed the scfforallgpu branch 8 times, most recently from f06be62 to 08aa578 (January 9, 2025 13:14)
@pashu123 marked this pull request as ready for review (January 9, 2025 13:16)
@pashu123 requested a review from Max191 (January 9, 2025 13:59)
@pashu123 force-pushed the scfforallgpu branch 2 times, most recently from f4b62d0 to 651af59 (January 9, 2025 17:43)
@pashu123 force-pushed the scfforallgpu branch 7 times, most recently from bc4deee to 44709eb (January 10, 2025 19:01)
@Max191 (Contributor) left a comment
It seems a bit strange that the vector sizes have changed in vector distribute and some of the read/writes from global to shared memory have disappeared in tile and fuse. Do you know what caused these differences?

Comment on lines -145 to -151
// CHECK-DAG: %[[LHS_RD:.+]] = vector.transfer_read %[[B0]]{{.*}} vector<1xf16>
// CHECK-DAG: vector.transfer_write %[[LHS_RD]]
// Note that to simplify the test we are not showing the mapping of the RHS_RD
// to its buffer as it goes through an scf.if/else control structure
// involving allocas.
// CHECK-DAG: %[[RHS_RD:.+]] = vector.transfer_read {{.*}} vector<1xf16>
// CHECK-DAG: vector.transfer_write %[[RHS_RD]]
Contributor:
What happened to these reads?

Contributor Author (@pashu123):
This happens when the tensor.extract_slice ops are pushed up in the block: the buffer optimization then kicks in and removes these reads/writes.

Comment on lines -1153 to -1156
// CHECK-DAG: %[[LHS_RD:.+]] = vector.transfer_read {{.*}} vector<4xf32>
// CHECK-DAG: vector.transfer_write %[[LHS_RD]]
// CHECK-DAG: %[[RHS_RD:.+]] = vector.transfer_read {{.*}} vector<1xf32>
// CHECK-DAG: vector.transfer_write %[[RHS_RD]]
Contributor:
Same here

Comment on lines +1155 to +1156
// CHECK-SAME: -> (vector<1x1x1xf32>, vector<1x1x1xf32>, vector<1x2x1x4x1x4xf32>)
// CHECK-COUNT-24: amdgpu.mfma {{.*}} {blocks = 1 : i32, k = 8 : i32, m = 32 : i32, n = 32 : i32} blgp = none : vector<4xf16>, vector<4xf16>, vector<16xf32>
Contributor:
Why has the number of mfmas changed here?

Contributor Author (@pashu123), Jan 11, 2025:
The distribution now handles tile-size lists of any length (previously it was limited to length 3), which is why the mfma count changed.

@Max191 (Contributor) left a comment
Also, there are 3 separate things happening here. Can you split the 3 commits into separate PRs (the workgroup reordering, the slice optimization, and the forall distribution enablement)?
