Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove OverridenKVCache and fix some peculiar cases of prims.copy_ + NVFuser #788

Merged
merged 2 commits into from
Jul 17, 2024

Conversation

nikitaved
Copy link
Contributor

@nikitaved nikitaved commented Jul 17, 2024

As per title, we can remove OverridenKVCache in our litgpt tests since the missing op, index_copy_ is in the system.
Additionally, this PR

@nikitaved nikitaved force-pushed the nikitaved/kvcache_use_index_copy_only branch from c19f2cd to 7fb979d Compare July 17, 2024 09:17
@nikitaved nikitaved force-pushed the nikitaved/kvcache_use_index_copy_only branch from 7fb979d to 593d61c Compare July 17, 2024 09:34
@nikitaved nikitaved changed the title OverridenKVCache: remove index_add and use index_copy instead OverridenKVCache: remove since we have index_copy now. Jul 17, 2024
@nikitaved nikitaved force-pushed the nikitaved/kvcache_use_index_copy_only branch from 593d61c to 2aae6cd Compare July 17, 2024 09:49
@nikitaved
Copy link
Contributor Author

nikitaved commented Jul 17, 2024

Looks like the real issue could have been buried a bit deeper. Related:
#789
#791

@nikitaved
Copy link
Contributor Author

I have a fix that appears to resolve all our issues...

@nikitaved nikitaved force-pushed the nikitaved/kvcache_use_index_copy_only branch from 2aae6cd to ce2e0a1 Compare July 17, 2024 16:19
@nikitaved nikitaved changed the title OverridenKVCache: remove since we have index_copy now. remove OverridenKVCache and fix some peculiar cases of prims.copy_ + NVFuser Jul 17, 2024
@nikitaved nikitaved force-pushed the nikitaved/kvcache_use_index_copy_only branch from ce2e0a1 to c20b9e2 Compare July 17, 2024 16:21
@nikitaved
Copy link
Contributor Author

nikitaved commented Jul 17, 2024

The fix is in! @jjsjann123, @crcrpar, @tfogal , could you please also have a look and tell me whether you approve the fix in the NVFuser fusion logic.

@nikitaved nikitaved force-pushed the nikitaved/kvcache_use_index_copy_only branch from 44d8845 to a4d5767 Compare July 17, 2024 16:38
Copy link
Collaborator

@tfogal tfogal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I may have missed something in the motivation for dropping the test option. Could you educate me on where you're coming from here?

For the nvfuser aspect, in general I think our fusion logic should be revisited. This seems like an extension in the same vein as what's there, and as such seems reasonable. I also think that the higher-level rework is a big can of worms, as we're presently experiencing with the small-scale change of bookends. So I'm inclined to say we should just go with what's here and accrue a little bit more debt pending a larger rewrite.
But it's more reasonable for @jjsjann123 to make that call than me.

thunder/tests/litgpt_model.py Show resolved Hide resolved
# NOTE: filter all first "dangling" no-op copies
while len(bsyms) > 0 and bsyms[0].sym.id is prims.PrimIDs.COPY_:
fused_bsyms.append(bsyms[0])
bsyms = bsyms[1:]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • It seems we're looking at the absolute order of operations, rather than a dataflow order. But maybe we don't have infra to do so via dataflow?
  • Should logic like this go into _should_fuse, above, instead of here? @jjsjann123

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, side note: could I ask you to have the comment define what "dangling" means here?

Copy link
Contributor Author

@nikitaved nikitaved Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a similar idea. Tried the easiest solution first... I suspect the usage of prims.copy_ is very structured, and issues do appear only at fusion boundaries. But I totally agree - I'd rather move it into the data-flow logic...

@tfogal , after some thought, I believe the fix is legit. I have fixed the issue of bsym groups not being topologically sorted (horizontally) when forming bsym groups (in #656). Therefore, because the logic of in-place ops is op + copy, if op is not claimed, the problematic copy (or copies) can only appear at the top of a fusion group. So, there is no need in following the dataflow order, it is being implicitly encoded already through bsyms being topologically sorted both horizontally and vertically (only for the op + copy combo, of course, and their positions in the graph).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's exactly the kind of thing I was worried about :-)

As you're highlighting, we have a lot of implicit assumptions here on the sorting of ops because we are not relying on dataflow. Such a requirement is implicit on earlier code's behavior, and there's really not a strong reason for that earlier behavior to exist (at least before this).

This is a good definition of fragility--when disconnected, distant pieces of code impose specific requirements on one another.

One of the reasons to ping Jie to get his thoughts is that maybe this fragility is fine here: if our thought is that we're going to scrap much of the fusion logic entirely in a rewrite, a little bit of extra debt here probably is not worth worrying about.

Copy link
Contributor Author

@nikitaved nikitaved Jul 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Luckily, it only concerns the usage of prims.copy_, which is, as it appears, is a helper-/not-user-facing-symbol. It does nothing if/when prims.copy_ is removed and if/when we re-implement the in-place logic to be something different. The current logic of fusing based on just parent->child relation is limiting as it does not consider horizontal relationships (same level nodes. But it got fixed in #656). Luckily, if the input program is valid, it is already topologically sorted both horizontally and vertically, so it only makes sense to preserve this structure and exploit it if possible (including fusions, fusion regions, and/or any valid subset of a program). The logic here appears to me like a good and a simple example of such exploitations... But I agree, I should add a comment and "proof" correctness.

@tfogal
Copy link
Collaborator

tfogal commented Jul 17, 2024

+1 from me but I'd prefer we wait for Jie's thoughts before merging.

Copy link
Collaborator

@t-vi t-vi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There goes my chance to merge without review.
Thank you @nikitaved @tfogal

@t-vi t-vi merged commit 7fdf213 into main Jul 17, 2024
36 checks passed
@t-vi t-vi deleted the nikitaved/kvcache_use_index_copy_only branch July 17, 2024 17:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

prims.copy_ with NVFuser: sometimes it has to be a kernel. index_copy_: incorrect with CUDA inputs
4 participants