Conversation

@hlky hlky commented Dec 26, 2025

WIP

  • introduce "buckets" to IntVar
  • add a step to Shape (for input tensor creation)
  • recalculate bucket_step when needed
  • compute a different offset (see the sketch after this list)
  • ?????
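
Roughly how these pieces fit together, as a minimal standalone sketch. Everything here (BucketedDim, recalculate_step, bucket_upper) is a hypothetical stand-in, not the actual IntVar/Shape API in this PR: a dynamic dimension's range is split into buckets of size bucket_step, and a runtime value is rounded up to its bucket's upper bound so planning can use that instead of the compile-time maximum.

```python
class BucketedDim:
    """A dynamic dimension whose [lower, upper] range is split into buckets."""

    def __init__(self, lower: int, upper: int, num_buckets: int = 4):
        self.lower = lower
        self.upper = upper
        self.num_buckets = num_buckets
        self.recalculate_step()

    def recalculate_step(self) -> None:
        # bucket_step is recomputed whenever the range or bucket count changes.
        self.bucket_step = max(1, -(-(self.upper - self.lower) // self.num_buckets))

    def bucket_index(self, value: int) -> int:
        """Map a runtime value to the bucket it falls in."""
        assert self.lower <= value <= self.upper
        return min((value - self.lower) // self.bucket_step, self.num_buckets - 1)

    def bucket_upper(self, value: int) -> int:
        """Bucket upper bound: the size the workspace is planned for."""
        return min(self.lower + (self.bucket_index(value) + 1) * self.bucket_step,
                   self.upper)


# Example: a dimension compiled for 8..256 with 4 buckets has bucket_step 62;
# a runtime value of 40 lands in bucket 0 and is planned as 70, not 256.
h = BucketedDim(8, 256)
print(h.bucket_step, h.bucket_index(40), h.bucket_upper(40))  # 62 0 70
```

Presumably the memory-planning offsets would then be derived from the bucket upper bound rather than from the IntVar's maximum.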

dynamic workspace!!

Next: codegen

Working in its current state.

Tested with AutoencoderKL Decode:

  • 8×8 latent: 503MiB / 46068MiB
  • 128×128 latent: 1815MiB / 46068MiB
  • 256×256 latent: 5727MiB / 46068MiB

Limitations:

  • requires FAU allocation mode

Next steps:

  • cleanup
    • we assume a rank-4 input tensor for bucketing and a length-3 bucket key, e.g. (0, 0, 0)
    • bucketing should be toggleable (fall back to existing memory planning when disabled)
  • support lazy allocation with reuse if the shape doesn't change between runs, and grow/shrink the memory if it does change (see the allocator sketch after this list)
  • ????
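
A hedged sketch of the lazy-allocation item above. LazyWorkspace, shrink_ratio, and the bytearray standing in for device memory are all illustrative, not this PR's implementation: reuse the buffer while the requirement still fits, grow it on demand, and shrink it once the requirement drops well below the current allocation.

```python
class LazyWorkspace:
    """Lazy workspace: reuse while the requirement fits, grow/shrink otherwise."""

    def __init__(self, shrink_ratio: float = 0.5):
        self._buf = None
        self._size = 0
        self._shrink_ratio = shrink_ratio

    def get(self, required: int) -> bytearray:
        grow = required > self._size
        shrink = self._size > 0 and required < self._size * self._shrink_ratio
        if self._buf is None or grow or shrink:
            # (Re)allocate only when the requirement changed enough to matter;
            # unchanged shapes between runs hit the reuse path.
            self._buf = bytearray(required)
            self._size = required
        return self._buf


ws = LazyWorkspace()
a = ws.get(1 << 20)   # first run: allocate 1 MiB
b = ws.get(1 << 20)   # same shape: reuse, no allocation
assert a is b
ws.get(4 << 20)       # larger shape: grow
ws.get(1 << 10)       # much smaller shape: shrink
```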

Overall goal

Compile for a large dynamic input shape, but only use the minimum workspace required for the actual input shape at runtime.
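
To make this concrete, a toy calculation under an invented cost model (none of these numbers come from the PR): with a dimension compiled for 8..256 and a bucket step of 64, an 8×8 input is planned against its bucket's upper bound instead of the 256 maximum, so only a small fraction of the max-shape workspace is needed.

```python
def round_up_to_bucket(value: int, lower: int, step: int, upper: int) -> int:
    """Round a runtime dimension up to the top of its bucket."""
    index = (value - lower) // step
    return min(lower + (index + 1) * step, upper)


def workspace_bytes(h: int, w: int, bytes_per_elem: int = 1024) -> int:
    """Stand-in cost model: workspace grows with spatial size (made up)."""
    return h * w * bytes_per_elem


LOWER, UPPER, STEP = 8, 256, 64
max_ws = workspace_bytes(UPPER, UPPER)              # planning from the maximum
h = w = round_up_to_bucket(8, LOWER, STEP, UPPER)   # actual 8x8 input -> bucket top 72
print(f"{workspace_bytes(h, w) / max_ws:.1%} of the max-shape workspace")  # 7.9%
```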

@hlky hlky force-pushed the dynamic-workspace branch 2 times, most recently from bb1537c to a3298a7 on December 27, 2025 at 16:11
@hlky hlky force-pushed the dynamic-workspace branch from b66e2b8 to dc27a0c on January 13, 2026 at 11:26