Description
I've been giving some more thought to improving constensor's performance for LLMs. We previously discussed optimizing the CPU backend, but to reach the goal you mentioned of "run an LLM at very competitive speeds on any device" (which sounds a lot like what TVM aims for), it seems to me we may need a more sophisticated compilation architecture, perhaps something akin to TVM's multi-level IR: a high-level graph IR plus a low-level operator IR. That would enable more powerful graph optimizations and operator fusion, and make it easier to extend to more backends in the future.
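To make the idea concrete, here is a rough sketch (in Rust, since constensor is written in Rust) of what such a two-level design could look like: a graph-level IR where a fusion pass merges an Add immediately followed by a Gelu into one node, which is then lowered to kernel-level ops. All of the names here (`GraphOp`, `KernelOp`, `fuse`, `lower`) are made up for illustration and don't correspond to anything in constensor today:

```rust
// Purely illustrative sketch -- none of these types exist in constensor.

// High-level graph IR: whole-tensor ops referencing abstract buffer ids.
#[derive(Debug, Clone)]
enum GraphOp {
    MatMul { lhs: u32, rhs: u32, out: u32 },
    Add { lhs: u32, rhs: u32, out: u32 },
    Gelu { input: u32, out: u32 },
    // Produced by the graph-level fusion pass below.
    FusedAddGelu { lhs: u32, rhs: u32, out: u32 },
}

// Low-level operator IR: what a backend would actually codegen or dispatch.
#[derive(Debug)]
enum KernelOp {
    Gemm { lhs: u32, rhs: u32, out: u32 },
    // One elementwise loop over the whole fused region instead of two passes.
    Elementwise { inputs: Vec<u32>, out: u32, body: String },
}

// Graph-level pass: an Add immediately consumed by a Gelu becomes one node,
// so the intermediate buffer is never materialized.
fn fuse(graph: &[GraphOp]) -> Vec<GraphOp> {
    let mut fused = Vec::new();
    let mut i = 0;
    while i < graph.len() {
        if let (
            GraphOp::Add { lhs, rhs, out: add_out },
            Some(GraphOp::Gelu { input, out: gelu_out }),
        ) = (&graph[i], graph.get(i + 1))
        {
            if input == add_out {
                fused.push(GraphOp::FusedAddGelu { lhs: *lhs, rhs: *rhs, out: *gelu_out });
                i += 2;
                continue;
            }
        }
        fused.push(graph[i].clone());
        i += 1;
    }
    fused
}

// Lowering: each graph node maps onto a low-level kernel.
fn lower(graph: &[GraphOp]) -> Vec<KernelOp> {
    graph
        .iter()
        .map(|op| match op {
            GraphOp::MatMul { lhs, rhs, out } => KernelOp::Gemm { lhs: *lhs, rhs: *rhs, out: *out },
            GraphOp::Add { lhs, rhs, out } => KernelOp::Elementwise {
                inputs: vec![*lhs, *rhs],
                out: *out,
                body: "a + b".into(),
            },
            GraphOp::Gelu { input, out } => KernelOp::Elementwise {
                inputs: vec![*input],
                out: *out,
                body: "gelu(a)".into(),
            },
            GraphOp::FusedAddGelu { lhs, rhs, out } => KernelOp::Elementwise {
                inputs: vec![*lhs, *rhs],
                out: *out,
                body: "gelu(a + b)".into(),
            },
        })
        .collect()
}

fn main() {
    // x @ w, then + bias, then gelu, over toy buffer ids.
    let graph = vec![
        GraphOp::MatMul { lhs: 0, rhs: 1, out: 2 },
        GraphOp::Add { lhs: 2, rhs: 3, out: 4 },
        GraphOp::Gelu { input: 4, out: 5 },
    ];
    // Prints a Gemm followed by a single fused Elementwise kernel.
    println!("{:#?}", lower(&fuse(&graph)));
}
```

The point of the extra IR level is that decisions like the Add+Gelu fusion above are made on the graph before any kernel is chosen or code is generated, which is roughly the separation TVM's graph-level and operator-level IRs give you.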
However, I also notice that constensor's current design seems to lean more towards a runtime that executes directly, with graph optimizations triggered implicitly, without the upfront complexity of a multi-level IR system. Introducing such an architecture would be a significant undertaking.
I'd be really interested to hear your thoughts on the long-term positioning of constensor. Do you envision it evolving into a general-purpose compiler framework like TVM (perhaps with differentiators like its Rust implementation or a focus on JIT capabilities), or is the focus more on it being a lightweight, intelligent runtime optimized for specific scenarios (like efficient LLM inference)?