rpathai7-netizen

rpathai7-netizen

Pinned Loading

dnsts-cascade dnsts-cascade Public

4-tier asynchronous LLM cascade system achieving 120 tokens/sec on constrained hardware using llama.cpp, speculative decoding, and GPU+CPU parallel inference

Python