aivrar/multi-turboquant

Unified KV cache compression for LLM inference: TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, with a multi-GPU planner. Compress the KV cache 5-80x to run bigger models, longer context, and more agents on ...

License: mit
Scan status: pending
Hosting status: external
Upstream: aivrar/multi-turboquant
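
To put the 5-80x compression range from the description in concrete terms, here is a minimal back-of-the-envelope sketch. It does not use this project's API. It only applies the standard KV cache size formula (keys plus values, per layer, per KV head, per token). The model dimensions (32 layers, 8 KV heads, head dimension 128, fp16 values, 32,768-token context) and the helper names `kv_cache_bytes` and `compressed_bytes` are illustrative assumptions, not values taken from the repository.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Uncompressed KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_value=2 assumes fp16/bf16 storage (an assumption).
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


def compressed_bytes(raw_bytes, ratio):
    """Size after an idealized compression ratio, ignoring metadata overhead."""
    if ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return raw_bytes / ratio


if __name__ == "__main__":
    # Hypothetical model shape, chosen only for illustration.
    raw = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                         seq_len=32768)
    for ratio in (1, 5, 80):
        gib = compressed_bytes(raw, ratio) / 2**30
        print(f"{ratio:>2}x: {gib:.2f} GiB")
```

Under these assumed dimensions the uncompressed cache is 4 GiB per sequence, which an idealized 5x ratio would reduce to 0.8 GiB and an 80x ratio to 0.05 GiB. Real savings depend on the method chosen and on any per-block metadata it stores.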
