aivrar/multi-turboquant
Unified KV cache compression for LLM inference: TurboQuant, IsoQuant, PlanarQuant, TriAttention. 10 methods, GPU-validated, multi-GPU planner. Compress the KV cache 5-80x to run bigger models, longer context, and more agents on …
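To give a feel for what a 5-80x reduction means in memory terms, here is a minimal back-of-the-envelope sketch. It does not use this project's API; the function name `kv_cache_bytes` and the model shape (32 layers, 8 KV heads, head dimension 128, fp16 values, 32,768 tokens) are hypothetical values chosen for illustration. It relies only on the standard size estimate for an uncompressed KV cache: two tensors (keys and values) per layer, each of size heads × head dimension × sequence length.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len,
                   bytes_per_value=2, batch_size=1):
    """Estimate the size of an uncompressed KV cache in bytes.

    The leading factor 2 accounts for storing both a key tensor and a
    value tensor in every layer. bytes_per_value=2 assumes fp16/bf16.
    """
    return (2 * num_layers * num_kv_heads * head_dim
            * seq_len * bytes_per_value * batch_size)


GIB = 1024 ** 3

# Hypothetical model shape, for illustration only.
baseline = kv_cache_bytes(num_layers=32, num_kv_heads=8,
                          head_dim=128, seq_len=32768)

print(f"uncompressed: {baseline / GIB:.2f} GiB")

# The two ends of the 5-80x range quoted in the description.
for ratio in (5, 80):
    compressed = baseline / ratio
    print(f"{ratio:>2}x compression: {compressed / GIB:.3f} GiB")
```

With these assumed dimensions the uncompressed cache is 4 GiB per sequence, so the freed memory could in principle go to a longer context or more concurrent sequences. The ratio achieved in practice depends on the method and model.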
- License: MIT
- Scan status: pending
- Hosting status: external
- Upstream: aivrar/multi-turboquant