defilantech/llmkube
Kubernetes operator for self-hosted LLM inference across a heterogeneous GPU fleet: NVIDIA CUDA, AMD Vulkan, and Apple Silicon Metal. Runtimes: llama.cpp, vLLM, TGI, mlx-server. Multi-GPU sharding, model caching, OpenAI-…
- License: apache-2.0
- Scan status: pending
- Hosting status: external
- Upstream: defilantech/LLMKube