defilantech/llmkube

Kubernetes operator for self-hosted LLM inference across a heterogeneous GPU fleet: NVIDIA CUDA, AMD Vulkan, and Apple Silicon Metal. Runtimes: llama.cpp, vLLM, TGI, mlx-server. Multi-GPU sharding, model caching, OpenAI-…

License: apache-2.0
Scan status: pending
Hosting status: external
Upstream: defilantech/LLMKube
