Go
Browse 21 Hugging Bay artifacts for Go, including 0 Hugging Face imports and 0 hosted files.
- ollama/ollama Get up and running with Kimi-K2.6, GLM-5.1, MiniMax, DeepSeek, gpt-oss, Qwen, Gemma and other models.
- infiniflow/ragflow RAGFlow is a leading open-source Retrieval-Augmented Generation (RAG) engine that fuses cutting-edge RAG with Agent capabilities to create a superior context layer for LLMs
- milvus-io/milvus Milvus is a high-performance, cloud-native vector database built for scalable vector ANN search
- gitleaks/gitleaks Find secrets with Gitleaks 🔑
- esengine/deepseek-reasonix DeepSeek-native AI coding agent for your terminal. Engineered around prefix-cache stability — leave it running.
- tensorchord/envd 🏕️ Reproducible development environment for humans and agents
- sammcj/gollama Go manage your Ollama models
- paesslerag/gval Expression evaluation in golang
- kelindar/search Go library for embedded vector search and semantic embeddings using llama.cpp
- ome-projects/ome Open Model Engine (OME) — Kubernetes operator for LLM serving, GPU scheduling, and model lifecycle management. Works with SGLang, vLLM, TensorRT-LLM, and Triton
- gpustack/gguf-parser-go Review/Check GGUF files and estimate the memory usage and maximum tokens per second.
- raketenkater/ggrun Auto-tuned launcher for GGUF models on llama.cpp / ik_llama.cpp — OpenAI-compatible server with multi-GPU tensor-split, MoE expert placement, measured flag tuning (AI Tune), hardware-matched HuggingFace downloads, and cr
- symflower/eval-dev-quality DevQualityEval: An evaluation benchmark 📈 and framework to compare and evolve the quality of code generation of LLMs.
- defilantech/llmkube Kubernetes operator for self-hosted LLM inference across a heterogeneous GPU fleet: NVIDIA CUDA, AMD Vulkan, and Apple Silicon Metal. Runtimes: llama.cpp, vLLM, TGI, mlx-server. Multi-GPU sharding, model caching, OpenAI-
- greynewell/infermux Route inference across providers.
- hadihonarvar/flock Self-hosted LLM gateway. One Go binary turns your Macs and Linux boxes into a private inference cluster — multi-machine routing, sharding via llama.cpp-RPC, per-user keys + quotas + audit, OpenAI- and Anthropic-compatibl