CUDA

Browse 3 Hugging Bay artifacts for CUDA, including 0 Hugging Face imports and 0 hosted files.

alibaba/rtp-llm RTP-LLM: Alibaba's high-performance LLM inference engine for diverse applications.
kekzl/imp From-scratch C++/CUDA inference engine for the NVIDIA RTX 5090 (sm_120a) — the best single-GPU backend for agentic AI: tool calling, long-context loops, reasoning and concurrent sub-agents on top of the fastest single-st
gigit0000/qwen3.cu Single-file, pure CUDA C implementation for running inference on Qwen3 0.6B GGUF. No dependencies.