llm
Get up and running with Llama 3.3, DeepSeek-R1, Phi-4, Gemma 2, and other large language models.
User-friendly AI Interface (Supports Ollama, OpenAI API, ...)
DeepSeek LLM: Let there be answers
Universal LLM Deployment Engine with ML Compilation
MiniCPM-o 2.6: A GPT-4o Level MLLM for Vision, Speech and Multimodal Live Streaming on Your Phone
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
A Llama 3 implementation, one matrix multiplication at a time
Qwen2.5 is the large language model series developed by the Qwen team at Alibaba Cloud.
Pipelines: Versatile, UI-Agnostic OpenAI-Compatible Plugin Framework
Examples and guides for using the Gemini API
A high-throughput and memory-efficient inference and serving engine for LLMs
INT4/INT5/INT8 and FP16 inference on CPU for the RWKV language model
Maid is a cross-platform Flutter app for interfacing with GGUF / llama.cpp models locally, and with Ollama and OpenAI models remotely.