# cpu-inference

Here are 34 public repositories matching this topic...

kennethleungty / Llama-2-Open-Source-LLM-CPU-Inference

Running Llama 2 and other Open-Source LLMs on CPU Inference Locally for Document Q&A

Updated Nov 6, 2023
Python

CoderLSF / fast-llama

Runs LLaMA with Extremely HIGH speed

llama inference-engine cpu-inference llama2

Updated Nov 21, 2023
C++

rbitr / llm.f90

LLM inference in Fortran

ai chatbot transformer llama language-model mamba state-space-model cpu-inference llm llamacpp llama2 phi-2

Updated May 30, 2024
Fortran

jozsefszalma / homelab

The bare metal in my basement

Updated Dec 4, 2025

lucienhuangfu / eLLM

eLLM Infers LLM on CPUs in Real Time

llama cpu-inference deep-thinking llm-infernece deep-research context-engineering rust-llm

Updated Dec 19, 2025
Rust

yybit / pllm

Portable LLM - A rust library for LLM inference

cpu-inference aigc llm llama2

Updated Apr 13, 2024
Rust

laelhalawani / gguf_llama

Wrapper for simplified use of Llama2 GGUF quantized models.

llama quantization cpu-inference llamacpp llama2 gguf

Updated Jan 14, 2024
Python

JohnClaw / chatllm.v

V-lang api wrapper for llm-inference chatllm.cpp

chatbot inference bindings api-wrapper llama quantization gemma mistral v-lang vlang cpu-inference llm llms chatllm ggml llm-inference qwen phi3

Updated Nov 20, 2024
C

lahcenkh / rag-network-docs

Privacy-focused RAG chatbot for network documentation. Chat with your PDFs locally using Ollama, Chroma & LangChain. CPU-only, fully offline.

ai python3 network-programming cpu-inference vector-database-embedding rag-chatbot

Updated Sep 7, 2025
Python

codito / arey

Simple large language model playground app

cli ai mistral cpu-inference large-language-models llm local-model llamacpp llama2 gguf

Updated Dec 15, 2025
Rust

JohnClaw / chatllm.vb

VB.NET api wrapper for llm-inference chatllm.cpp

bindings api-wrapper llama vb-net vbnet gemma mistral int8 int8-inference int8-quantization cpu-inference chatllm ggml llm-inference qwen

Updated Nov 26, 2024
Visual Basic .NET

JohnClaw / chatllm.cs

C# api wrapper for llm-inference chatllm.cpp

csharp inference bindings api-wrapper llama gemma mistral int8 int8-inference int8-quantization cpu-inference llm llms chatllm ggml llm-inference qwen

Updated Nov 20, 2024
C#

Nishant1998 / PlantAi

PlantAi is a ResNet-based CNN model trained on the PlantVillage dataset to classify plant leaf images as healthy or diseased. This repository includes PyTorch training code, tools to convert the model to TensorFlow Lite (TFLite) for deployment, and an Android app integrating the model for real-time leaf disease detection from camera images.

android java deep-neural-networks computer-vision deep-learning cnn image-classification resnet onnx pytoch cpu-inference tflight real-time-inference agriculture-ai

Updated Aug 21, 2025
Java

JohnClaw / chatllm.nim

Nim api-wrapper for llm-inference chatllm.cpp

Updated Nov 20, 2024
C

BjornMelin / local-llm-workbench

🧠 A comprehensive toolkit for benchmarking, optimizing, and deploying local Large Language Models. Includes performance testing tools, optimized configurations for CPU/GPU/hybrid setups, and detailed guides to maximize LLM performance on your hardware.

cuda gpu-acceleration model-management inference-optimization model-quantization cpu-inference llama-cpp local-llm llm-deployment llm-benchmarking ollama-optimization hybrid-inference wsl-ai-setup context-window-scaling

Updated Mar 27, 2025
Shell

chinese-soup / cbot-telegram-whisper

Simple bot that transcribes Telegram voice messages. Powered by go-telegram-bot-api & whisper.cpp Go bindings.

bot golang speech-recognition openai speech-to-text whisper cpu-inference whisper-cpp whispercpp

Updated Apr 19, 2023
Go

MekayelAnik / vllm-cpu

Wheels & Docker images for running vLLM on CPU-only systems, optimized for different CPU instruction sets

cpu-inference vllm llm-inference vllm-serve vllm-server

Updated Dec 19, 2025
Shell

JohnClaw / chatllm.rs

rust api wrapper for llm-inference chatllm.cpp

rust chatbot inference bindings api-wrapper llama quantization gemma mistral cpu-inference llm llms chatllm ggml llm-inference qwen

Updated Nov 27, 2024
Rust

JohnClaw / chatllm.lua

lua api wrapper for llm-inference chatllm.cpp

lua chatbot luajit inference bindings api-wrapper llama quantization gemma mistral cpu-inference llm llms chatllm ggml llm-inference qwen

Updated Nov 26, 2024
Lua

JohnClaw / chatllm.kt

kotlin api wrapper for llm-inference chatllm.cpp

kotlin chatbot inference bindings api-wrapper llama quantization gemma mistral cpu-inference llm llms chatllm ggml llm-inference qwen

Updated Nov 26, 2024
C
