ai-deployment

Here are 19 public repositories matching this topic...

ghostwright / specter

Deploy AI agents to dedicated VMs in 90 seconds. Interactive TUI. Automatic DNS and TLS. You own the infrastructure.

go infrastructure cli devops automation deployment hetzner ai-agents terminal-ui bubbletea vm-provisioning ai-deployment

Updated Apr 5, 2026
Go

muxi-ai / muxi

Deploy intelligence. Open-source infrastructure for AI agents in production.

Updated Apr 1, 2026

Mechanistic analysis of a GPT-2–like model exploring the compositionality gap in transformers. Using Logit Lens and Causal Tracing, the study identifies a deep-layer bottleneck and overcomes it through dataset enhancement, addressing the Compositionality Gap (NeurIPS24).

tokenizer hpc-clusters explainable-ai ai-research explainability compositionality neurips llm logitlens mechanistic-interpretability llm-training causal-tracing gpt-2-model ai-deployment neurips-paper

Updated Oct 29, 2025
Python

jkfran / azure-openai-deployment-mock

A mock Azure OpenAI API for seamless testing and development, supporting both streaming and non-streaming responses. Easily emulate OpenAI completions with token-based streaming in a local or Dockerized environment.

azure openai streaming-api api-testing local-development mock-api fastapi azure-openai-api ai-deployment

Updated Nov 3, 2024
Python

dilettacal / digital-twin

My Digital Twin Companion

aws typescript amazon reactjs openai chatbots pythnon llms ai-deployment

Updated Mar 1, 2026
Python

bossnameless308wp / DeepSeek-V3-Windows-Local-Fix

DeepSeek-V3 Windows Deployment Fixer: Resolves CUDA_ERROR_OUT_OF_MEMORY, missing cublas64_12.dll, and Triton compiler errors. Optimized for RTX 30/40 series GPUs.

python machine-learning pytorch nvidia-gpu windows-fix ollama llm-optimization ai-deployment deepseek-v3 dll-repair deepseek-r1 cuda-fix

Updated Feb 9, 2026

capetron / private-ai-deployment-guide

Complete guide to deploying private, on-premise AI and LLMs: hardware selection, model comparison (ollama vs vLLM vs llama.cpp), security hardening, and AI governance policy templates. By Petronella Technology Group.

machine-learning gpu self-hosted ai-security llm ai-governance llama-cpp vllm ollama private-ai ai-deployment on-premise-ai

Updated Mar 6, 2026
Shell

Fahadfk / AI_deployment

Comprehensive guide to FastAPI, Pydantic, and SQLAlchemy for AI engineers. Learn API design, validation, and ORM workflows with practical examples and setup 🐙

python machine-learning metal hpc openai maas creative rag huggingface ai-agent llm generative-ai chatgpt llm-inference local-ai llama3 ai-deployment heterogeneous-cluster

Updated Apr 6, 2026

geek2geeks / gpu-inference-server

Production-oriented GPU inference stack with FastAPI, Docker, Redis, Prometheus, and Grafana for AI workload serving

docker mlops fastapi gpu-inference ai-deployment

Updated Mar 11, 2026
Python

PurnabrataPanja / Spam_Classifier

A Streamlit-based spam classifier that predicts whether a message is spam or not spam using machine learning.

python data-science machine-learning natural-language-processing text-classification cyber-security spam-detection nlp-model streamlit ai-deployment

Updated Sep 4, 2024
Jupyter Notebook

FlosMume / CUDA-AI-Inference-Starter

A minimal, high-performance starter kit for running AI model inference on NVIDIA GPUs using CUDA. Includes environment setup, sample kernels, and guidance for integrating ONNX/TensorRT pipelines for fast, optimized inference on modern GPU hardware.

benchmarking deep-learning cuda nvidia high-performance-computing fp16 int8 inference-engine onnx tensor-rt devtech model-optimization gpu-inference ai-deployment adge-ai

Updated Nov 2, 2025
Cuda

NathanMaine / ai-deployment-experience-bot

Skeleton Slack bot that lets developers trigger deployments, rollbacks, and status checks via simple chat commands.

python chatbot agentic-ai ai-deployment

Updated Dec 15, 2025
Python

alhemdrew / self-hosted-llm-infrastructure

Deployment of a self-hosted LLM infrastructure using Ollama and Open WebUI on Linux, including custom model creation, API integration, and system-level troubleshooting.

machine-learning llm prompt-engineering api-intergration local-llm ollama openwebui ai-deployment self-hosted-ai

Updated Feb 14, 2026

maihoangbichtram / multi_agentic_rag

Multi-agentic researcher (RAG)

ai deployment terraform google-cloud multiagent google-cloud-platform rag tavily agentic-ai ai-deployment

Updated Mar 21, 2025
Python

RavinduLayanga / mixcap-web-platform

Full-stack web application (React + Flask) for Multimodal Video Captioning. Deploys the MixCap model (BLIP-2 + Wav2Vec2) to generate video descriptions for end-users.

react flask rest-api pytorch full-stack video-captioning ai-deployment mixcap

Updated Jan 21, 2026
JavaScript

AkshaySyal / edge-ai-model-deployment

End-to-end pipeline for deploying deep learning models on edge devices: model conversion, quantization, hardware acceleration, and Android integration.

python computer-vision deep-learning tensorflow pytorch edge-ai model-quantization ai-deployment