Bridge GitHub Copilot with local vLLM/TGI servers and HuggingFace cloud models
- ⬇️ Download v1.0.0 VSIX (92KB)
- Install in VS Code:
  - Press `Ctrl+Shift+P` (or `Cmd+Shift+P` on macOS)
  - Type `Extensions: Install from VSIX...`
  - Select the downloaded `.vsix` file
- Restart VS Code and select models in GitHub Copilot Chat
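Whichever route you take (the manual steps above or the one-liner below), you can confirm the extension registered with VS Code from the command line. A quick check; the extension ID is assumed from the VSIX filename, so adjust the grep pattern if yours differs:

```bash
# List installed extensions and filter for the bridge (ID pattern assumed)
code --list-extensions | grep -i "vllm-huggingface-bridge"
```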
# Download and install in one command
wget https://github.com/dzivkovi/vllm-huggingface-bridge/releases/download/v1.0.0/vllm-huggingface-bridge-1.0.0.vsix
code --install-extension vllm-huggingface-bridge-1.0.0.vsix

# Remove original HuggingFace extension if installed
code --uninstall-extension HuggingFace.huggingface-vscode-chat
# Remove old vLLM Community version if installed
code --uninstall-extension vllm-community.vllm-huggingface-bridge

- 🔒 Air-Gapped Ready: Complete offline operation with local vLLM/TGI servers
- 🔄 Dual Mode: Seamlessly switch between local and cloud models
- ⚡ Optimized: 92KB package size (91% smaller than original)
- 🛡️ Enterprise Ready: Production-tested in secure environments
- 🔧 Zero Config: Works out-of-the-box with sensible defaults
- 📊 Smart Token Management: Automatic allocation for small context models
For secure, on-premise environments where data cannot leave your network:
- Start your local vLLM or TGI server (see setup instructions below)
- Configure VS Code settings: `"huggingface.localEndpoint": "http://your-server:8000"`
- Select your local model from the GitHub Copilot Chat model picker
- No API keys required; all processing stays on your infrastructure (a quick endpoint check is sketched below)
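Before selecting the model in VS Code, it can help to confirm the server speaks the OpenAI-compatible API the extension talks to. A minimal sketch against a vLLM server; the host and model name are placeholders, so substitute your own (recent TGI versions also expose `/v1/chat/completions`, but check your server's docs):

```bash
# One-off chat completion to confirm end-to-end inference (host and model are examples)
curl http://your-server:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "TheBloke/deepseek-coder-6.7B-instruct-AWQ",
        "messages": [{"role": "user", "content": "Write hello world in Python."}],
        "max_tokens": 64
      }'
```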
- Install the extension: 📦 Download VSIX
- Open VS Code's chat interface.
- Click the model picker and click "Manage Models...".
- Select "Hugging Face" provider.
- Provide your Hugging Face token; you can get one from your settings page. It only needs the `inference.serverless` permission (a quick token check from the terminal is sketched below).
- Choose the models you want to add to the model picker. 🥳
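If you want to sanity-check the token outside of VS Code first, the Inference Providers router exposes an OpenAI-compatible endpoint. A minimal sketch; the router URL and the model ID are assumptions taken from the Inference Providers docs linked under Resources, so verify them there:

```bash
# Hypothetical smoke test of a Hugging Face token against the Inference Providers router
export HF_TOKEN="hf_xxx"   # token with the inference.serverless permission

curl https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct",
        "messages": [{"role": "user", "content": "Say hello."}]
      }'
```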
Production Ready: Successfully deployed in enterprise air-gapped environments.
- Data Security: All data remains on your infrastructure
- Air-Gapped Operation: No internet connectivity required
- Low Latency: Direct connection to local GPU servers
- Cost Control: No per-token API charges
- Compliance: Meet strict data residency requirements
# Start vLLM (tested with RTX 4060, 8GB VRAM, DeepSeek-Coder 6.7B)
docker run -d --name vllm-server \
--gpus all \
--shm-size=4g \
--ipc=host \
-p 8000:8000 \
vllm/vllm-openai:latest \
--model TheBloke/deepseek-coder-6.7B-instruct-AWQ \
--quantization awq \
--gpu-memory-utilization 0.85

// .vscode/settings.json
{
"huggingface.localEndpoint": "http://localhost:8000",
// CRITICAL for small context models (2048 tokens):
"github.copilot.chat.editor.temporalContext.enabled": false,
"github.copilot.chat.edits.temporalContext.enabled": false,
"github.copilot.chat.edits.suggestRelatedFilesFromGitHistory": false
}

- 2048-token context models ARE usable with the settings above
- vLLM adds ~500 tokens for chat template formatting
- Extension automatically adjusts token allocation
- Responses are limited to 50-100 tokens when near the context limit
- For best experience: Use 8K+ context models
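To see how much context your local server actually advertises, you can inspect the model listing; vLLM includes a `max_model_len` field in its `/v1/models` response (an assumption about vLLM's response shape worth verifying; other servers may omit it):

```bash
# Print each served model's ID and the context window vLLM reports for it
curl -s http://localhost:8000/v1/models | jq '.data[] | {id: .id, max_model_len: .max_model_len}'
```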
- Access SoTA open-source LLMs with tool calling capabilities.
- Single API to switch between multiple providers: Cerebras, Cohere, Fireworks AI, Groq, HF Inference, Hyperbolic, Nebius, Novita, Nscale, SambaNova, Together AI, and more. See the full list of partners in the Inference Providers docs.
- Built for high availability (across providers) and low latency.
- Local Inference Support: Run vLLM or TGI servers on-premise for air-gapped deployments
- Transparent pricing: what the provider charges is what you pay.
💡 The free Hugging Face user tier gives you a small amount of monthly inference credits to experiment. Upgrade to Hugging Face PRO or Enterprise for $2 in monthly credits plus pay-as-you-go access across all providers!
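As a sketch of what "single API" means in practice: with the OpenAI-compatible router, switching providers is just a change to the model string. The `:provider` suffix below reflects how provider selection is described in the Inference Providers docs, but treat both the suffix syntax and the model ID as assumptions and confirm them against those docs:

```bash
# Same endpoint and payload shape; only the model string pins a specific provider (assumed syntax)
curl https://router.huggingface.co/v1/chat/completions \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "meta-llama/Llama-3.1-8B-Instruct:groq",
        "messages": [{"role": "user", "content": "Hello from a pinned provider."}]
      }'
```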
- VS Code 1.104.0 or higher.
- Hugging Face access token with `inference.serverless` permissions.
git clone https://github.com/huggingface/huggingface-vscode-chat
cd huggingface-vscode-chat
npm install
npm run compile

Press F5 to launch an Extension Development Host.
Common scripts:
- Build: `npm run compile`
- Watch: `npm run watch`
- Lint: `npm run lint`
- Format: `npm run format`
- Quick rebuild: `scripts/rebuild-extension.sh`
- Test vLLM: `scripts/test-vllm.sh`
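To turn a local build into an installable package, the standard VS Code tooling applies; a sketch assuming no extra packaging steps beyond what `scripts/rebuild-extension.sh` already does:

```bash
# Compile, then package the extension into a .vsix with the official vsce tool
npm run compile
npx @vscode/vsce package

# Smoke-test the result in your local VS Code (adjust the filename to the generated .vsix)
code --install-extension ./*.vsix
```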
📚 For detailed guides, see our comprehensive documentation
This extension supports connecting to your own local inference servers for private model hosting.
docker run -d --name vllm-server \
--gpus all \
--shm-size=4g \
--ipc=host \
-p 8000:8000 \
-v ~/.cache/huggingface:/root/.cache/huggingface \
vllm/vllm-openai:latest \
--model TheBloke/deepseek-coder-6.7B-instruct-AWQ \
--quantization awq \
--gpu-memory-utilization 0.85 \
--max-model-len 2048 \
--max-num-seqs 16 \
--disable-log-stats

- `--shm-size=4g` - Without this, vLLM crashes
- `--ipc=host` - Without this, GPU communication fails
- `--max-model-len 2048` - Without this, runs out of memory
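The container needs some time to pull weights and load the model before the endpoint answers, so it is worth waiting for the API before moving on to the VS Code settings below. A small readiness check (host and port assume the mapping above):

```bash
# Optional: watch the model load in another terminal
#   docker logs -f vllm-server

# Poll the OpenAI-compatible endpoint until the server answers
until curl -sf http://localhost:8000/v1/models > /dev/null; do
  echo "waiting for vLLM to finish loading..."
  sleep 5
done
echo "vLLM is ready"
```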
- Open Settings (Ctrl+,)
- Search for "huggingface.localEndpoint"
- Set value: `http://localhost:8000`
- Reload VS Code
- Start: Click ▶️ on container in Docker Desktop
- Stop: Click ⏹️ on container in Docker Desktop
- Logs: Click container name to view logs
- Remove: Stop first, then click 🗑️
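The same lifecycle operations are available from the terminal if you prefer the Docker CLI over Docker Desktop:

```bash
docker start vllm-server     # Start the container
docker stop vllm-server      # Stop it
docker logs -f vllm-server   # Follow the logs
docker rm vllm-server        # Remove it (stop it first)
```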
curl http://localhost:8000/v1/models
# Should return: TheBloke/deepseek-coder-6.7B-instruct-AWQ

Full Setup Guide: vLLM Setup Guide
Model Selection: Choose models for your GPU
- Open VS Code Settings (Ctrl+,)
- Search for "huggingface.localEndpoint"
- Enter your TGI server URL (e.g., `http://192.168.1.100:8080`)
- See TGI Setup Guide for legacy support (a minimal TGI container sketch follows below)
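For reference, a TGI server can be started in much the same way as the vLLM container above. This is a rough sketch based on TGI's standard Docker quickstart; the model ID and resource flags are examples, so consult the TGI Setup Guide and the TGI documentation linked below for values that fit your hardware:

```bash
# Run Text Generation Inference and expose its API on port 8080
docker run -d --name tgi-server \
  --gpus all \
  --shm-size=1g \
  -p 8080:80 \
  -v $HOME/tgi-data:/data \
  ghcr.io/huggingface/text-generation-inference:latest \
  --model-id HuggingFaceH4/zephyr-7b-beta
```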
- Inference Providers documentation: https://huggingface.co/docs/inference-providers/index
- VS Code Chat Provider API: https://code.visualstudio.com/api/extension-guides/ai/language-model-chat-provider
- TGI Documentation: https://huggingface.co/docs/text-generation-inference
- Open issues: https://github.com/huggingface/huggingface-vscode-chat/issues
- License: MIT License, Copyright (c) 2025 Hugging Face
