Tensorlink is a Python library and decentralized compute platform for running PyTorch and Hugging Face models across peer-to-peer networks. It enables you to run, train, and serve large models securely across distributed hardware without relying on centralized cloud inference providers.
- Native PyTorch & REST API Access: Use models directly in Python or via HTTP endpoints
- Run Large Models Without Local VRAM: Execute models that exceed your GPU's capacity
- Remote Access to Your Own Hardware: Securely host and access models on your devices via API
- Plug-and-Play Distributed Execution: Automatic model sharding across multiple GPUs
- Training & Inference Support: Train models with distributed optimizers or run inference across the network
- Streaming Generation: Token-by-token streaming for real-time responses
- Privacy Controls: Route queries exclusively to your own hardware for private usage
- Earn Rewards for Idle Compute: Contribute GPU resources to the network and get compensated
Early Access: Tensorlink is under active development. APIs and internals may evolve.
Join our Discord for updates, support, and roadmap discussions. Learn more in the Litepaper
```bash
pip install tensorlink
```

Requirements: Python 3.10+, PyTorch 2.3+, UNIX/macOS (Windows: use WSL)
Execute Hugging Face models distributed across the network.
```python
from tensorlink.ml import DistributedModel

MODEL_NAME = "Qwen/Qwen3-8B"

# Connect to a pre-trained model on the network
model = DistributedModel(
    model=MODEL_NAME,
    training=True,
    device="cuda"
)
optimizer = model.create_optimizer(lr=0.001)
```

See Examples for streaming generation, distributed training, custom models, and network configurations.
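As a rough illustration of what a training step could look like, the sketch below assumes `DistributedModel` forwards like the underlying Hugging Face causal LM (accepting `input_ids`/`labels` and returning an output with `.loss`) and that the optimizer from `create_optimizer()` follows the standard PyTorch optimizer interface. These are assumptions, not the confirmed API; see Examples for the supported training loop.

```python
# Hedged sketch of a single training step; the forward signature and device
# handling are assumptions about DistributedModel, not documented behavior.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
batch = tokenizer("Distributed training with Tensorlink.", return_tensors="pt")
input_ids = batch["input_ids"].to("cuda")  # local device; the runtime may manage placement itself

outputs = model(input_ids=input_ids, labels=input_ids)
loss = outputs.loss            # causal-LM loss from the sharded model

optimizer.zero_grad()
loss.backward()                # gradients flow back across the network shards
optimizer.step()
```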
Access models via HTTP on the public network, or configure your own hardware for private API access. Tensorlink exposes OpenAI-style endpoints for distributed inference:
```python
import requests

response = requests.post(
    "http://smartnodes.ddns.net/tensorlink-api/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "Explain quantum computing in one sentence.",
        "max_new_tokens": 50,
        "stream": False,
    }
)
print(response.json())
```

See Examples for streaming, chat completions, and the API reference.
Run Tensorlink nodes to host models, shard workloads across GPUs, and expose them via Python and HTTP APIs. Nodes can act as workers (run models), validators (route requests + expose API), or both. This allows you to build private clusters, public compute providers, or local development environments.
- Download the latest `tensorlink-node` from Releases
- Edit `config.json` to configure your nodes
- Run: `./run-node.sh`
By default, the config is set for running a public worker node. Your GPU will process network workloads and earn rewards via the networking layer (Smartnodes). See Examples for different device and network configurations.
Your `config.json` controls networking, rewards, and model execution behavior. By default, it is set up for running a public worker node.
| Field | Type | Description |
|---|---|---|
| `type` | `str` | Node type (`worker`\|`validator`\|`both`): validators accept job and API requests, workers run models |
| `mode` | `str` | Network type (`public`\|`private`\|`local`): public (earn rewards), private (your devices), local (testing) |
| `endpoint` | `bool` | Enables the REST API server on this node (validator role) |
| `endpoint_url` | `str` | Address the API binds to. Use `0.0.0.0` to expose on LAN |
| `endpoint_port` | `int` | Port for the HTTP API (default: `64747`) |
| `priority_nodes` | `List[List[str, int]]` | Bootstrap trusted peers to connect to first (e.g., `[["192.168.2.42", 38751]]`) |
| `logging` | `int` | Console logging level (e.g., `DEBUG`\|`INFO`\|`WARNING`) |
| Field | Type | Description |
|---|---|---|
| `trusted` | `bool` | Allows execution of custom user-supplied models |
| `max_vram_gb` | `int` | Limits VRAM usage per node to prevent overload |
| Field | Type | Description |
|---|---|---|
| `address` | `str` | Wallet address used for identity and rewards |
| `mining` | `bool` | Contribute GPU compute to the public network for rewards |
| `mining_script` | `str` | Path to mining / GPU workload executable |
| `seed_validators` | `List[List[str, int, str]]` | Seed validator nodes to connect to on startup |
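As an illustration only, here is a hypothetical `config.json` assembled from the fields above for a public worker node. The file shipped with `tensorlink-node` is the authoritative reference; the real schema (nesting, value formats, defaults) may differ from this flat sketch, and the wallet address, VRAM limit, and logging value are placeholders.

```json
{
  "type": "worker",
  "mode": "public",
  "endpoint": false,
  "endpoint_url": "0.0.0.0",
  "endpoint_port": 64747,
  "priority_nodes": [],
  "logging": "INFO",
  "trusted": false,
  "max_vram_gb": 16,
  "address": "0xYourWalletAddress",
  "mining": true,
  "mining_script": "",
  "seed_validators": []
}
```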
For common configuration recipes and examples, see Examples: Node Configuration
Tensorlink exposes OpenAI-compatible HTTP endpoints for distributed inference.
- `POST /v1/generate`: Simple text generation
- `POST /v1/chat/completions`: OpenAI-compatible chat interface
- `POST /request-model`: Preload models across the network
Simple generation endpoint with flexible output formats.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `hf_name` | string | required | Hugging Face model identifier |
| `message` | string | required | Input text to generate from |
| `prompt` | string | `null` | Alternative to `message` |
| `model_type` | string | `"auto"` | Model architecture hint |
| `max_length` | int | `2048` | Maximum total sequence length |
| `max_new_tokens` | int | `2048` | Maximum tokens to generate |
| `temperature` | float | `0.7` | Sampling temperature (0.01-2.0) |
| `do_sample` | bool | `true` | Enable sampling vs. greedy decoding |
| `num_beams` | int | `1` | Beam search width |
| `stream` | bool | `false` | Enable streaming responses |
| `input_format` | string | `"raw"` | `"chat"` or `"raw"` |
| `output_format` | string | `"simple"` | `"simple"`, `"openai"`, or `"raw"` |
| `history` | array | `null` | Chat history for multi-turn conversations |
| `is_chat_completion` | bool | `false` | Determines whether to format chat output |
Chat-style output formatting is applied when `is_chat_completion` is set, or when `input_format` is `"chat"` and `output_format` is `"openai"`:

```python
should_filter = (
    request.is_chat_completion
    or (request.input_format == "chat" and request.output_format == "openai")
)
```
```python
import requests

r = requests.post(
    "http://localhost:64747/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "Explain quantum computing in one sentence.",
        "max_new_tokens": 64,
        "temperature": 0.7,
        "stream": False,
    }
)
print(r.json()["generated_text"])
```

```python
r = requests.post(
    "http://localhost:64747/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "What about entanglement?",
        "input_format": "chat",
        "output_format": "openai",
        "history": [
            {"role": "user", "content": "Explain quantum computing."},
            {"role": "assistant", "content": "Quantum computing uses..."}
        ],
        "max_new_tokens": 128,
    }
)
print(r.json())
```

OpenAI-compatible chat completions endpoint with full streaming support.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `model` | string | required | Hugging Face model identifier |
| `messages` | array | required | Array of chat messages |
| `temperature` | float | `0.7` | Sampling temperature (0.01-2.0) |
| `top_p` | float | `1.0` | Nucleus sampling threshold |
| `n` | int | `1` | Number of completions to generate |
| `stream` | bool | `false` | Enable SSE streaming |
| `stop` | string/array | `null` | Stop sequences |
| `max_tokens` | int | `1024` | Maximum tokens to generate |
| `presence_penalty` | float | `0.0` | Presence penalty (-2.0 to 2.0) |
| `frequency_penalty` | float | `0.0` | Frequency penalty (-2.0 to 2.0) |
| `user` | string | `null` | User identifier for tracking |
```json
{
  "role": "system" | "user" | "assistant",
  "content": "message text"
}
```

```python
import requests

r = requests.post(
    "http://localhost:64747/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in simple terms."}
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    }
)
response = r.json()
print(response["choices"][0]["message"]["content"])
```
```python
import json
import requests

r = requests.post(
    "http://localhost:64747/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "Explain quantum computing."}
        ],
        "max_tokens": 128,
        "stream": True
    },
    stream=True,
)

for line in r.iter_lines():
    if line and line.decode().startswith("data: "):
        data = line.decode()[6:]  # Remove the "data: " prefix
        if data != "[DONE]":
            chunk = json.loads(data)
            delta = chunk["choices"][0]["delta"]
            if delta.get("content"):
                print(delta["content"], end="", flush=True)
```
"id": "chatcmpl-123",
"object": "chat.completion",
"created": 1234567890,
"model": "Qwen/Qwen2.5-7B-Instruct",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Quantum computing harnesses quantum mechanics..."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 20,
"completion_tokens": 50,
"total_tokens": 70
}
}data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}
...
data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Preload a model across the distributed network before making generation requests.
| Parameter | Type | Description |
|---|---|---|
| `hf_name` | string | Hugging Face model identifier |
```python
import requests

r = requests.post(
    "http://localhost:64747/request-model",
    json={"hf_name": "Qwen/Qwen2.5-7B-Instruct"}
)
print(r.json())
# {"status": "success", "message": "Model loading initiated"}
```

Tensorlink is designed to support any Hugging Face model; however, errors may occur with certain models. Please report any bugs via Issues.
- Temperature: Values below `0.01` automatically disable sampling to prevent numerical instability
- Streaming: Both endpoints support Server-Sent Events (SSE) streaming via `stream: true` (a rough streaming sketch follows this list)
- Token IDs: Automatically handles missing pad/eos tokens with safe fallbacks
- Format Control: Use `input_format="chat"` and `output_format="openai"` for seamless integration
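To make the streaming note concrete for `/v1/generate`, the sketch below assumes the endpoint emits standard SSE `data:` lines when `stream` is `true`, as documented for `/v1/chat/completions`. The per-chunk payload format for `/v1/generate` is not specified here, so this sketch simply prints each raw data payload.

```python
# Rough sketch: read SSE data lines from a streaming /v1/generate request.
# The chunk schema is an assumption; inspect a real response before parsing.
import requests

with requests.post(
    "http://localhost:64747/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "Explain quantum computing in one sentence.",
        "max_new_tokens": 64,
        "stream": True,
    },
    stream=True,
) as r:
    for line in r.iter_lines():
        if line and line.decode().startswith("data: "):
            payload = line.decode()[6:]
            if payload == "[DONE]":
                break
            print(payload)  # raw chunk; parse as JSON once the schema is known
```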
For complete examples, error handling, and advanced usage, see Examples: HTTP API
- Documentation: Full API reference and guides
- Examples: Comprehensive usage patterns and recipes
- Discord Community: Get help and connect with developers
- Live Demo: Try localhostGPT powered by Tensorlink
- Litepaper: Technical overview and architecture
We welcome contributions! Here's how to get involved:
- Report bugs via GitHub Issues
- Suggest features on our Discord
- Submit PRs to improve code or documentation
- Support the project via Buy Me a Coffee
Tensorlink is released under the MIT License.