Tensorlink

Peer-to-peer AI Inference & Distributed Execution with PyTorch



What is Tensorlink?

Tensorlink is a Python library and decentralized compute platform for running PyTorch and Hugging Face models across peer-to-peer networks. It enables you to run, train, and serve large models securely across distributed hardware without relying on centralized cloud inference providers.

Key Features

  • Native PyTorch & REST API Access – Use models directly in Python or via HTTP endpoints
  • Run Large Models Without Local VRAM – Execute models that exceed your GPU capacity
  • Remote Access to Your Own Hardware – Securely host and access models on your devices via API
  • Plug-and-Play Distributed Execution – Automatic model sharding across multiple GPUs
  • Training & Inference Support – Train models with distributed optimizers or run inference across the network
  • Streaming Generation – Token-by-token streaming for real-time responses
  • Privacy Controls – Route queries exclusively to your own hardware for private usage
  • Earn Rewards for Idle Compute – Contribute GPU resources to the network and get compensated

Early Access: Tensorlink is under active development. APIs and internals may evolve.
Join our Discord for updates, support, and roadmap discussions. Learn more in the Litepaper.

Quick Start

Option 1: Distributed Models in Python

Installation

pip install tensorlink

Requirements: Python 3.10+, PyTorch 2.3+, UNIX/macOS (Windows: use WSL)

Basic Usage

Execute Hugging Face models distributed across the network.

from tensorlink.ml import DistributedModel

MODEL_NAME = "Qwen/Qwen3-8B"

# Connect to a pre-trained model on the network
model = DistributedModel(
    model=MODEL_NAME,
    training=True,
    device="cuda"
)
optimizer = model.create_optimizer(lr=0.001)
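
With training=True, the returned model and optimizer are meant to slot into an ordinary PyTorch loop. Continuing the snippet above, the sketch below is illustrative only: it assumes the distributed wrapper accepts Hugging Face-style tokenized inputs and returns outputs with a .loss attribute. See Examples for the exact interface.

from transformers import AutoTokenizer

# Illustrative training step (assumed interface): tokenize locally, then
# forward/backward through the distributed model as with any nn.Module.
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
batch = tokenizer("Hello, distributed world!", return_tensors="pt").to("cuda")

outputs = model(**batch, labels=batch["input_ids"])  # assumes HF-style outputs
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()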

See Examples for streaming generation, distributed training, custom models, and network configurations.

Option 2: Accessing Models via HTTP

Access models via HTTP on the public network, or configure your own hardware for private API access. Tensorlink exposes OpenAI-style endpoints for distributed inference:

import requests

response = requests.post(
    "http://smartnodes.ddns.net/tensorlink-api/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "Explain quantum computing in one sentence.",
        "max_new_tokens": 50,
        "stream": False,
    }
)

print(response.json())
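
Streaming works against the same endpoint by setting stream to true. A minimal sketch that simply prints the raw server-sent-event lines as they arrive (the exact chunk payload is described in the API Reference below):

import requests

with requests.post(
    "http://smartnodes.ddns.net/tensorlink-api/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "Explain quantum computing in one sentence.",
        "max_new_tokens": 50,
        "stream": True,
    },
    stream=True,
) as response:
    for line in response.iter_lines():
        if line:
            # Raw SSE lines, e.g. 'data: {...}'
            print(line.decode())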

See Examples for streaming, chat completions, and the full API reference.

Option 3: Run a Node

Run Tensorlink nodes to host models, shard workloads across GPUs, and expose them via Python and HTTP APIs. Nodes can act as workers (run models), validators (route requests + expose API), or both. This allows you to build private clusters, public compute providers, or local development environments.

  1. Download the latest tensorlink-node from Releases
  2. Edit config.json to configure your nodes.
  3. Run: ./run-node.sh

By default, the config is set for running a public worker node. Your GPU will process network workloads and earn rewards via the networking layer (Smartnodes). See Examples for different device and network configurations.


Configuration Reference

Your config.json controls networking, rewards, and model execution behavior. By default, it is configured to run a public worker node.

Node

| Field | Type | Description |
| --- | --- | --- |
| type | str | Node type (worker, validator, or both): validators accept job and API requests, workers run models |
| mode | str | Network type (public, private, or local): public (earn rewards), private (your own devices), local (testing) |
| endpoint | bool | Enables the REST API server on this node (validator role) |
| endpoint_url | str | Address the API binds to; use 0.0.0.0 to expose it on the LAN |
| endpoint_port | int | Port for the HTTP API (default: 64747) |
| priority_nodes | List[List[str, int]] | Trusted bootstrap peers to connect to first (e.g., [["192.168.2.42", 38751]]) |
| logging | int | Console logging level (e.g., DEBUG, INFO, WARNING) |

ML

| Field | Type | Description |
| --- | --- | --- |
| trusted | bool | Allows execution of custom user-supplied models |
| max_vram_gb | int | Limits VRAM usage per node to prevent overload |

Crypto

| Field | Type | Description |
| --- | --- | --- |
| address | str | Wallet address used for identity and rewards |
| mining | bool | Contribute GPU compute to the public network for rewards |
| mining_script | str | Path to the mining / GPU workload executable |
| seed_validators | List[List[str, int, str]] | Seed validator nodes used to bootstrap connections to the network |
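
As an illustration only, the fields above could be assembled into a config.json along the following lines. The grouping into node / ml / crypto sections mirrors the tables and is an assumption; the template shipped with the tensorlink-node release is the authoritative schema.

import json

# Hypothetical private-cluster configuration; the section layout is assumed
# from the tables above, not taken from the shipped template.
config = {
    "node": {
        "type": "both",              # worker + validator on one machine
        "mode": "private",           # restrict jobs to your own devices
        "endpoint": True,            # enable the REST API server
        "endpoint_url": "0.0.0.0",   # expose the API on the LAN
        "endpoint_port": 64747,
        "priority_nodes": [["192.168.2.42", 38751]],
        "logging": "INFO",           # or a numeric level, per the table above
    },
    "ml": {
        "trusted": True,             # allow custom user-supplied models
        "max_vram_gb": 16,           # cap VRAM usage on this node
    },
    "crypto": {
        "address": "",               # wallet address (needed for public rewards)
        "mining": False,
    },
}

with open("config.json", "w") as f:
    json.dump(config, f, indent=2)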

For common configuration recipes and examples, see Examples: Node Configuration


API Reference

Tensorlink exposes OpenAI-compatible HTTP endpoints for distributed inference.

Endpoints

  • POST /v1/generate – Simple text generation
  • POST /v1/chat/completions – OpenAI-compatible chat interface
  • POST /request-model – Preload models across the network

/v1/generate

Simple generation endpoint with flexible output formats.

Request Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| hf_name | string | required | Hugging Face model identifier |
| message | string | required | Input text to generate from |
| prompt | string | null | Alternative to message |
| model_type | string | "auto" | Model architecture hint |
| max_length | int | 2048 | Maximum total sequence length |
| max_new_tokens | int | 2048 | Maximum tokens to generate |
| temperature | float | 0.7 | Sampling temperature (0.01-2.0) |
| do_sample | bool | true | Enable sampling vs. greedy decoding |
| num_beams | int | 1 | Beam search width |
| stream | bool | false | Enable streaming responses |
| input_format | string | "raw" | "chat" or "raw" |
| output_format | string | "simple" | "simple", "openai", or "raw" |
| history | array | null | Chat history for multi-turn conversations |
| is_chat_completion | bool | false | Whether to format output as a chat completion |

Note on output filtering: in the streaming path (_generate_streaming), chat-style output filtering is enabled when is_chat_completion is set:

should_filter = request.is_chat_completion

For finer control, filtering can also be keyed on the input and output formats:

should_filter = (
    request.is_chat_completion
    or (request.input_format == "chat" and request.output_format == "openai")
)

Example: Basic Generation

import requests

r = requests.post(
    "http://localhost:64747/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "Explain quantum computing in one sentence.",
        "max_new_tokens": 64,
        "temperature": 0.7,
        "stream": False,
    }
)

print(r.json()["generated_text"])

Example: Chat Format with History

import requests

r = requests.post(
    "http://localhost:64747/v1/generate",
    json={
        "hf_name": "Qwen/Qwen2.5-7B-Instruct",
        "message": "What about entanglement?",
        "input_format": "chat",
        "output_format": "openai",
        "history": [
            {"role": "user", "content": "Explain quantum computing."},
            {"role": "assistant", "content": "Quantum computing uses..."}
        ],
        "max_new_tokens": 128,
    }
)

print(r.json())

/v1/chat/completions

OpenAI-compatible chat completions endpoint with full streaming support.

Request Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | required | Hugging Face model identifier |
| messages | array | required | Array of chat messages |
| temperature | float | 0.7 | Sampling temperature (0.01-2.0) |
| top_p | float | 1.0 | Nucleus sampling threshold |
| n | int | 1 | Number of completions to generate |
| stream | bool | false | Enable SSE streaming |
| stop | string/array | null | Stop sequences |
| max_tokens | int | 1024 | Maximum tokens to generate |
| presence_penalty | float | 0.0 | Presence penalty (-2.0 to 2.0) |
| frequency_penalty | float | 0.0 | Frequency penalty (-2.0 to 2.0) |
| user | string | null | User identifier for tracking |

Message Format

{
    "role": "system" | "user" | "assistant",
    "content": "message text"
}

Example: Non-Streaming

import requests

r = requests.post(
    "http://localhost:64747/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": "Explain quantum computing in simple terms."}
        ],
        "max_tokens": 128,
        "temperature": 0.7,
    }
)

response = r.json()
print(response["choices"][0]["message"]["content"])

Example: Streaming

import json
import requests

r = requests.post(
    "http://localhost:64747/v1/chat/completions",
    json={
        "model": "Qwen/Qwen2.5-7B-Instruct",
        "messages": [
            {"role": "system", "content": "You are helpful."},
            {"role": "user", "content": "Explain quantum computing."}
        ],
        "max_tokens": 128,
        "stream": True
    },
    stream=True,
)

for line in r.iter_lines():
    if not line:
        continue
    decoded = line.decode()
    if not decoded.startswith("data: "):
        continue
    data = decoded[len("data: "):]  # strip the SSE "data: " prefix
    if data == "[DONE]":
        break  # end of stream
    chunk = json.loads(data)
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        print(delta["content"], end="", flush=True)

Response Format (Non-Streaming)

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1234567890,
  "model": "Qwen/Qwen2.5-7B-Instruct",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Quantum computing harnesses quantum mechanics..."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 50,
    "total_tokens": 70
  }
}

Response Format (Streaming)

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"content":"Quantum"},"finish_reason":null}]}

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{"content":" computing"},"finish_reason":null}]}

...

data: {"id":"chatcmpl-123","object":"chat.completion.chunk","created":1234567890,"model":"Qwen/Qwen2.5-7B-Instruct","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
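
Since the endpoint mirrors the OpenAI chat completions schema, an OpenAI-compatible client can typically be pointed at a node directly. Below is a hedged sketch using the official openai Python package; the base_url and placeholder api_key are assumptions (the node may not enforce authentication).

from openai import OpenAI  # pip install openai

# Point the standard OpenAI client at a local Tensorlink validator node.
client = OpenAI(base_url="http://localhost:64747/v1", api_key="unused")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct",
    messages=[{"role": "user", "content": "Explain quantum computing in one sentence."}],
    max_tokens=64,
)
print(resp.choices[0].message.content)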

/request-model

Preload a model across the distributed network before making generation requests.

Request Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| hf_name | string | Hugging Face model identifier |

Example

import requests

r = requests.post(
    "http://localhost:64747/request-model",
    json={"hf_name": "Qwen/Qwen2.5-7B-Instruct"}
)

print(r.json())
# {"status": "success", "message": "Model loading initiated"}

Notes

Tensorlink is designed to support any Hugging Face model; however, errors may still occur with certain models. Please report any bugs via Issues.

  • Temperature: Values below 0.01 automatically disable sampling to prevent numerical instability
  • Streaming: Both endpoints support Server-Sent Events (SSE) streaming via stream: true
  • Token IDs: Automatically handles missing pad/eos tokens with safe fallbacks
  • Format Control: Use input_format="chat" and output_format="openai" for seamless integration

For complete examples, error handling, and advanced usage, see Examples: HTTP API


Learn More

  • 📚 Documentation – Full API reference and guides
  • 🎯 Examples – Comprehensive usage patterns and recipes
  • 💬 Discord Community – Get help and connect with developers
  • 🎮 Live Demo – Try localhostGPT powered by Tensorlink
  • 📘 Litepaper – Technical overview and architecture

Contributing

We welcome contributions! Open an issue or pull request, or join us on Discord, to get involved.

Tensorlink is released under the MIT License.
