
LLMForge πŸ”₯


A unified, pluggable AI runtime to run prompts across OpenAI, Gemini, Groq, and more β€” all with a single line of code.


✨ Features

  • 🌐 Unified Interface: Single API for multiple AI providers (OpenAI, Gemini, Groq, etc.).
  • πŸͺΆ Lightweight: Only 60.3 kB package size for minimal bundle impact.
  • πŸ”„ Intelligent Fallback: Automatic failover between providers to ensure reliability.
  • ⏳ Configurable Retries: Built-in retry mechanisms with customizable delays.
  • 🌊 Streaming Support: Handle responses as they're generated, token by token.
  • πŸ“Š Token Usage Tracking: Detailed usage statistics for cost monitoring.
  • πŸ”’ TypeScript Support: Full type safety and rich IntelliSense.
  • πŸ”§ Flexible Configuration: Global, per-provider, and per-request settings.

πŸš€ Quick Start

Get up and running in seconds. First, install the package and set up your environment variables (see Environment Variables).

import { RunnerClient } from 'llmforge';

// Configure your primary AI provider
const config = {
   llmConfig: {
      apiKey: process.env.OPENAI_API_KEY || '',
      provider: 'openai',
      model: 'gpt-4o-mini',
   },
};

// Create a client and run your prompt
const client = await RunnerClient.create(config);
const response = await client.run([
   {
      role: 'user',
      parts: [{ text: 'Hello! Can you tell me a joke?' }],
   },
]);

console.log(response.output);

πŸ“¦ Installation

npm install llmforge
# or yarn
yarn add llmforge
# or pnpm
pnpm add llmforge

You can also install a specific version: npm install @nginh/llmforge@2.0.0

πŸ”— Supported Providers & Models

LLMForge provides a growing list of integrations with leading AI providers.

Provider        Status           Key Features
OpenAI          βœ… Supported     All text models, function calling, JSON mode
Google Gemini   βœ… Supported     All text models, high context windows
Groq            βœ… Supported     Blazing-fast inference, streaming support
Ollama          🚧 Coming Soon   Run local models for privacy and offline use
Custom          🚧 Coming Soon   Connect to any user-defined model endpoint

πŸ€– View OpenAI Model List
const openAITextModelIds = [
   // GPT-4 Series
   'gpt-4o',
   'gpt-4o-2024-05-13',
   'gpt-4o-2024-08-06',
   'gpt-4o-2024-11-20',
   'gpt-4-turbo',
   'gpt-4-turbo-preview',
   'gpt-4-0125-preview',
   'gpt-4-1106-preview',
   'gpt-4',
   'gpt-4-0314',
   'gpt-4-0613',
   'gpt-4-32k',
   'gpt-4-32k-0314',
   'gpt-4-32k-0613',
   'gpt-4-vision-preview',

   // GPT-4.1 Series (Azure)
   'gpt-4.1',
   'gpt-4.1-mini',
   'gpt-4.1-nano',

   // GPT-4.5 Series
   'gpt-4.5-preview',
   'gpt-4.5-preview-2025-02-27',

   // GPT-3.5 Series
   'gpt-3.5-turbo',
   'gpt-3.5-turbo-0301',
   'gpt-3.5-turbo-0613',
   'gpt-3.5-turbo-1106',
   'gpt-3.5-turbo-0125',
   'gpt-3.5-turbo-16k',
   'gpt-3.5-turbo-16k-0613',
   'gpt-3.5-turbo-instruct',

   // O-Series (Reasoning Models)
   'o4-mini',
   'o3',
   'o3-mini',
   'o3-mini-2025-01-31',
   'o1',
   'o1-mini',
   'o1-preview',
   'o1-mini-2024-09-12',

   // Other Models
   'chatgpt-4o-latest',
   'gpt-4o-mini',
   'gpt-4o-mini-2024-07-18',
   'codex-mini',

   // Deprecated/Legacy
   'davinci-002',
   'babbage-002',
];

✨ View Google Gemini Model List
const geminiModelList = [
   'gemini-2.5-pro',
   'gemini-2.5-pro-preview-05-06',
   'gemini-2.5-flash',
   'gemini-2.5-flash-preview-04-17',
   'gemini-2.5-flash-lite-preview-06-17',
   'gemini-2.0-flash',
   'gemini-2.0-flash-lite',
   'gemma-3n-e4b-it',
   'gemma-3-1b-it',
   'gemma-3-4b-it',
   'gemma-3-12b-it',
   'gemma-3-27b-it',
   'learnlm-2.0-flash-experimental',
];

⚑ View Groq Model List
const groqModelsList = [
   'allam-2-7b',
   'compound-beta',
   'compound-beta-mini',
   'deepseek-r1-distill-llama-70b',
   'distil-whisper',
   'gemma-2-instruct',
   'llama-3-1-8b',
   'llama-3-3-70b',
   'llama-3-70b',
   'llama-3-8b',
   'llama-4-maverick-17b-128e',
   'llama-4-scout-17b-16e',
   'llama-guard-4-12b',
   'llama-prompt-guard-2-22m',
   'prompt-guard-2-86m',
   'mistral-saba-24b',
   'playai-tts',
   'playai-tts-arabic',
   'qwq-32b',
];

⚑ Streaming Responses

LLMForge supports streaming to receive responses token-by-token, ideal for real-time applications like chatbots. Enable it by setting stream: true in your configuration. When streaming is enabled, client.run() returns an AsyncIterable.

import { RunnerClient } from 'llmforge';

const client = await RunnerClient.create({
   llmConfig: {
      apiKey: process.env.GROQ_API_KEY || '',
      provider: 'groq',
      model: 'llama-3-8b', // Groq is great for streaming!
      stream: true, // Enable streaming
   },
});

const stream = await client.run([{ role: 'user', parts: [{ text: 'Write a short story about a robot who discovers music.' }] }]);

let fullResponse = '';
let finalResponse = null;
try {
   // The response is an async iterable stream of chunks
   for await (const chunk of stream) {
      if (chunk.type === 'delta') {
         process.stdout.write(chunk.token); // Print each token as it arrives
         fullResponse += chunk.token;
      } else if (chunk.type === 'completed') {
         // The final chunk carries the complete output and usage stats
         finalResponse = chunk;
      }
   }
} catch (error) {
   console.error('\n\nError during streaming:', error);
}

// After the stream ends, the 'completed' chunk holds the final metadata
console.log('\n\n--- Streaming Complete ---');
console.log('Full Story:', fullResponse);
console.log('Usage Stats:', finalResponse?.usage);

βš™οΈ Configuration

Single Provider

For simple use cases, provide a single llmConfig object.

const config = {
   llmConfig: {
      apiKey: 'your-api-key',
      provider: 'openai', // or 'google', 'groq'
      model: 'gpt-4o-mini',
      stream: false, // Optional, defaults to false
      generationConfig: {
         temperature: 0.7,
         maxOutputTokens: 150,
      },
      retryConfig: {
         maxRetries: 3,
         retryDelay: 1000,
      },
   },
};

Multiple Providers with Fallback

For resilience, provide an array of llmConfig objects sorted by priority. If the provider with priority: 1 fails, LLMForge will automatically try the one with priority: 2, and so on.

const config = {
   llmConfig: [
      {
         apiKey: process.env.OPENAI_API_KEY,
         provider: 'openai',
         model: 'gpt-4o',
         priority: 1, // Primary provider
      },
      {
         apiKey: process.env.GOOGLE_API_KEY,
         provider: 'google',
         model: 'gemini-1.5-pro',
         priority: 2, // Fallback provider
      },
   ],
   enableFallback: true, // Must be true to use the fallback mechanism
};
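
To confirm at runtime whether the fallback was actually used, inspect the fallback field on the response (documented under Response Format below). A minimal sketch, assuming the config above:

const client = await RunnerClient.create(config);
const response = await client.run([{ role: 'user', parts: [{ text: 'Summarize the plot of Hamlet in two sentences.' }] }]);

console.log(response.output);
// fallback.isUsed is true when the primary provider failed and a lower-priority one answered
if (response.fallback?.isUsed) {
   console.warn('Fallback provider used:', response.fallback.reason);
}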

πŸ“ Message & Response Format

Message Format

LLMForge uses a unified message format for multi-turn conversations. The role can be 'user', 'model', or 'system' to structure the dialogue history.

const messages = [
   {
      role: 'user', // The user's prompt
      parts: [{ text: 'Tell me about machine learning in 50 words.' }],
   },
   {
      role: 'model', // A previous response from the AI
      parts: [{ text: 'Machine learning is a subset of AI that enables computers to learn and make decisions from data without explicit programming.' }],
   },
   {
      role: 'user',
      parts: [{ text: 'Can you give me an example?' }],
   },
];
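
A system message can be added in the same way to steer the model's behavior across the whole conversation. A minimal sketch, assuming a client configured as in the Quick Start (the instruction text is only illustrative):

const messagesWithSystem = [
   {
      role: 'system', // High-level instructions that apply to the whole conversation
      parts: [{ text: 'You are a concise assistant. Answer in one sentence.' }],
   },
   {
      role: 'user',
      parts: [{ text: 'What is gradient descent?' }],
   },
];

const reply = await client.run(messagesWithSystem);
console.log(reply.output);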

Response Format (Non-Streaming)

When stream: false (the default), client.run() returns a promise that resolves to a standardized response object.

{
  "resp_id": "unique-response-id",
  "output": "This is the generated text response from the AI.",
  "status": "success",
  "created_at": 1750283611,
  "model": "gpt-4o-mini",
  "usage": {
    "input_tokens": 35,
    "output_tokens": 120,
    "total_tokens": 155
  },
  "fallback": {
    "isUsed": true, // This becomes true if a fallback provider was used
    "reason": "API error from primary provider: 500 Internal Server Error"
  }
}
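
For cost monitoring, the usage block can be read directly off the resolved response. A minimal sketch, assuming the client and messages from the examples above:

const response = await client.run(messages);

console.log(`Model used: ${response.model}`);
console.log(`Prompt tokens: ${response.usage.input_tokens}`);
console.log(`Completion tokens: ${response.usage.output_tokens}`);
console.log(`Total tokens: ${response.usage.total_tokens}`);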

Response Format (Streaming)

When stream: true, client.run() returns an AsyncIterable. You can use a for await...of loop to iterate over the chunks as they are sent from the server.

There are two types of chunks you will receive:

  1. Delta Chunks: These are sent as the AI generates the response, token by token.
  2. Completed Chunk: This is the final message in the stream, containing the full output and metadata.

Delta Chunk

This chunk represents an incremental part of the generated text.

{
   "type": "delta",
   "token": "a single token or word"
}
  • type: Always 'delta'.
  • token: A string containing the next piece of the generated text.

Completed Chunk

This is the final chunk sent when the stream is finished. It contains the complete response and final metadata, similar to the non-streaming response.

{
   "type": "completed",
   "token": "",
   "completeOutput": "The full, assembled text response.",
   "resp_id": "unique-response-id",
   "status": "success",
   "created_at": 1750283611,
   "model": "qwen/qwen3-32b",
   "usage": {
      "input_tokens": 12,
      "output_tokens": 15,
      "total_tokens": 27
   },
   "fallback": {
      "isUsed": false,
      "reason": null
   }
}
  • type: Always 'completed'.
  • token: An empty string.
  • completeOutput: The full, final generated string.
  • The remaining fields (resp_id, status, model, usage, etc.) are the same as in the non-streaming response.

Example Usage

Here is how you would process a streaming response to build the full output and access the final metadata.

// Streaming is enabled by setting stream: true in your llmConfig (see Configuration)
const stream = await client.run([{ role: 'user', parts: [{ text: 'your-prompt' }] }]);

let fullResponse = '';
let finalResponse = null;

for await (const chunk of stream) {
   if (chunk.type === 'delta') {
      // Append the token to your full response string
      const token = chunk.token;
      fullResponse += token;
      // You can process the token here (e.g., render to UI)
      process.stdout.write(token);
   } else if (chunk.type === 'completed') {
      // The stream is done.
      // 'chunk' now contains the final metadata.
      finalResponse = chunk;
   }
}

// After the loop, you can use the final data
console.log('\n\n--- Stream Complete ---');
console.log('Full assembled response:', fullResponse);
console.log('Final metadata:', finalResponse);
console.log('Total output tokens:', finalResponse.usage.output_tokens);

πŸ’‘ Examples

Basic OpenAI Usage

import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
   llmConfig: {
      apiKey: process.env.OPENAI_API_KEY,
      provider: 'openai',
      model: 'gpt-4o-mini',
   },
});
const response = await client.run([{ role: 'user', parts: [{ text: 'Explain quantum computing simply.' }] }]);
console.log(response.output);

Basic Gemini Usage

import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
   llmConfig: {
      apiKey: process.env.GOOGLE_API_KEY,
      provider: 'google',
      model: 'gemini-1.5-flash',
   },
});
const response = await client.run([{ role: 'user', parts: [{ text: 'Write a haiku about technology.' }] }]);
console.log(response.output);

Basic Groq Usage

import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
   llmConfig: {
      apiKey: process.env.GROQ_API_KEY,
      provider: 'groq',
      model: 'llama-3-8b',
   },
});
const response = await client.run([{ role: 'user', parts: [{ text: 'What is the philosophy of absurdism?' }] }]);
console.log(response.output);

πŸ“š API Reference


RunnerClient

RunnerClient.create(config)

Creates and initializes a new LLMForge client instance.

  • Parameters:
    • config: The main configuration object.
  • Returns: Promise<RunnerClient>

client.run(messages)

Executes a prompt against the configured LLM provider(s).

  • Parameters:
    • messages: An array of message objects in the unified format.
  • Returns: Promise<Response> for non-streaming calls, or Promise<AsyncIterable<StreamResponse>> for streaming calls.

Configuration Options

llmConfig Object

  • apiKey: Your API key for the provider.
  • provider: The provider to use ('openai', 'google', 'groq').
  • model: The specific model ID to use.
  • stream?: (Optional) Set to true to enable streaming. Defaults to false.
  • priority?: (Optional) A number (1, 2, etc.) to set the order for fallback.
  • generationConfig?: (Optional) Parameters to control the model's output.
  • retryConfig?: (Optional) Settings for automatic retries on failure.

generationConfig

  • temperature: Controls randomness (e.g., 0.7).
  • maxOutputTokens: Maximum tokens in the response.
  • topP: Nucleus sampling parameter.
  • topK: Top-k sampling parameter.

retryConfig

  • maxRetries: Maximum number of retry attempts.
  • retryDelay: Delay between retries in milliseconds.
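
Put together, the options above map to a shape roughly like the following. This is a sketch for orientation only; the interface names are illustrative and not necessarily the types exported by the package:

interface GenerationConfig {
   temperature?: number; // randomness, e.g. 0.7
   maxOutputTokens?: number; // cap on response length
   topP?: number; // nucleus sampling
   topK?: number; // top-k sampling
}

interface RetryConfig {
   maxRetries?: number; // maximum retry attempts
   retryDelay?: number; // delay between retries, in ms
}

interface LlmConfig {
   apiKey: string;
   provider: 'openai' | 'google' | 'groq';
   model: string;
   stream?: boolean; // defaults to false
   priority?: number; // fallback order, 1 = primary
   generationConfig?: GenerationConfig;
   retryConfig?: RetryConfig;
}

interface RunnerConfig {
   llmConfig: LlmConfig | LlmConfig[];
   enableFallback?: boolean; // must be true to use the fallback mechanism
}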

πŸ”‘ Environment Variables

For security, manage your API keys using environment variables. Create a .env file in your project root:

OPENAI_API_KEY=your-openai-api-key
GOOGLE_API_KEY=your-google-api-key
GROQ_API_KEY=your-groq-api-key

And load them in your application using a library like dotenv.
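
For example, with dotenv the keys can be loaded before the client is created. A minimal sketch:

import 'dotenv/config'; // loads variables from .env into process.env
import { RunnerClient } from 'llmforge';

const client = await RunnerClient.create({
   llmConfig: {
      apiKey: process.env.OPENAI_API_KEY || '',
      provider: 'openai',
      model: 'gpt-4o-mini',
   },
});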

πŸ—ΊοΈ Roadmap

  • [βœ”οΈ] OpenAI Support
  • [βœ”οΈ] Google Gemini Support
  • [βœ”οΈ] Groq Support
  • [βœ”οΈ] Intelligent Fallback & Retry Logic
  • [βœ”οΈ] Token Usage Tracking
  • [βœ”οΈ] Streaming Responses
  • [βœ”οΈ] Full TypeScript Support
  • [ ] Ollama Support for Local Models
  • [ ] Custom Model Endpoint Support
  • [ ] Anthropic Claude Support
  • [ ] Azure OpenAI Support
  • [ ] Response Caching
  • [ ] Unified Function Calling / Tool Use

🀝 Contributing

We welcome contributions! Please follow the guidelines in our CONTRIBUTING.md file or check out the quick guide below.

Contribution Quick Guide
  1. Fork & Clone: Fork the repository and clone it locally.
  2. Create a Branch: Create a new branch for your feature or bug fix (git checkout -b feature/my-new-feature).
  3. Code: Implement your changes. Add new provider interfaces, handle errors gracefully, and add or update tests.
  4. Format: Run npm run format to ensure your code matches the project's style.
  5. Test: Run npm test to ensure all tests pass.
  6. Create a Pull Request: Push your branch to GitHub and create a Pull Request with a clear title and description. Reference any related issues.

πŸ“„ License

MIT License - see the LICENSE file for details.

Built with ❀️ for the AI developer community by nginH.
