A unified, pluggable AI runtime to run prompts across OpenAI, Gemini, Groq, and more, all with a single line of code.
- LLMForge
- Unified Interface: Single API for multiple AI providers (OpenAI, Gemini, Groq, etc.).
- Lightweight: Only 60.3 kB package size for minimal bundle impact.
- Intelligent Fallback: Automatic failover between providers to ensure reliability.
- Configurable Retries: Built-in retry mechanisms with customizable delays.
- Streaming Support: Handle responses as they're generated, token by token.
- Token Usage Tracking: Detailed usage statistics for cost monitoring.
- TypeScript Support: Full type safety and rich IntelliSense.
- Flexible Configuration: Global, per-provider, and per-request settings.
Get up and running in seconds. First, install the package and set up your environment variables (see Environment Variables).
import { RunnerClient } from 'llmforge';
// Configure your primary AI provider
const config = {
llmConfig: {
apiKey: process.env.OPENAI_API_KEY || '',
provider: 'openai',
model: 'gpt-4o-mini',
},
};
// Create a client and run your prompt
const client = await RunnerClient.create(config);
const response = await client.run([
{
role: 'user',
parts: [{ text: 'Hello! Can you tell me a joke?' }],
},
]);
console.log(response.output);
npm install llmforge
# or yarn
yarn add llmforge
# or pnpm
pnpm add llmforge
You can also install a specific version: `npm install @nginh/llmforge@2.0.0`
LLMForge provides a growing list of integrations with leading AI providers.
Provider | Status | Key Features |
---|---|---|
OpenAI | ✅ Supported | All text models, function calling, JSON mode |
Google Gemini | ✅ Supported | All text models, high context windows |
Groq | ✅ Supported | Blazing-fast inference, streaming support |
Ollama | 🚧 Coming Soon | Run local models for privacy and offline use |
Custom | 🚧 Coming Soon | Connect to any user-defined model endpoint |
View OpenAI Model List
const openAITextModelIds = [
// GPT-4 Series
'gpt-4o',
'gpt-4o-2024-05-13',
'gpt-4o-2024-08-06',
'gpt-4o-2024-11-20',
'gpt-4-turbo',
'gpt-4-turbo-preview',
'gpt-4-0125-preview',
'gpt-4-1106-preview',
'gpt-4',
'gpt-4-0314',
'gpt-4-0613',
'gpt-4-32k',
'gpt-4-32k-0314',
'gpt-4-32k-0613',
'gpt-4-vision-preview',
// GPT-4.1 Series (Azure)
'gpt-4.1',
'gpt-4.1-mini',
'gpt-4.1-nano',
// GPT-4.5 Series
'gpt-4.5-preview',
'gpt-4.5-preview-2025-02-27',
// GPT-3.5 Series
'gpt-3.5-turbo',
'gpt-3.5-turbo-0301',
'gpt-3.5-turbo-0613',
'gpt-3.5-turbo-1106',
'gpt-3.5-turbo-0125',
'gpt-3.5-turbo-16k',
'gpt-3.5-turbo-16k-0613',
'gpt-3.5-turbo-instruct',
// O-Series (Reasoning Models)
'o4-mini',
'o3',
'o3-mini',
'o3-mini-2025-01-31',
'o1',
'o1-mini',
'o1-preview',
'o1-mini-2024-09-12',
// Other Models
'chatgpt-4o-latest',
'gpt-4o-mini',
'gpt-4o-mini-2024-07-18',
'codex-mini',
// Deprecated/Legacy
'davinci-002',
'babbage-002',
];
View Google Gemini Model List
const geminiModelList = [
'gemini-2.5-pro',
'gemini-2.5-pro-preview-05-06',
'gemini-2.5-flash',
'gemini-2.5-flash-preview-04-17',
'gemini-2.5-flash-lite-preview-06-17',
'gemini-2.0-flash',
'gemini-2.0-flash-lite',
'gemma-3n-e4b-it',
'gemma-3-1b-it',
'gemma-3-4b-it',
'gemma-3-12b-it',
'gemma-3-27b-it',
'learnlm-2.0-flash-experimental',
];
View Groq Model List
const groqModelsList = [
'allam-2-7b',
'compound-beta',
'compound-beta-mini',
'deepseek-r1-distill-llama-70b',
'distil-whisper',
'gemma-2-instruct',
'llama-3-1-8b',
'llama-3-3-70b',
'llama-3-70b',
'llama-3-8b',
'llama-4-maverick-17b-128e',
'llama-4-scout-17b-16e',
'llama-guard-4-12b',
'llama-prompt-guard-2-22m',
'prompt-guard-2-86m',
'mistral-saba-24b',
'playai-tts',
'playai-tts-arabic',
'qwq-32b',
];
LLMForge supports streaming to receive responses token by token, ideal for real-time applications like chatbots. Enable it by setting `stream: true` in your configuration. When streaming is enabled, `client.run()` returns an `AsyncIterable`.
import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
llmConfig: {
apiKey: process.env.GROQ_API_KEY || '',
provider: 'groq',
model: 'llama-3-8b', // Groq is great for streaming!
stream: true, // Enable streaming
},
});
const stream = await client.run([{ role: 'user', parts: [{ text: 'Write a short story about a robot who discovers music.' }] }]);
let fullResponse = '';
let finalResponse = null;
try {
  // The response is an async iterable stream of chunks
  for await (const chunk of stream) {
    if (chunk.type === 'delta') {
      process.stdout.write(chunk.token); // Print each token as it arrives
      fullResponse += chunk.token;
    } else if (chunk.type === 'completed') {
      finalResponse = chunk; // The final chunk carries the full output and usage stats
    }
  }
} catch (error) {
  console.error('\n\nError during streaming:', error);
}
// After the stream ends, the completed chunk holds the final metadata
console.log('\n\n--- Streaming Complete ---');
console.log('Full Story:', fullResponse);
console.log('Usage Stats:', finalResponse?.usage);
For simple use cases, provide a single `llmConfig` object.
const config = {
llmConfig: {
apiKey: 'your-api-key',
provider: 'openai', // or 'google', 'groq'
model: 'gpt-4o-mini',
stream: false, // Optional, defaults to false
generationConfig: {
temperature: 0.7,
maxOutputTokens: 150,
},
retryConfig: {
maxRetries: 3,
retryDelay: 1000,
},
},
};
For resilience, provide an array of `llmConfig` objects sorted by `priority`. If the provider with `priority: 1` fails, LLMForge will automatically try the one with `priority: 2`, and so on.
const config = {
llmConfig: [
{
apiKey: process.env.OPENAI_API_KEY,
provider: 'openai',
model: 'gpt-4o',
priority: 1, // Primary provider
},
{
apiKey: process.env.GOOGLE_API_KEY,
provider: 'google',
model: 'gemini-1.5-pro',
priority: 2, // Fallback provider
},
],
enableFallback: true, // Must be true to use the fallback mechanism
};
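When fallback is enabled, the response object (see the Response Format section below) records whether a fallback provider handled the request. Here is a minimal sketch of checking this, assuming the `config` above and the same `RunnerClient` usage as the Quick Start:

import { RunnerClient } from 'llmforge';
// Sketch: create a client from the fallback config above and check
// whether the secondary provider handled the request (fields per the
// Response Format section below).
const client = await RunnerClient.create(config);
const response = await client.run([
  { role: 'user', parts: [{ text: 'Summarize the plot of Hamlet in two sentences.' }] },
]);
console.log(response.output);
if (response.fallback.isUsed) {
  console.warn('Fallback provider was used:', response.fallback.reason);
}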
LLMForge uses a unified message format for multi-turn conversations. The `role` can be `user`, `model`, or `system` to structure the dialogue history.
const messages = [
{
role: 'user', // The user's prompt
parts: [{ text: 'Tell me about machine learning in 50 words.' }],
},
{
role: 'model', // A previous response from the AI
parts: [{ text: 'Machine learning is a subset of AI that enables computers to learn and make decisions from data without explicit programming.' }],
},
{
role: 'user',
parts: [{ text: 'Can you give me an example?' }],
},
];
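To continue the conversation, append the model's reply and the next user turn to the same array and run it again. A minimal sketch, assuming a non-streaming `client` created as in the Quick Start:

// Run the history above, then extend it with the model's reply and a new user turn
const response = await client.run(messages);
messages.push(
  { role: 'model', parts: [{ text: response.output }] },
  { role: 'user', parts: [{ text: 'How is that different from deep learning?' }] }
);
const followUp = await client.run(messages);
console.log(followUp.output);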
When `stream: false` (the default), `client.run()` returns a promise that resolves to a standardized response object.
{
"resp_id": "unique-response-id",
"output": "This is the generated text response from the AI.",
"status": "success",
"created_at": 1750283611,
"model": "gpt-4o-mini",
"usage": {
"input_tokens": 35,
"output_tokens": 120,
"total_tokens": 155
},
"fallback": {
"isUsed": true, // This becomes true if a fallback provider was used
"reason": "API error from primary provider: 500 Internal Server Error"
}
}
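The `usage` block is handy for cost monitoring; for example, you can accumulate token counts across calls. A minimal sketch, assuming a non-streaming `client` as in the Quick Start:

// Accumulate token usage across requests for cost monitoring
let totalTokens = 0;
const response = await client.run([{ role: 'user', parts: [{ text: 'Give me three startup name ideas.' }] }]);
totalTokens += response.usage.total_tokens;
console.log(`Input: ${response.usage.input_tokens}, Output: ${response.usage.output_tokens}`);
console.log(`Running total: ${totalTokens} tokens`);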
When `stream: true`, `client.run()` returns an `AsyncIterable`. You can use a `for await...of` loop to iterate over the chunks as they are sent from the server.
There are two types of chunks you will receive:
- Delta Chunks: These are sent as the AI generates the response, token by token.
- Completed Chunk: This is the final message in the stream, containing the full output and metadata.
A delta chunk represents an incremental part of the generated text.
{
"type": "delta",
"token": "a single token or word"
}
- `type`: Always `'delta'`.
- `token`: A `string` containing the next piece of the generated text.
This is the final chunk sent when the stream is finished. It contains the complete response and final metadata, similar to the non-streaming response.
{
"type": "completed",
"token": "",
"completeOutput": "The full, assembled text response.",
"resp_id": "unique-response-id",
"status": "success",
"created_at": 1750283611,
"model": "qwen/qwen3-32b",
"usage": {
"input_tokens": 12,
"output_tokens": 15,
"total_tokens": 27
},
"fallback": {
"isUsed": false,
"reason": null
}
}
- `type`: Always `'completed'`.
- `token`: An empty string.
- `completeOutput`: The full, final generated string.
- The remaining fields (`resp_id`, `status`, `model`, `usage`, etc.) are the same as in the non-streaming response.
Here is how you would process a streaming response to build the full output and access the final metadata.
// Assumes a client created with `stream: true` in its llmConfig
const stream = await client.run([{ role: 'user', parts: [{ text: 'your-prompt' }] }]);
let fullResponse = '';
let finalResponse = null;
for await (const chunk of stream) {
if (chunk.type === 'delta') {
// Append the token to your full response string
const token = chunk.token;
fullResponse += token;
// You can process the token here (e.g., render to UI)
process.stdout.write(token);
} else if (chunk.type === 'completed') {
// The stream is done.
// 'chunk' now contains the final metadata.
finalResponse = chunk;
}
}
// After the loop, you can use the final data
console.log('\n\n--- Stream Complete ---');
console.log('Full assembled response:', fullResponse);
console.log('Final metadata:', finalResponse);
console.log('Total output tokens:', finalResponse.usage.output_tokens);
import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
llmConfig: {
apiKey: process.env.OPENAI_API_KEY,
provider: 'openai',
model: 'gpt-4o-mini',
},
});
const response = await client.run([{ role: 'user', parts: [{ text: 'Explain quantum computing simply.' }] }]);
console.log(response.output);
import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
llmConfig: {
apiKey: process.env.GOOGLE_API_KEY,
provider: 'google',
model: 'gemini-1.5-flash',
},
});
const response = await client.run([{ role: 'user', parts: [{ text: 'Write a haiku about technology.' }] }]);
console.log(response.output);
import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
llmConfig: {
apiKey: process.env.GROQ_API_KEY,
provider: 'groq',
model: 'llama-3-8b',
},
});
const response = await client.run([{ role: 'user', parts: [{ text: 'What is the philosophy of absurdism?' }] }]);
console.log(response.output);
Click to expand API Reference
`RunnerClient.create(config)` creates and initializes a new LLMForge client instance.
- Parameters: `config` (the main configuration object).
- Returns: `Promise<RunnerClient>`
`client.run(messages)` executes a prompt against the configured LLM provider(s).
- Parameters: `messages` (an array of message objects in the unified format).
- Returns: `Promise<Response>` for non-streaming calls, or `Promise<AsyncIterable<StreamResponse>>` for streaming calls.
`llmConfig` fields:
- `apiKey`: Your API key for the provider.
- `provider`: The provider to use (`'openai'`, `'google'`, `'groq'`).
- `model`: The specific model ID to use.
- `stream?`: (Optional) Set to `true` to enable streaming. Defaults to `false`.
- `priority?`: (Optional) A number (`1`, `2`, etc.) to set the order for fallback.
- `generationConfig?`: (Optional) Parameters to control the model's output.
- `retryConfig?`: (Optional) Settings for automatic retries on failure.
`generationConfig` fields:
- `temperature`: Controls randomness (e.g., `0.7`).
- `maxOutputTokens`: Maximum tokens in the response.
- `topP`: Nucleus sampling parameter.
- `topK`: Top-k sampling parameter.
`retryConfig` fields:
- `maxRetries`: Maximum number of retry attempts.
- `retryDelay`: Delay between retries in milliseconds.
For security, manage your API keys using environment variables. Create a `.env` file in your project root:
OPENAI_API_KEY=your-openai-api-key
GOOGLE_API_KEY=your-google-api-key
GROQ_API_KEY=your-groq-api-key
And load them in your application using a library like `dotenv`.
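For example, with `dotenv` installed (`npm install dotenv`), loading the keys before creating a client looks like this:

// Load variables from .env into process.env before creating the client
import 'dotenv/config';
import { RunnerClient } from 'llmforge';

const client = await RunnerClient.create({
  llmConfig: {
    apiKey: process.env.OPENAI_API_KEY || '',
    provider: 'openai',
    model: 'gpt-4o-mini',
  },
});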
- [x] OpenAI Support
- [x] Google Gemini Support
- [x] Groq Support
- [x] Intelligent Fallback & Retry Logic
- [x] Token Usage Tracking
- [x] Streaming Responses
- [x] Full TypeScript Support
- Ollama Support for Local Models
- Custom Model Endpoint Support
- Anthropic Claude Support
- Azure OpenAI Support
- Response Caching
- Unified Function Calling / Tool Use
We welcome contributions! Please follow the guidelines in our `CONTRIBUTING.md` file or check out the quick guide below.
Contribution Quick Guide
- Fork & Clone: Fork the repository and clone it locally.
- Create a Branch: Create a new branch for your feature or bug fix (`git checkout -b feature/my-new-feature`).
- Code: Implement your changes. Add new provider interfaces, handle errors gracefully, and add or update tests.
- Format: Run `npm run format` to ensure your code matches the project's style.
- Test: Run `npm test` to ensure all tests pass.
- Create a Pull Request: Push your branch to GitHub and create a Pull Request with a clear title and description. Reference any related issues.
- Email: harshanand.cloud@gmail.com
- Issues: Report a bug or request a feature on GitHub Issues
MIT License - see the LICENSE file for details.
Built with ❤️ for the AI developer community by nginH.