A unified, pluggable AI runtime to run prompts across OpenAI, Gemini, Groq, and more, all with a single line of code.
- LLMForge
- Unified Interface: Single API for multiple AI providers (OpenAI, Gemini, Groq, etc.).
- Lightweight: Only 60.3 kB package size for minimal bundle impact.
- Intelligent Fallback: Automatic failover between providers to ensure reliability.
- Configurable Retries: Built-in retry mechanisms with customizable delays.
- Streaming Support: Handle responses as they're generated, token by token.
- Token Usage Tracking: Detailed usage statistics for cost monitoring.
- TypeScript Support: Full type safety and rich IntelliSense.
- Flexible Configuration: Global, per-provider, and per-request settings.
Get up and running in seconds. First, install the package and set up your environment variables (see Environment Variables).
import { RunnerClient } from 'llmforge';
// Configure your primary AI provider
const config = {
llmConfig: {
apiKey: process.env.OPENAI_API_KEY || '',
provider: 'openai',
model: 'gpt-4o-mini',
},
};
// Create a client and run your prompt
const client = await RunnerClient.create(config);
const response = await client.run([
{
role: 'user',
parts: [{ text: 'Hello! Can you tell me a joke?' }],
},
]);
console.log(response.output);
npm install llmforge
# or yarn
yarn add llmforge
# or pnpm
pnpm add llmforge
You can also install a specific version: `npm install @nginh/llmforge@2.0.0`
LLMForge provides a growing list of integrations with leading AI providers.
Provider | Status | Key Features |
---|---|---|
OpenAI | ✅ Supported | All text models, function calling, JSON mode |
Google Gemini | ✅ Supported | All text models, high context windows |
Groq | ✅ Supported | Blazing-fast inference, streaming support |
Ollama | 🚧 Coming Soon | Run local models for privacy and offline use |
Custom | 🚧 Coming Soon | Connect to any user-defined model endpoint |
View OpenAI Model List
const openAITextModelIds = [
// GPT-4 Series
'gpt-4o',
'gpt-4o-2024-05-13',
'gpt-4o-2024-08-06',
'gpt-4o-2024-11-20',
'gpt-4-turbo',
'gpt-4-turbo-preview',
'gpt-4-0125-preview',
'gpt-4-1106-preview',
'gpt-4',
'gpt-4-0314',
'gpt-4-0613',
'gpt-4-32k',
'gpt-4-32k-0314',
'gpt-4-32k-0613',
'gpt-4-vision-preview',
// GPT-4.1 Series (Azure)
'gpt-4.1',
'gpt-4.1-mini',
'gpt-4.1-nano',
// GPT-4.5 Series
'gpt-4.5-preview',
'gpt-4.5-preview-2025-02-27',
// GPT-3.5 Series
'gpt-3.5-turbo',
'gpt-3.5-turbo-0301',
'gpt-3.5-turbo-0613',
'gpt-3.5-turbo-1106',
'gpt-3.5-turbo-0125',
'gpt-3.5-turbo-16k',
'gpt-3.5-turbo-16k-0613',
'gpt-3.5-turbo-instruct',
// O-Series (Reasoning Models)
'o4-mini',
'o3',
'o3-mini',
'o3-mini-2025-01-31',
'o1',
'o1-mini',
'o1-preview',
'o1-mini-2024-09-12',
// Other Models
'chatgpt-4o-latest',
'gpt-4o-mini',
'gpt-4o-mini-2024-07-18',
'codex-mini',
// Deprecated/Legacy
'davinci-002',
'babbage-002',
];
View Google Gemini Model List
const geminiModelList = [
'gemini-2.5-pro',
'gemini-2.5-pro-preview-05-06',
'gemini-2.5-flash',
'gemini-2.5-flash-preview-04-17',
'gemini-2.5-flash-lite-preview-06-17',
'gemini-2.0-flash',
'gemini-2.0-flash-lite',
'gemma-3n-e4b-it',
'gemma-3-1b-it',
'gemma-3-4b-it',
'gemma-3-12b-it',
'gemma-3-27b-it',
'learnlm-2.0-flash-experimental',
];
View Groq Model List
const groqModelsList = [
'allam-2-7b',
'compound-beta',
'compound-beta-mini',
'deepseek-r1-distill-llama-70b',
'distil-whisper',
'gemma-2-instruct',
'llama-3-1-8b',
'llama-3-3-70b',
'llama-3-70b',
'llama-3-8b',
'llama-4-maverick-17b-128e',
'llama-4-scout-17b-16e',
'llama-guard-4-12b',
'llama-prompt-guard-2-22m',
'prompt-guard-2-86m',
'mistral-saba-24b',
'playai-tts',
'playai-tts-arabic',
'qwq-32b',
];
LLMForge supports streaming to receive responses token by token, ideal for real-time applications like chatbots. Enable it by setting `stream: true` in your configuration. When streaming is enabled, `client.run()` returns an `AsyncIterable`.
import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
llmConfig: {
apiKey: process.env.GROQ_API_KEY || '',
provider: 'groq',
model: 'llama-3-8b', // Groq is great for streaming!
stream: true, // Enable streaming
},
});
const stream = await client.run([{ role: 'user', parts: [{ text: 'Write a short story about a robot who discovers music.' }] }]);
let fullResponse = '';
let finalResponse = null;
try {
  // The response is an async iterable stream of chunks
  for await (const chunk of stream) {
    if (chunk.type === 'delta') {
      process.stdout.write(chunk.token); // Print each token as it arrives
      fullResponse += chunk.token;
    } else if (chunk.type === 'completed') {
      finalResponse = chunk; // The final chunk carries the full output and usage stats
    }
  }
} catch (error) {
  console.error('\n\nError during streaming:', error);
}
// After the stream ends, the completed chunk holds the final metadata
console.log('\n\n--- Streaming Complete ---');
console.log('Full Story:', fullResponse);
console.log('Usage Stats:', finalResponse?.usage);
For simple use cases, provide a single `llmConfig` object.
const config = {
llmConfig: {
apiKey: 'your-api-key',
provider: 'openai', // or 'google', 'groq'
model: 'gpt-4o-mini',
stream: false, // Optional, defaults to false
generationConfig: {
temperature: 0.7,
maxOutputTokens: 150,
},
retryConfig: {
maxRetries: 3,
retryDelay: 1000,
},
},
};
For resilience, provide an array of `llmConfig` objects sorted by `priority`. If the provider with `priority: 1` fails, LLMForge will automatically try the one with `priority: 2`, and so on.
const config = {
llmConfig: [
{
apiKey: process.env.OPENAI_API_KEY,
provider: 'openai',
model: 'gpt-4o',
priority: 1, // Primary provider
},
{
apiKey: process.env.GOOGLE_API_KEY,
provider: 'google',
model: 'gemini-1.5-pro',
priority: 2, // Fallback provider
},
],
enableFallback: true, // Must be true to use the fallback mechanism
};
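When fallback is enabled, the response object (see the Response Format section below) records whether a fallback provider handled the request. Here is a minimal sketch of checking this, assuming the `config` above and the same `RunnerClient` usage as the Quick Start:

import { RunnerClient } from 'llmforge';
// Sketch: create a client from the fallback config above and check
// whether the secondary provider handled the request (fields per the
// Response Format section below).
const client = await RunnerClient.create(config);
const response = await client.run([
  { role: 'user', parts: [{ text: 'Summarize the plot of Hamlet in two sentences.' }] },
]);
console.log(response.output);
if (response.fallback.isUsed) {
  console.warn('Fallback provider was used:', response.fallback.reason);
}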
LLMForge uses a unified message format for multi-turn conversations. The `role` can be `user`, `model`, or `system` to structure the dialogue history.
const messages = [
{
role: 'user', // The user's prompt
parts: [{ text: 'Tell me about machine learning in 50 words.' }],
},
{
role: 'model', // A previous response from the AI
parts: [{ text: 'Machine learning is a subset of AI that enables computers to learn and make decisions from data without explicit programming.' }],
},
{
role: 'user',
parts: [{ text: 'Can you give me an example?' }],
},
];
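To continue the conversation, append the model's reply and the next user turn to the same array and run it again. A minimal sketch, assuming a non-streaming `client` created as in the Quick Start:

// Run the history above, then extend it with the model's reply and a new user turn
const response = await client.run(messages);
messages.push(
  { role: 'model', parts: [{ text: response.output }] },
  { role: 'user', parts: [{ text: 'How is that different from deep learning?' }] }
);
const followUp = await client.run(messages);
console.log(followUp.output);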
When `stream: false` (the default), `client.run()` returns a promise that resolves to a standardized response object.
{
"resp_id": "unique-response-id",
"output": "This is the generated text response from the AI.",
"status": "success",
"created_at": 1750283611,
"model": "gpt-4o-mini",
"usage": {
"input_tokens": 35,
"output_tokens": 120,
"total_tokens": 155
},
"fallback": {
"isUsed": true, // This becomes true if a fallback provider was used
"reason": "API error from primary provider: 500 Internal Server Error"
}
}
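The `usage` block is handy for cost monitoring; for example, you can accumulate token counts across calls. A minimal sketch, assuming a non-streaming `client` as in the Quick Start:

// Accumulate token usage across requests for cost monitoring
let totalTokens = 0;
const response = await client.run([{ role: 'user', parts: [{ text: 'Give me three startup name ideas.' }] }]);
totalTokens += response.usage.total_tokens;
console.log(`Input: ${response.usage.input_tokens}, Output: ${response.usage.output_tokens}`);
console.log(`Running total: ${totalTokens} tokens`);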
When `stream: true`, `client.run()` returns an `AsyncIterable`. You can use a `for await...of` loop to iterate over the chunks as they are sent from the server.
There are two types of chunks you will receive:
- Delta Chunks: These are sent as the AI generates the response, token by token.
- Completed Chunk: This is the final message in the stream, containing the full output and metadata.
A delta chunk represents an incremental part of the generated text.
{
"type": "delta",
"token": "a single token or word"
}
- `type`: Always `'delta'`.
- `token`: A `string` containing the next piece of the generated text.
This is the final chunk sent when the stream is finished. It contains the complete response and final metadata, similar to the non-streaming response.
{
"type": "completed",
"token": "",
"completeOutput": "The full, assembled text response.",
"resp_id": "unique-response-id",
"status": "success",
"created_at": 1750283611,
"model": "qwen/qwen3-32b",
"usage": {
"input_tokens": 12,
"output_tokens": 15,
"total_tokens": 27
},
"fallback": {
"isUsed": false,
"reason": null
}
}
- `type`: Always `'completed'`.
- `token`: An empty string.
- `completeOutput`: The full, final generated string.
- The remaining fields (`resp_id`, `status`, `model`, `usage`, etc.) are the same as in the non-streaming response.
Here is how you would process a streaming response to build the full output and access the final metadata.
// Assumes a client created with `stream: true` in its llmConfig
const stream = await client.run([{ role: 'user', parts: [{ text: 'your-prompt' }] }]);
let fullResponse = '';
let finalResponse = null;
for await (const chunk of stream) {
if (chunk.type === 'delta') {
// Append the token to your full response string
const token = chunk.token;
fullResponse += token;
// You can process the token here (e.g., render to UI)
process.stdout.write(token);
} else if (chunk.type === 'completed') {
// The stream is done.
// 'chunk' now contains the final metadata.
finalResponse = chunk;
}
}
// After the loop, you can use the final data
console.log('\n\n--- Stream Complete ---');
console.log('Full assembled response:', fullResponse);
console.log('Final metadata:', finalResponse);
console.log('Total output tokens:', finalResponse.usage.output_tokens);
import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
llmConfig: {
apiKey: process.env.OPENAI_API_KEY,
provider: 'openai',
model: 'gpt-4o-mini',
},
});
const response = await client.run([{ role: 'user', parts: [{ text: 'Explain quantum computing simply.' }] }]);
console.log(response.output);
import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
llmConfig: {
apiKey: process.env.GOOGLE_API_KEY,
provider: 'google',
model: 'gemini-1.5-flash',
},
});
const response = await client.run([{ role: 'user', parts: [{ text: 'Write a haiku about technology.' }] }]);
console.log(response.output);
import { RunnerClient } from 'llmforge';
const client = await RunnerClient.create({
llmConfig: {
apiKey: process.env.GROQ_API_KEY,
provider: 'groq',
model: 'llama-3-8b',
},
});
const response = await client.run([{ role: 'user', parts: [{ text: 'What is the philosophy of absurdism?' }] }]);
console.log(response.output);
Click to expand API Reference
`RunnerClient.create(config)` creates and initializes a new LLMForge client instance.
- Parameters: `config` (the main configuration object).
- Returns: `Promise<RunnerClient>`
`client.run(messages)` executes a prompt against the configured LLM provider(s).
- Parameters: `messages` (an array of message objects in the unified format).
- Returns: `Promise<Response>` for non-streaming calls, or `Promise<AsyncIterable<StreamResponse>>` for streaming calls.
`llmConfig` fields:
- `apiKey`: Your API key for the provider.
- `provider`: The provider to use (`'openai'`, `'google'`, `'groq'`).
- `model`: The specific model ID to use.
- `stream?`: (Optional) Set to `true` to enable streaming. Defaults to `false`.
- `priority?`: (Optional) A number (`1`, `2`, etc.) to set the order for fallback.
- `generationConfig?`: (Optional) Parameters to control the model's output.
- `retryConfig?`: (Optional) Settings for automatic retries on failure.
`generationConfig` fields:
- `temperature`: Controls randomness (e.g., `0.7`).
- `maxOutputTokens`: Maximum tokens in the response.
- `topP`: Nucleus sampling parameter.
- `topK`: Top-k sampling parameter.
`retryConfig` fields:
- `maxRetries`: Maximum number of retry attempts.
- `retryDelay`: Delay between retries in milliseconds.
For security, manage your API keys using environment variables. Create a `.env` file in your project root:
OPENAI_API_KEY=your-openai-api-key
GOOGLE_API_KEY=your-google-api-key
GROQ_API_KEY=your-groq-api-key
And load them in your application using a library like `dotenv`.
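For example, with `dotenv` installed (`npm install dotenv`), loading the keys before creating a client looks like this:

// Load variables from .env into process.env before creating the client
import 'dotenv/config';
import { RunnerClient } from 'llmforge';

const client = await RunnerClient.create({
  llmConfig: {
    apiKey: process.env.OPENAI_API_KEY || '',
    provider: 'openai',
    model: 'gpt-4o-mini',
  },
});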
- [x] OpenAI Support
- [x] Google Gemini Support
- [x] Groq Support
- [x] Intelligent Fallback & Retry Logic
- [x] Token Usage Tracking
- [x] Streaming Responses
- [x] Full TypeScript Support
- Ollama Support for Local Models
- Custom Model Endpoint Support
- Anthropic Claude Support
- Azure OpenAI Support
- Response Caching
- Unified Function Calling / Tool Use
We welcome contributions! Please follow the guidelines in our `CONTRIBUTING.md` file or check out the quick guide below.
Contribution Quick Guide
- Fork & Clone: Fork the repository and clone it locally.
- Create a Branch: Create a new branch for your feature or bug fix (`git checkout -b feature/my-new-feature`).
- Code: Implement your changes. Add new provider interfaces, handle errors gracefully, and add or update tests.
- Format: Run `npm run format` to ensure your code matches the project's style.
- Test: Run `npm test` to ensure all tests pass.
- Create a Pull Request: Push your branch to GitHub and create a Pull Request with a clear title and description. Reference any related issues.
- Email: harshanand.cloud@gmail.com
- Issues: Report a bug or request a feature on GitHub Issues
MIT License - see the LICENSE file for details.
Built with ❤️ for the AI developer community by nginH.