Lua bindings for gemma.cpp.
Before starting, you should have installed:
1st step: Clone the source code from GitHub: git clone https://github.com/ufownl/lua-cgemma.git
2nd step: Build and install:
To build and install using the default settings, just enter the repository's directory and run the following commands:
mkdir build
cd build
cmake .. && make
sudo make install
3rd step: See here to learn how to obtain model weights and tokenizer.
-- Create a Gemma instance
local gemma, err = require("cgemma").new({
  tokenizer = "/path/to/tokenizer.spm",
  model = "gemma3-4b",
  weights = "/path/to/4b-it-sfp.sbs"
})
if not gemma then
  error("Oops! "..err)
end
-- Create a chat session
local session, err = gemma:session()
if not session then
  error("Oops! "..err)
end
while true do
  print("New conversation started")
  -- Multi-turn chat loop
  while session:ready() do
    io.write("> ")
    local text = io.read()
    if not text then
      print("End of file")
      return
    end
    -- Generate reply
    local reply, err = session(text)
    if not reply then
      error("Oops! "..err)
    end
    print("reply: ", reply)
  end
  print("Exceeded the maximum number of tokens")
  session:reset()
end
syntax: cgemma.info()
Show information about the cgemma module.
syntax: <cgemma.scheduler>sched, <string>err = cgemma.scheduler([<table>options])
Create a scheduler instance.
A successful call returns a scheduler instance. Otherwise, it returns nil and a string describing the error.
Available options and default values:
{
  num_threads = 0,  -- Maximum number of threads to use. (0 = unlimited)
  pin = -1,  -- Pin threads? (-1 = auto, 0 = no, 1 = yes)
  skip_packages = 0,  -- Index of the first socket to use. (0 = unlimited)
  max_packages = 0,  -- Maximum number of sockets to use. (0 = unlimited)
  skip_clusters = 0,  -- Index of the first CCX to use. (0 = unlimited)
  max_clusters = 0,  -- Maximum number of CCXs to use. (0 = unlimited)
  skip_lps = 0,  -- Index of the first LP to use. (0 = unlimited)
  max_lps = 0,  -- Maximum number of LPs to use. (0 = unlimited)
}
syntax: <string>desc = sched:cpu_topology()
Query CPU topology.
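As a minimal sketch of how a scheduler might be wired into a Gemma instance, the helper below creates a scheduler with a thread cap, prints the detected topology, and passes the scheduler through the `scheduler` option of `cgemma.new`. The helper name `new_instance_with_threads` and its parameters are hypothetical, and the sketch assumes a single-file weights file so that `tokenizer` and `model` can be omitted. The module is passed in as an argument, so the function itself does not load anything.

```lua
-- Hypothetical helper: create a Gemma instance attached to a scheduler that
-- uses at most `num_threads` threads. `cgemma` is the already-loaded module.
-- Returns the instance and the scheduler, or nil and an error message.
local function new_instance_with_threads(cgemma, weights_path, num_threads)
  local sched, sched_err = cgemma.scheduler({ num_threads = num_threads })
  if not sched then
    return nil, "scheduler: " .. tostring(sched_err)
  end
  -- Print the CPU topology description reported by the scheduler.
  print(sched:cpu_topology())
  local inst, inst_err = cgemma.new({
    weights = weights_path,  -- assumed to be in the single-file format
    scheduler = sched,
  })
  if not inst then
    return nil, "instance: " .. tostring(inst_err)
  end
  return inst, sched
end

-- Possible usage (paths are placeholders):
-- local inst, sched_or_err =
--   new_instance_with_threads(require("cgemma"), "/path/to/weights.sbs", 8)
```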
syntax: <cgemma.instance>inst, <string>err = cgemma.new(<table>options)
Create a Gemma instance.
A successful call returns a Gemma instance. Otherwise, it returns nil and a string describing the error.
Available options:
{
  tokenizer = "/path/to/tokenizer.spm",  -- Path of tokenizer model file.
  model = "gemma3-4b",  -- Model type:
                        --   2b-it (Gemma 2B parameters, instruction-tuned)
                        --   2b-pt (Gemma 2B parameters, pretrained)
                        --   7b-it (Gemma 7B parameters, instruction-tuned)
                        --   7b-pt (Gemma 7B parameters, pretrained)
                        --   gr2b-it (Griffin 2B parameters, instruction-tuned)
                        --   gr2b-pt (Griffin 2B parameters, pretrained)
                        --   gemma2-2b-it (Gemma2 2B parameters, instruction-tuned)
                        --   gemma2-2b-pt (Gemma2 2B parameters, pretrained)
                        --   9b-it (Gemma2 9B parameters, instruction-tuned)
                        --   9b-pt (Gemma2 9B parameters, pretrained)
                        --   27b-it (Gemma2 27B parameters, instruction-tuned)
                        --   27b-pt (Gemma2 27B parameters, pretrained)
                        --   paligemma-224 (PaliGemma 224*224)
                        --   paligemma-448 (PaliGemma 448*448)
                        --   paligemma2-3b-224 (PaliGemma2 3B 224*224)
                        --   paligemma2-3b-448 (PaliGemma2 3B 448*448)
                        --   paligemma2-10b-224 (PaliGemma2 10B 224*224)
                        --   paligemma2-10b-448 (PaliGemma2 10B 448*448)
                        --   gemma3-4b (Gemma3 4B parameters)
                        --   gemma3-1b (Gemma3 1B parameters)
                        --   gemma3-12b (Gemma3 12B parameters)
                        --   gemma3-27b (Gemma3 27B parameters)
  weights = "/path/to/4b-it-sfp.sbs",  -- Path of model weights file. (required)
  weight_type = "sfp",  -- Weight type:
                        --   sfp (8-bit FP, default)
                        --   f32 (float)
                        --   bf16 (bfloat16)
                        --   nuq (non-uniform quantization)
                        --   f64 (double)
                        --   c64 (complex double)
                        --   u128 (uint128)
  seed = 42,  -- Random seed. (default is random setting)
  scheduler = sched_inst,  -- Instance of scheduler, if not provided a default
                           -- scheduler will be attached.
  disabled_words = {...},  -- Words you don't want to generate.
}
Note
If the weights file is not in the new single-file format, then the tokenizer and model options are required.
syntax: <table>tokens = inst:disabled_tokens()
Query the disabled tokens of a Gemma instance.
syntax: <cgemma.image_tokens>img, <string>err = inst:embed_image(<string>data_or_path)
Load image data from the given Lua string or a specific file (PPM format: P6, binary) and embed it into the image tokens.
syntax: <cgemma.image_tokens>img, <string>err = inst:embed_image(<integer>width, <integer>height, <table>values)
Create an image with the given width, height, and pixel values, and embed it into the image tokens.
A successful call returns a cgemma.image_tokens object containing the image tokens. Otherwise, it returns nil and a string describing the error.
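To illustrate the string form of `embed_image`, the sketch below builds a binary PPM (P6) image in memory. The helper `make_ppm` and its `pixel` callback are hypothetical names, and the example assumes 8-bit channels with a maximum value of 255. The 4x4 size is chosen only to keep the example small; whether a given model accepts an image of that size is not something this sketch can confirm.

```lua
-- Build a binary PPM (P6) image as a Lua string.
-- `pixel(x, y)` is a caller-supplied function returning r, g, b in 0..255;
-- x and y are zero-based and rows are written top to bottom.
local function make_ppm(width, height, pixel)
  local parts = { string.format("P6\n%d %d\n255\n", width, height) }
  for y = 0, height - 1 do
    for x = 0, width - 1 do
      local r, g, b = pixel(x, y)
      parts[#parts + 1] = string.char(r, g, b)
    end
  end
  return table.concat(parts)
end

-- Example: a 4x4 solid red image.
local ppm = make_ppm(4, 4, function() return 255, 0, 0 end)

-- With a Gemma instance `inst` (not created here), the string could then be
-- embedded and passed to a session call:
-- local img, err = inst:embed_image(ppm)
```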
syntax: <cgemma.session>sess, <string>err = inst:session([<table>options])
Create a chat session.
A successful call returns the session. Otherwise, it returns nil and a string describing the error.
Available options and default values:
{
  max_generated_tokens = 2048,  -- Maximum number of tokens to generate.
  prefill_tbatch = 256,  -- Prefill: max tokens per batch.
  decode_qbatch = 16,  -- Decode: max queries per batch.
  temperature = 1.0,  -- Temperature for top-K.
  top_k = 1,  -- Number of top-K tokens to sample from.
  no_wrapping = false,  -- Whether to force disable instruction-tuned wrapping.
}
syntax: <boolean>ok = sess:ready()
Check if the session is ready to chat.
syntax: sess:reset()
Reset the session to start a new conversation.
syntax: <string>data, <string>err = sess:dumps()
Dump the current state of the session to a Lua string.
A successful call returns a Lua string that stores state data (binary) of the session. Otherwise, it returns nil and a string describing the error.
syntax: <boolean>ok, <string>err = sess:loads(<string>data)
Load the state data from the given Lua string to restore a previous session.
A successful call returns true. Otherwise, it returns false and a string describing the error.
syntax: <boolean>ok, <string>err = sess:dump(<string>path)
Dump the current state of the session to a specific file.
A successful call returns true. Otherwise, it returns false and a string describing the error.
syntax: <boolean>ok, <string>err = sess:load(<string>path)
Load the state data from the given file to restore a previous session.
A successful call returns true. Otherwise, it returns false and a string describing the error.
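One possible use of the dump and load functions is to ask a side question and then roll the session back, so the question does not become part of the conversation. The sketch below does this in memory with `dumps` and `loads`; the helper name `ask_without_commit` is hypothetical, and it assumes that loading a snapshot taken from the same session restores its earlier state.

```lua
-- Hypothetical helper: generate a reply, then restore the session to the
-- state it had before the call. Returns the reply, or nil and an error.
local function ask_without_commit(sess, text)
  local snapshot, dump_err = sess:dumps()
  if not snapshot then
    return nil, "dumps failed: " .. tostring(dump_err)
  end
  local reply, gen_err = sess(text)
  -- Restore the earlier state whether or not generation succeeded.
  local ok, load_err = sess:loads(snapshot)
  if not ok then
    return nil, "loads failed: " .. tostring(load_err)
  end
  if not reply then
    return nil, gen_err
  end
  return reply
end
```

The file-based `dump` and `load` functions could be substituted in the same pattern when the state should survive a process restart.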
syntax: <table>statistics = sess:stats()
Get statistics for the current session.
Example of statistics:
{
  prefill_duration = 1.6746909224894,
  prefill_tokens = 26,
  prefill_tokens_per_second = 15.525252839701,
  time_to_first_token = 1.9843131969683,
  generate_duration = 38.562645539409,
  tokens_generated = 212,
  generate_tokens_per_second = 5.4975481332926
}
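As a small sketch of how these fields might be reported, the function below turns a statistics table into a one-line summary. The name `format_stats` is hypothetical, and the durations are assumed to be in seconds, which is consistent with the tokens-per-second values in the example above.

```lua
-- Hypothetical helper: summarize a statistics table on one line.
local function format_stats(stats)
  return string.format(
    "prefill: %d tokens in %.2fs (%.1f tok/s); " ..
    "generate: %d tokens in %.2fs (%.1f tok/s); " ..
    "first token after %.2fs",
    stats.prefill_tokens, stats.prefill_duration,
    stats.prefill_tokens_per_second,
    stats.tokens_generated, stats.generate_duration,
    stats.generate_tokens_per_second,
    stats.time_to_first_token)
end

-- Possible usage with a session `sess` (not created here):
-- print(format_stats(sess:stats()))
```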
syntax: <string or boolean>reply, <string>err = sess([<cgemma.image_tokens>img, ]<string>text[, <function>stream])
Generate reply.
A successful call returns the content of the reply (without a stream function) or true (with a stream function). Otherwise, it returns nil and a string describing the error.
The stream function is defined as follows:
function stream(token, pos, prompt_size)
  if pos < prompt_size then
    -- Gemma is processing the prompt
    io.write(pos == 0 and "reading and thinking ." or ".")
  elseif token then
    -- Stream the token text output by Gemma here
    if pos == prompt_size then
      io.write("\nreply: ")
    end
    io.write(token)
  else
    -- Gemma's output reaches the end
    print()
  end
  io.flush()
  -- return `true` indicates success; return `false` indicates failure and terminates the generation
  return true
end
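The same callback shape can also collect tokens instead of printing them. The sketch below is one way to do that, with a cap after which the callback returns `false` to stop generation. The names `make_collector` and `max_tokens` are hypothetical, and what the session call itself returns after an early stop is not covered here.

```lua
-- Returns a stream callback plus the table it fills.
-- `max_tokens` is a hypothetical cap: once that many tokens have been
-- collected, the callback returns false to terminate the generation.
local function make_collector(max_tokens)
  local out = { tokens = {}, finished = false }
  local function stream(token, pos, prompt_size)
    if pos < prompt_size then
      return true             -- still processing the prompt
    elseif token then
      out.tokens[#out.tokens + 1] = token
      return #out.tokens < max_tokens
    else
      out.finished = true     -- the output reached its end
      return true
    end
  end
  return stream, out
end

-- Possible usage with a session `sess` (not created here):
-- local stream, out = make_collector(64)
-- local ok, err = sess("Tell me a story", stream)
-- print(table.concat(out.tokens))
```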
syntax: <cgemma.batch_result>result, <string>err = cgemma.batch([<cgemma.image_tokens>img, ]<cgemma.session>sess, <string>text[, <function>stream], ...)
Generate replies for multiple queries via the batch interface.
A successful call returns a cgemma.batch_result object. Otherwise, it returns nil and a string describing the error.
The stream function is the same as in metatable(cgemma.session).call.
Note
- Each element in a batch must start with a session, followed by a string and an optional stream function; passing a stream function puts the corresponding session in stream mode instead of normal mode;
- All sessions in a batch must be created by the same Gemma instance;
- Sessions in a batch must not be duplicated;
- Inference arguments of a batch call: max_generated_tokens, prefill_tbatch, and decode_qbatch will be the minimum value of all sessions, temperature will be the average value of all sessions, and top_k will be the maximum value of all sessions;
- The embedded image can only be given as the first argument to a batch call.
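The rule for combining inference arguments can be sketched in plain Lua. The function below only mirrors the rule as stated in the note and is not taken from the library; the name `effective_batch_options` is hypothetical, and each input table is assumed to carry all five fields.

```lua
-- Given the option tables of every session in a batch, compute the values
-- the note describes: minimum for the three limits, average temperature,
-- and maximum top_k.
local function effective_batch_options(option_list)
  assert(#option_list > 0, "at least one session is required")
  local eff = {
    max_generated_tokens = math.huge,
    prefill_tbatch = math.huge,
    decode_qbatch = math.huge,
    temperature = 0,
    top_k = -math.huge,
  }
  for _, o in ipairs(option_list) do
    eff.max_generated_tokens =
      math.min(eff.max_generated_tokens, o.max_generated_tokens)
    eff.prefill_tbatch = math.min(eff.prefill_tbatch, o.prefill_tbatch)
    eff.decode_qbatch = math.min(eff.decode_qbatch, o.decode_qbatch)
    eff.temperature = eff.temperature + o.temperature
    eff.top_k = math.max(eff.top_k, o.top_k)
  end
  eff.temperature = eff.temperature / #option_list
  return eff
end
```

For example, batching a session that uses the defaults with one created with `temperature = 0.5` and `top_k = 40` would, under this rule, run with a temperature of 0.75 and a top_k of 40.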
syntax: <table>statistics = result:stats()
Get statistics for the batch call that returned the current result.
The statistics fields are the same as in cgemma.session.stats.
syntax: <string or boolean>reply, <string>err = result(<cgemma.session>sess)
Query the reply corresponding to the session in the result.
A successful call returns the content of the reply (normal mode) or true (stream mode). Otherwise, it returns nil and a string describing the error.
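Putting the batch interface together, a minimal sketch might look like the helper below. It flattens a list of queries into the argument list expected by `cgemma.batch` and then queries the result once per session. The helper name `batch_ask` and the shape of `queries` are hypothetical; the sketch covers normal mode only (no stream functions, no image).

```lua
local unpack = table.unpack or unpack  -- Lua 5.2+ or Lua 5.1/LuaJIT

-- Hypothetical helper. `cgemma` is the loaded module and `queries` is an
-- array of { sess = <session>, text = <string> } tables.
-- Returns an array of replies in the same order, or nil and an error.
local function batch_ask(cgemma, queries)
  local args = {}
  for _, q in ipairs(queries) do
    args[#args + 1] = q.sess
    args[#args + 1] = q.text
  end
  local result, err = cgemma.batch(unpack(args))
  if not result then
    return nil, err
  end
  local replies = {}
  for i, q in ipairs(queries) do
    local reply, reply_err = result(q.sess)
    if not reply then
      return nil, reply_err
    end
    replies[i] = reply
  end
  return replies
end

-- Possible usage with two sessions from the same Gemma instance:
-- local replies, err = batch_ask(require("cgemma"), {
--   { sess = sess_a, text = "Hello" },
--   { sess = sess_b, text = "What is Lua?" },
-- })
```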
The weights file now has a new format: a single file that can contain the tokenizer and the model type directly. A tool to migrate from the multi-file format to the single-file format is available.
gemma.migrate_weights \
--tokenizer /path/to/tokenizer.spm --weights /path/to/2.0-2b-it-sfp.sbs \
--model gemma2-2b-it --output_weights /path/to/2.0-2b-it-sfp-single.sbs
After migration, you can create a Gemma instance using the new weights file like this:
-- Create a Gemma instance
local gemma, err = require("cgemma").new({
  weights = "/path/to/2.0-2b-it-sfp-single.sbs"
})
if not gemma then
  error("Oops! "..err)
end
BSD-3-Clause license. See LICENSE for details.