lua-cgemma

Lua bindings for gemma.cpp.

Requirements

Before starting, you should have installed:

CMake
C++ compiler, supporting at least C++17
LuaJIT (installing OpenResty directly is recommended)

Installation

1st step: Clone the source code from GitHub: git clone https://github.com/ufownl/lua-cgemma.git

2nd step: Build and install:

To build and install using the default settings, just enter the repository's directory and run the following commands:

mkdir build
cd build
cmake .. && make
sudo make install

3rd step: See here to learn how to obtain model weights and tokenizer.

Usage

Synopsis

-- Create a Gemma instance
local gemma, err = require("cgemma").new({
  tokenizer = "/path/to/tokenizer.spm",
  model = "gemma3-4b",
  weights = "/path/to/4b-it-sfp.sbs"
})
if not gemma then
  error("Oops! "..err)
end

-- Create a chat session
local session, err = gemma:session()
if not session then
  error("Oops! "..err)
end

while true do
  print("New conversation started")

  -- Multi-turn chat loop
  while session:ready() do
    io.write("> ")
    local text = io.read()
    if not text then
      print("End of file")
      return
    end
    -- Generate reply
    local reply, err = session(text)
    if not reply then
      error("Oops! "..err)
    end
    print("reply: ", reply)
  end

  print("Exceed the maximum number of tokens")
  session:reset()
end

APIs for Lua

cgemma.info

syntax: cgemma.info()

Show information about the cgemma module.

cgemma.scheduler

syntax: <cgemma.scheduler>sched, <string>err = cgemma.scheduler([<table>options])

Create a scheduler instance.

A successful call returns a scheduler instance. Otherwise, it returns nil and a string describing the error.

Available options and default values:

{
  num_threads = 0,  -- Maximum number of threads to use. (0 = unlimited)
  pin = -1,  -- Pin threads? (-1 = auto, 0 = no, 1 = yes)
  skip_packages = 0,  -- Index of the first socket to use. (0 = unlimited)
  max_packages = 0,  -- Maximum number of sockets to use. (0 = unlimited)
  skip_clusters = 0,  -- Index of the first CCX to use. (0 = unlimited)
  max_clusters = 0,  -- Maximum number of CCXs to use. (0 = unlimited)
  skip_lps = 0,  -- Index of the first LP to use. (0 = unlimited)
  max_lps = 0,  -- Maximum number of LPs to use. (0 = unlimited)
}

cgemma.scheduler.cpu_topology

syntax: <string>desc = sched:cpu_topology()

Query CPU topology.
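
As a minimal sketch of how the scheduler calls above might fit together, the helper below creates a scheduler with a thread cap, prints the topology string it reports, and attaches it to a new Gemma instance. The helper name new_instance_with_scheduler is hypothetical, and the paths and model type are the placeholders from the synopsis.

```lua
-- Sketch: build a scheduler and hand it to cgemma.new.
-- `cgemma` is the module returned by require("cgemma").
local function new_instance_with_scheduler(cgemma, num_threads)
  -- Cap the worker threads; all other scheduler options keep their defaults.
  local sched, err = cgemma.scheduler({ num_threads = num_threads })
  if not sched then
    return nil, "scheduler: " .. err
  end

  -- Print the CPU topology description reported by the scheduler.
  print(sched:cpu_topology())

  -- Placeholder paths and model type, copied from the synopsis.
  return cgemma.new({
    tokenizer = "/path/to/tokenizer.spm",
    model = "gemma3-4b",
    weights = "/path/to/4b-it-sfp.sbs",
    scheduler = sched
  })
end

-- Usage (requires cgemma to be installed):
-- local gemma, err = new_instance_with_scheduler(require("cgemma"), 4)
```

Passing the module in as an argument keeps the helper loadable on machines where cgemma is not installed.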

cgemma.new

syntax: <cgemma.instance>inst, <string>err = cgemma.new(<table>options)

Create a Gemma instance.

A successful call returns a Gemma instance. Otherwise, it returns nil and a string describing the error.

Available options:

{
  tokenizer = "/path/to/tokenizer.spm",  -- Path of tokenizer model file.
  model = "gemma3-4b",  -- Model type:
                        -- 2b-it (Gemma 2B parameters, instruction-tuned)
                        -- 2b-pt (Gemma 2B parameters, pretrained)
                        -- 7b-it (Gemma 7B parameters, instruction-tuned)
                        -- 7b-pt (Gemma 7B parameters, pretrained)
                        -- gr2b-it (Griffin 2B parameters, instruction-tuned)
                        -- gr2b-pt (Griffin 2B parameters, pretrained)
                        -- gemma2-2b-it (Gemma2 2B parameters, instruction-tuned)
                        -- gemma2-2b-pt (Gemma2 2B parameters, pretrained)
                        -- 9b-it (Gemma2 9B parameters, instruction-tuned)
                        -- 9b-pt (Gemma2 9B parameters, pretrained)
                        -- 27b-it (Gemma2 27B parameters, instruction-tuned)
                        -- 27b-pt (Gemma2 27B parameters, pretrained)
                        -- paligemma-224 (PaliGemma 224*224)
                        -- paligemma-448 (PaliGemma 448*448)
                        -- paligemma2-3b-224 (PaliGemma2 3B 224*224)
                        -- paligemma2-3b-448 (PaliGemma2 3B 448*448)
                        -- paligemma2-10b-224 (PaliGemma2 10B 224*224)
                        -- paligemma2-10b-448 (PaliGemma2 10B 448*448)
                        -- gemma3-4b (Gemma3 4B parameters)
                        -- gemma3-1b (Gemma3 1B parameters)
                        -- gemma3-12b (Gemma3 12B parameters)
                        -- gemma3-27b (Gemma3 27B parameters)
  weights = "/path/to/4b-it-sfp.sbs",  -- Path of model weights file. (required)
  weight_type = "sfp",  -- Weight type:
                        -- sfp (8-bit FP, default)
                        -- f32 (float)
                        -- bf16 (bfloat16)
                        -- nuq (non-uniform quantization)
                        -- f64 (double)
                        -- c64 (complex double)
                        -- u128 (uint128)
  seed = 42,  -- Random seed. (default: chosen randomly)
  scheduler = sched_inst,  -- Instance of scheduler, if not provided a default
                           -- scheduler will be attached.
  disabled_words = {...},  -- Words you don't want to generate.
}

Note

If the weights file is not in the new single-file format, then tokenizer and model options are required.

cgemma.instance.disabled_tokens

syntax: <table>tokens = inst:disabled_tokens()

Query the disabled tokens of a Gemma instance.

cgemma.instance.embed_image

syntax: <cgemma.image_tokens>img, <string>err = inst:embed_image(<string>data_or_path)

Load image data from the given Lua string or a specific file (PPM format: P6, binary) and embed it into the image tokens.

syntax: <cgemma.image_tokens>img, <string>err = inst:embed_image(<integer>width, <integer>height, <table>values)

Create an image with the given width, height, and pixel values, and embed it into the image tokens.

A successful call returns a cgemma.image_tokens object containing the image tokens. Otherwise, it returns nil and a string describing the error.
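
The string form of embed_image expects binary PPM (P6) data, which can be assembled in plain Lua. The sketch below builds such a string and passes it to an instance. The helper names make_ppm and describe_image, the 224x224 size, and the prompt text are illustrative assumptions, and the instance is assumed to have been created with an image-capable model.

```lua
-- Build a binary PPM (P6) image as a Lua string.
-- pixel(x, y) must return r, g, b as integers in [0, 255].
local function make_ppm(width, height, pixel)
  local parts = { string.format("P6\n%d %d\n255\n", width, height) }
  for y = 0, height - 1 do
    for x = 0, width - 1 do
      -- string.char turns the three channel values into three raw bytes.
      parts[#parts + 1] = string.char(pixel(x, y))
    end
  end
  return table.concat(parts)
end

-- A 224x224 horizontal red gradient (illustrative size and content).
local ppm = make_ppm(224, 224, function(x, y)
  return math.floor(x * 255 / 223), 0, 0
end)

-- Embed the image and ask a question about it.
-- `gemma` is an instance returned by cgemma.new.
local function describe_image(gemma, ppm_data)
  local img, err = gemma:embed_image(ppm_data)
  if not img then
    return nil, err
  end
  local session, err = gemma:session()
  if not session then
    return nil, err
  end
  -- The image tokens go before the text, as in the __call syntax.
  return session(img, "Describe the image")
end
```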

cgemma.instance.session

syntax: <cgemma.session>sess, <string>err = inst:session([<table>options])

Create a chat session.

A successful call returns the session. Otherwise, it returns nil and a string describing the error.

Available options and default values:

{
  max_generated_tokens = 2048,  -- Maximum number of tokens to generate.
  prefill_tbatch = 256,  -- Prefill: max tokens per batch.
  decode_qbatch = 16,  -- Decode: max queries per batch.
  temperature = 1.0,  -- Temperature for top-K.
  top_k = 1,  -- Number of top-K tokens to sample from.
  no_wrapping = false,  -- Whether to force disable instruction-tuned wrapping.
}
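
For illustration, a session that samples more freely than the defaults could be created as sketched below. The helper name new_sampling_session is hypothetical, and the option values are arbitrary examples rather than recommendations.

```lua
-- Sketch: create a session with non-default sampling options.
-- `gemma` is an instance returned by cgemma.new.
local function new_sampling_session(gemma)
  return gemma:session({
    max_generated_tokens = 512,  -- shorter replies than the default 2048
    temperature = 0.7,           -- example value
    top_k = 40                   -- sample from the 40 most likely tokens
  })
end

-- Usage:
-- local session, err = new_sampling_session(gemma)
-- if not session then error(err) end
```

Options left out of the table keep the default values listed above.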

cgemma.session.ready

syntax: <boolean>ok = sess:ready()

Check if the session is ready to chat.

cgemma.session.reset

syntax: sess:reset()

Reset the session to start a new conversation.

cgemma.session.dumps

syntax: <string>data, <string>err = sess:dumps()

Dump the current state of the session to a Lua string.

A successful call returns a Lua string that stores state data (binary) of the session. Otherwise, it returns nil and a string describing the error.

cgemma.session.loads

syntax: <boolean>ok, <string>err = sess:loads(<string>data)

Load the state data from the given Lua string to restore a previous session.

A successful call returns true. Otherwise, it returns false and a string describing the error.
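
A rough sketch of how dumps and loads might be combined to copy a conversation from one session to another in memory. The helper name copy_state is hypothetical, and it assumes both sessions come from the same Gemma instance, since compatibility across instances is not stated here.

```lua
-- Copy the conversation state of `src` into `dst` without touching disk.
local function copy_state(src, dst)
  local data, err = src:dumps()
  if not data then
    return false, "dumps failed: " .. err
  end
  local ok, err = dst:loads(data)
  if not ok then
    return false, "loads failed: " .. err
  end
  return true
end

-- Usage:
-- local ok, err = copy_state(session_a, session_b)
```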

cgemma.session.dump

syntax: <boolean>ok, <string>err = sess:dump(<string>path)

Dump the current state of the session to a specific file.

A successful call returns true. Otherwise, it returns false and a string describing the error.

cgemma.session.load

syntax: <boolean>ok, <string>err = sess:load(<string>path)

Load the state data from the given file to restore a previous session.

A successful call returns true. Otherwise, it returns false and a string describing the error.
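
The file-based pair can be used in the same way to persist a conversation between runs. In the sketch below, the helper name restore_or_reset is hypothetical, and resetting after a failed load is a conservative assumption, because the state of a session after a failed load is not described here.

```lua
-- Try to resume a conversation from `path`; start fresh if that fails.
local function restore_or_reset(sess, path)
  local ok, err = sess:load(path)
  if ok then
    return true
  end
  -- Assumption: a reset is the safe fallback after a failed load.
  sess:reset()
  return false, err
end

-- Usage:
-- local resumed, err = restore_or_reset(session, "/path/to/chat.state")
-- ... chat ...
-- local ok, err = session:dump("/path/to/chat.state")
```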

cgemma.session.stats

syntax: <table>statistics = sess:stats()

Get statistics for the current session.

Example of statistics:

{
  prefill_duration = 1.6746909224894,
  prefill_tokens = 26,
  prefill_tokens_per_second = 15.525252839701,
  time_to_first_token = 1.9843131969683,
  generate_duration = 38.562645539409,
  tokens_generated = 212,
  generate_tokens_per_second = 5.4975481332926
}
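
In the sample above, each tokens-per-second field equals the token count divided by the matching duration, so the durations appear to be in seconds. A small formatter along the following lines can turn the table into a log line. The name format_stats is hypothetical.

```lua
-- Format the table returned by sess:stats() as a single log line.
local function format_stats(stats)
  return string.format(
    "prefill: %d tokens in %.2fs (%.1f tok/s), first token after %.2fs, "
      .. "generated: %d tokens in %.2fs (%.1f tok/s)",
    stats.prefill_tokens,
    stats.prefill_duration,
    stats.prefill_tokens_per_second,
    stats.time_to_first_token,
    stats.tokens_generated,
    stats.generate_duration,
    stats.generate_tokens_per_second
  )
end

-- Usage:
-- print(format_stats(session:stats()))
```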

metatable(cgemma.session).__call

syntax: <string or boolean>reply, <string>err = sess([<cgemma.image_tokens>img, ]<string>text[, <function>stream])

Generate reply.

A successful call returns the content of the reply (without a stream function) or true (with a stream function). Otherwise, it returns nil and a string describing the error.

The stream function is defined as follows:

function stream(token, pos, prompt_size)
  if pos < prompt_size then
    -- Gemma is processing the prompt
    io.write(pos == 0 and "reading and thinking ." or ".")
  elseif token then
    -- Stream the token text output by Gemma here
    if pos == prompt_size then
      io.write("\nreply: ")
    end
    io.write(token)
  else
    -- Gemma's output reaches the end
    print()
  end
  io.flush()
  -- return `true` indicates success; return `false` indicates failure and terminates the generation
  return true
end
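
As a variation on the stream function above, the sketch below collects the streamed tokens into a buffer instead of printing them, and optionally stops generation once a character budget is exceeded by returning false. The name make_collector and the max_chars parameter are hypothetical. What the session call returns after a stream function returns false is not stated here, so check both return values.

```lua
-- Create a stream function that accumulates reply tokens.
-- Returns the stream function and a getter for the collected text.
local function make_collector(max_chars)
  local pieces, length = {}, 0

  local function stream(token, pos, prompt_size)
    -- Ignore prompt processing (pos < prompt_size) and the final
    -- end-of-output call (token is nil).
    if pos >= prompt_size and token then
      pieces[#pieces + 1] = token
      length = length + #token
      if max_chars and length > max_chars then
        return false  -- terminate the generation
      end
    end
    return true
  end

  local function text()
    return table.concat(pieces)
  end

  return stream, text
end

-- Usage:
-- local stream, text = make_collector(4000)
-- local ok, err = session("Tell me a story", stream)
-- print(text())
```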

cgemma.batch

syntax: <cgemma.batch_result>result, <string>err = cgemma.batch([<cgemma.image_tokens>img, ]<cgemma.session>sess, <string>text[, <function>stream], ...)

Generate replies for multiple queries via the batch interface.

A successful call returns a cgemma.batch_result object. Otherwise, it returns nil and a string describing the error.

The stream function is the same as in metatable(cgemma.session).__call.

Note

Each element in a batch must start with a session, followed by a string and an optional stream function. Supplying a stream function puts the corresponding session in stream mode instead of normal mode;
All sessions in a batch must be created by the same Gemma instance;
Sessions in a batch must not be duplicated;
Inference arguments of a batch call are derived from all sessions: max_generated_tokens, prefill_tbatch, and decode_qbatch take the minimum value, temperature takes the average value, and top_k takes the maximum value;
The embedded image can only be given as the first argument to a batch call.
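
To make the argument-merging rule in the notes above concrete, the following plain-Lua sketch computes the effective arguments from a list of per-session option tables. It only illustrates the stated rule and is not the library's own code. The name effective_batch_args is hypothetical, and every table is assumed to contain all five fields.

```lua
-- Compute the inference arguments a batch call would use,
-- following the min / average / max rule stated in the notes.
local function effective_batch_args(options_list)
  assert(#options_list > 0, "at least one session is required")
  local first = options_list[1]
  local eff = {
    max_generated_tokens = first.max_generated_tokens,
    prefill_tbatch = first.prefill_tbatch,
    decode_qbatch = first.decode_qbatch,
    top_k = first.top_k
  }
  local temperature_sum = 0
  for _, opts in ipairs(options_list) do
    eff.max_generated_tokens =
      math.min(eff.max_generated_tokens, opts.max_generated_tokens)
    eff.prefill_tbatch = math.min(eff.prefill_tbatch, opts.prefill_tbatch)
    eff.decode_qbatch = math.min(eff.decode_qbatch, opts.decode_qbatch)
    eff.top_k = math.max(eff.top_k, opts.top_k)
    temperature_sum = temperature_sum + opts.temperature
  end
  eff.temperature = temperature_sum / #options_list
  return eff
end
```

One practical consequence is that a single session with a small max_generated_tokens shortens the replies of every session in the batch.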

cgemma.batch_result.stats

syntax: <table>statistics = result:stats()

Get statistics for the batch call that returned the current result.

The statistics fields are the same as in cgemma.session.stats.

metatable(cgemma.batch_result).__call

syntax: <string or boolean>reply, <string>err = result(<cgemma.session>sess)

Query the reply corresponding to the session in the result.

A successful call returns the content of the reply (normal mode) or true (stream mode). Otherwise, it returns nil and a string describing the error.
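
Putting the batch interface together, a two-session batch in normal mode might look like the sketch below. The helper name ask_both and the prompts are hypothetical, and both sessions are assumed to come from the same Gemma instance, as the notes for cgemma.batch require.

```lua
-- Send one prompt to each of two sessions in a single batch call.
-- `cgemma` is the module returned by require("cgemma").
local function ask_both(cgemma, sess_a, sess_b)
  local result, err = cgemma.batch(
    sess_a, "Summarize the plot of Hamlet.",
    sess_b, "Summarize the plot of Macbeth."
  )
  if not result then
    return nil, err
  end

  -- Look up each reply by the session that produced it.
  local reply_a, err_a = result(sess_a)
  if not reply_a then
    return nil, err_a
  end
  local reply_b, err_b = result(sess_b)
  if not reply_b then
    return nil, err_b
  end
  return { reply_a, reply_b }, result:stats()
end

-- Usage:
-- local replies, stats = ask_both(require("cgemma"), session_a, session_b)
```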

Migrating to single-file weights format

The weights file now has a new format: a single file that allows the tokenizer and the model type to be contained directly. A tool to migrate from multi-file to single-file is available.

gemma.migrate_weights \
  --tokenizer /path/to/tokenizer.spm --weights /path/to/2.0-2b-it-sfp.sbs \
  --model gemma2-2b-it --output_weights /path/to/2.0-2b-it-sfp-single.sbs

After migration, you can create a Gemma instance using the new weights file like this:

-- Create a Gemma instance
local gemma, err = require("cgemma").new({
  weights = "/path/to/2.0-2b-it-sfp-single.sbs"
})
if not gemma then
  error("Oops! "..err)
end

License

BSD-3-Clause license. See LICENSE for details.
