@aittalam aittalam commented Jan 13, 2026

  • implements multimodal (image) support in the TUI chatbot using the new mtmd API, replacing the previous stub that returned "image processing not yet implemented"
  • aligns with llama.cpp server's multimodal handling for consistency

Closes #851

Changes

  • added eval_mtmd_chunks() helper that evaluates tokenized chunks using mtmd_helper_eval_chunk_single() (matching the approach used by llama.cpp server)
  • refactored eval_string() to:
    • scan input for data:image/... URIs
    • collect images as mtmd::bitmap objects
    • replace data URIs with <__media__> markers
    • tokenize text + images together via mtmd_tokenize()
    • evaluate the resulting chunks
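
The scan-and-replace steps above can be sketched in isolation as follows. This is a minimal illustration, not the code from this PR: `MarkedPrompt` and `replace_image_uris` are hypothetical names, the rule that a data URI ends at the first whitespace character is an assumption, and the sketch keeps the raw URI strings rather than decoding them into mtmd::bitmap objects or calling any mtmd function.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical result type: the rewritten prompt plus the URIs found in it.
struct MarkedPrompt {
    std::string text;               // prompt with each data URI replaced by the marker
    std::vector<std::string> uris;  // extracted data URIs, in order of appearance
};

// Scans `input` for "data:image/..." URIs and replaces each with `marker`.
// Assumption: a URI runs until the next whitespace character or end of input.
MarkedPrompt replace_image_uris(std::string_view input,
                                std::string_view marker = "<__media__>") {
    constexpr std::string_view prefix = "data:image/";
    MarkedPrompt out;
    std::size_t pos = 0;
    while (pos < input.size()) {
        const std::size_t start = input.find(prefix, pos);
        if (start == std::string_view::npos) {
            // No more URIs: copy the remaining text unchanged.
            out.text.append(input.substr(pos));
            break;
        }
        // Copy the text that precedes the URI.
        out.text.append(input.substr(pos, start - pos));
        // Find the end of the URI.
        std::size_t end = start;
        while (end < input.size() &&
               !std::isspace(static_cast<unsigned char>(input[end]))) {
            ++end;
        }
        out.uris.emplace_back(input.substr(start, end - start));
        out.text.append(marker);
        pos = end;
    }
    return out;
}
```

A caller can check `uris.empty()` on the result to detect a text-only prompt. In the real flow the extracted images and the marked text would then go to mtmd_tokenize(), which this sketch does not attempt to reproduce.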

Testing this PR

  • Build with make -j8
  • Run with a multimodal model: ./llamafile -m <model.gguf> --mmproj <mmproj.gguf>
  • Use /upload <image_path> to upload an image and verify that it is processed correctly (note: <image_path> cannot contain spaces yet)
  • Test text-only prompts still work (fast path)
  • Test multiple images in a single prompt

@aittalam aittalam marked this pull request as ready for review January 13, 2026 18:37
@aittalam aittalam merged commit caddf2a into new_build_wip Jan 15, 2026
1 check passed
@aittalam aittalam deleted the mtmd_in_TUI branch January 15, 2026 17:42