
_LlamaModel.metadata() does not return tokenizer.ggml.tokens #1495

Open

@ddh0 (Contributor)

Description

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

The dictionary returned by _LlamaModel.metadata() should include tokenizer.ggml.tokens as a key, so the vocabulary of the model can be accessed from the high-level API.

Current Behavior

The dictionary returned by _LlamaModel.metadata() does not include tokenizer.ggml.tokens as a key, so the vocabulary of the model cannot be accessed from the high-level API.

Environment and Context

Running the latest llama-cpp-python built from source (package version 0.2.76 at the time of writing).

Steps to Reproduce

  • Construct an instance of Llama
  • View Llama.metadata
  • Look for a key called tokenizer.ggml.tokens
  • Do not find it (a minimal reproduction sketch follows this list)
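A minimal reproduction sketch, assuming a local GGUF file at `./model.gguf` (the path and the scalar key used for comparison are illustrative):

```python
from llama_cpp import Llama

# Load only the vocabulary/metadata so the reproduction stays cheap.
llm = Llama(model_path="./model.gguf", vocab_only=True, verbose=False)

# Scalar GGUF keys appear in the metadata dict, but array-valued keys do not.
print("tokenizer.ggml.model" in llm.metadata)   # True for typical models (scalar key)
print("tokenizer.ggml.tokens" in llm.metadata)  # False -- the key this issue is about
```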

Activity

ddh0 (Contributor, Author) commented on May 29, 2024

For reference, here is the code I am currently using to read GGUF metadata from a file, including tokenizer.ggml.tokens.

This code is not as robust as the code currently in llama-cpp-python, but I have been using it for a long time and it does work as expected with various models - you can cross-reference with the llama.cpp backend output to verify this.

Hopefully this code can be useful as a reference while looking into this issue.

Thanks, @abetlen!

CISC (Contributor) commented on May 29, 2024

The problem is that the llama.cpp API does not return metadata arrays.

Using gguf.py to read metadata in the same fashion you do would solve that, but it would technically mean loading the model twice, which is not a good solution. :(

ddh0 (Contributor, Author) commented on May 29, 2024

The code I shared doesn't load the model; it just reads bytes from the header of the GGUF file, unless I'm woefully misunderstanding something. Could you please clarify what you mean by loading the model twice?
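For illustration only (this is not the snippet shared above), a minimal header-only reader for GGUF v2/v3 files might look roughly like this; it parses the key/value metadata block at the start of the file, including array keys such as tokenizer.ggml.tokens, and never touches tensor data:

```python
import struct

GGUF_MAGIC = b"GGUF"

def _read_string(f):
    # GGUF strings (v2/v3) are a uint64 length followed by UTF-8 bytes.
    (length,) = struct.unpack("<Q", f.read(8))
    return f.read(length).decode("utf-8", errors="replace")

def _read_value(f, value_type):
    # Scalar value types map directly to struct formats; 8 is string, 9 is array.
    scalar = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
              6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
    if value_type in scalar:
        fmt = scalar[value_type]
        (val,) = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
        return val
    if value_type == 8:  # string
        return _read_string(f)
    if value_type == 9:  # array: uint32 element type, uint64 count, then elements
        (elem_type,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        return [_read_value(f, elem_type) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {value_type}")

def read_gguf_metadata(path):
    with open(path, "rb") as f:
        assert f.read(4) == GGUF_MAGIC, "not a GGUF file"
        version, n_tensors, n_kv = struct.unpack("<IQQ", f.read(20))
        metadata = {}
        for _ in range(n_kv):
            key = _read_string(f)
            (value_type,) = struct.unpack("<I", f.read(4))
            metadata[key] = _read_value(f, value_type)
        return metadata

# tokens = read_gguf_metadata("./model.gguf")["tokenizer.ggml.tokens"]
```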

CISC (Contributor) commented on May 30, 2024

I meant that if going the route of side-loading the metadata, it would probably be best to use the official gguf.py, which, even though it doesn't load the whole model into memory, technically means you would open the file twice and at least duplicate some of that data.

The better approach would be to add support for metadata arrays in the llama.cpp API, but I guess it is a little cumbersome to expose somewhat cleanly through a C ABI.
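As a rough sketch of that side-loading route (the field/parts handling here is an assumption about the published gguf package's GGUFReader API and may differ between versions):

```python
from gguf import GGUFReader

# Opens and memory-maps the file; tensor data is not read into memory.
reader = GGUFReader("./model.gguf")

# Array-valued keys are exposed as ReaderField objects whose .data holds
# indexes into .parts, one entry per stored element.
field = reader.fields["tokenizer.ggml.tokens"]
tokens = [field.parts[i].tobytes().decode("utf-8") for i in field.data]
print(len(tokens), tokens[:5])
```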

ddh0 (Contributor, Author) commented on May 30, 2024

Oh, I see. I'll take a look at gguf.py and see if I can make a PR that would help sometime soon. Thanks

CISC (Contributor) commented on May 30, 2024

Do note that the currently published gguf.py is quite outdated compared to the llama.cpp master branch, but hopefully it will get bumped soon.

ddh0 (Contributor, Author) commented on May 30, 2024

I am working on a proper fix for this: #1497, #1525
