
_LlamaModel.metadata() does not return tokenizer.ggml.tokens #1495

Open

ddh0 opened this issue May 29, 2024 · 7 comments

ddh0 (Contributor) commented May 29, 2024

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

The dictionary returned by _LlamaModel.metadata() should include tokenizer.ggml.tokens as a key, so the vocabulary of the model can be accessed from the high-level API.

Current Behavior

The dictionary returned by _LlamaModel.metadata() does not include tokenizer.ggml.tokens as a key, so the vocabulary of the model cannot be accessed from the high-level API.

Environment and Context

Running the latest llama-cpp-python built from source (package version 0.2.76 at the time of writing).

Steps to Reproduce

  • Construct an instance of Llama
  • View Llama.metadata
  • Look for a key called tokenizer.ggml.tokens
  • Do not find it (see the sketch below)
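
A minimal reproduction along these lines (the model path is just a placeholder):

```python
from llama_cpp import Llama

# Any local GGUF model will do; the path is a placeholder.
llm = Llama(model_path="./model.gguf", verbose=False)

# Scalar metadata keys are present, but array-valued ones are not.
print("tokenizer.ggml.tokens" in llm.metadata)   # False
print("tokenizer.ggml.model" in llm.metadata)    # True (scalar key, for comparison)
```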
ddh0 (Contributor, Author) commented May 29, 2024

For reference, here is the code I am currently using to read GGUF metadata from a file, including tokenizer.ggml.tokens.

This code is not as robust as the code currently in llama-cpp-python, but I have been using it for a long time and it does work as expected with various models - you can cross-reference with the llama.cpp backend output to verify this.

Hopefully this code can be useful as a reference while looking into this issue.
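
(The snippet itself is not reproduced above. Purely as an illustration of the approach, a standalone reader that walks the GGUF key/value header might look roughly like the sketch below; it assumes the GGUF v2/v3 on-disk layout and is not the referenced code.)

```python
import struct

def _read_str(f):
    # GGUF string: uint64 length followed by UTF-8 bytes
    (n,) = struct.unpack("<Q", f.read(8))
    return f.read(n).decode("utf-8", errors="replace")

def _read_value(f, vtype):
    # GGUF value types: 0-7 and 10-12 are scalars, 8 is string, 9 is array
    scalars = {0: "<B", 1: "<b", 2: "<H", 3: "<h", 4: "<I", 5: "<i",
               6: "<f", 7: "<?", 10: "<Q", 11: "<q", 12: "<d"}
    if vtype in scalars:
        fmt = scalars[vtype]
        (v,) = struct.unpack(fmt, f.read(struct.calcsize(fmt)))
        return v
    if vtype == 8:   # string
        return _read_str(f)
    if vtype == 9:   # array: uint32 element type, uint64 count, then elements
        (etype,) = struct.unpack("<I", f.read(4))
        (count,) = struct.unpack("<Q", f.read(8))
        return [_read_value(f, etype) for _ in range(count)]
    raise ValueError(f"unknown GGUF value type {vtype}")

def read_gguf_metadata(path):
    with open(path, "rb") as f:
        if f.read(4) != b"GGUF":
            raise ValueError("not a GGUF file")
        (_version,) = struct.unpack("<I", f.read(4))
        _n_tensors, n_kv = struct.unpack("<QQ", f.read(16))
        metadata = {}
        for _ in range(n_kv):
            key = _read_str(f)
            (vtype,) = struct.unpack("<I", f.read(4))
            metadata[key] = _read_value(f, vtype)
        return metadata

# e.g. read_gguf_metadata("model.gguf")["tokenizer.ggml.tokens"]
```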

Thanks, @abetlen!

CISC (Contributor) commented May 29, 2024

The problem is that the llama.cpp API does not return metadata arrays.

Using gguf.py to read metadata in the same fashion you do would solve that, but it would technically mean loading the model twice, which is not a good solution. :(

ddh0 (Contributor, Author) commented May 29, 2024

The code I shared doesn't load the model, it just reads bytes from the header of the GGUF file, unless I'm woefully misunderstanding something. Could you please clarify what you mean by loading the model twice?

CISC (Contributor) commented May 30, 2024

I meant that if going the route of side-loading the metadata, it would probably be best to use the official gguf.py. Even though it doesn't load the whole model into memory, that would still technically mean opening the file twice and duplicating at least some of that data.
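
For reference, that side-loading could look roughly like this with the gguf package (assuming its GGUFReader still exposes a fields mapping whose string-array entries are addressed via data indices into parts; as noted below, the published package may lag llama.cpp master):

```python
from gguf import GGUFReader  # the gguf-py package published from llama.cpp

# Assumption: reader.fields maps key names to ReaderField entries whose
# `data` indices point at the string byte chunks in `parts`.
reader = GGUFReader("model.gguf")  # placeholder path
field = reader.fields["tokenizer.ggml.tokens"]
tokens = [bytes(field.parts[i]).decode("utf-8") for i in field.data]
print(len(tokens), tokens[:5])
```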

The better approach would be to add support for metadata arrays to the llama.cpp API, but I guess that is a little cumbersome to expose cleanly through a C ABI.

ddh0 (Contributor, Author) commented May 30, 2024

Oh, I see. I'll take a look at gguf.py and see if I can make a PR that would help sometime soon. Thanks

CISC (Contributor) commented May 30, 2024

Do note that the currently published gguf.py is quite outdated compared to the llama.cpp master branch, but hopefully it will get bumped soon.

ddh0 (Contributor, Author) commented May 30, 2024

I am working on a proper fix for this: #1497, #1525
