Support all types of GGUF metadata [closed] #1497
Conversation
With a small patch to llama.py, the new metadata is displayed correctly when loading a model. (Full output collapsed.)
Nice, however at a quick glance I can see that you are not doing any version checks, nor are you taking into account file endianness.
Good point about file endianness, I will work on that now. However, if by version checks you mean the GGUF version, I don't think that's necessary. If that's not what you meant, could you please clarify?
The change I just made supports little- or big-endian GGUF files. It does so by assuming that the GGUF file is of the same endianness as the host machine. This assumption is safe to make because, at this point in the code, the model itself has already been loaded successfully, so it must be the same endianness as the host. Per the GGUF format spec, I think this is a pretty reasonable solution.
You can simplify your code by using the native byteorder marker (`=`).
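For readers following along, a minimal sketch of that suggestion (illustrative only, not the PR's actual code): in Python's `struct` module, the `=` prefix selects the host's native byte order, so a GGUF file matching the host's endianness parses correctly with a single code path.

```python
import struct

def read_u32(f):
    # '=' means native byte order with standard sizes, so this reads
    # correctly on both little- and big-endian hosts, provided the
    # file's endianness matches the host (the PR's stated assumption).
    return struct.unpack("=I", f.read(4))[0]

def read_u64(f):
    return struct.unpack("=Q", f.read(8))[0]
```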
Gotcha. I can work on supporting GGUFv2. v1 will remain unsupported. |
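A hedged sketch of what gating out GGUFv1 at the file header could look like (the magic and version fields come from the GGUF spec; the function itself is illustrative, not the PR's code):

```python
import struct

GGUF_MAGIC = b"GGUF"

def read_gguf_version(f):
    # Per the GGUF spec, a file begins with a 4-byte magic
    # followed by a uint32 format version.
    if f.read(4) != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    (version,) = struct.unpack("=I", f.read(4))
    if version < 2:
        raise ValueError(f"GGUFv{version} is unsupported; only v2 and v3 are handled")
    return version
```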
GGUFv2 support is now fully working. Tested with wizard-vicuna-13b-uncensored-superhot-8k.Q4_K_S.gguf (example output collapsed).
Looking good, however there are a couple of issues:
I will wait for @abetlen to give his thoughts on this - I think it would be confusing to have two separate methods and properties.
On a different note, I do notice these lines in llama.py:

```python
# Unfortunately the llama.cpp API does not return metadata arrays, so we can't get template names from tokenizer.chat_templates
template_choices = dict((name[10:], template) for name, template in self.metadata.items() if name.startswith("tokenizer.chat_template."))
```

I'm guessing this could be simplified if metadata arrays were supported.
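Purely for illustration, here is how that might look if the `tokenizer.chat_templates` array were exposed (a hypothetical sketch; the key layout is assumed, not confirmed behaviour):

```python
# Hypothetical: with metadata arrays available, the prefix scan above
# could become a direct lookup of each named chat template.
template_names = self.metadata.get("tokenizer.chat_templates", [])
template_choices = {
    name: self.metadata[f"tokenizer.chat_template.{name}"]
    for name in template_names
}
```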
Please refer to PR #1525 instead of this one.

Relevant issue: #1495: `_LlamaModel.metadata()` does not return `tokenizer.ggml.tokens`
This PR implements support for reading arrays from GGUF metadata in GGUFv2 and GGUFv3 files, following Georgi Gerganov's GGUF format spec. I've tested it, and it works with GGUFv2 and GGUFv3 models and all types of metadata.
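As a rough sketch of what reading a spec-conformant metadata array involves (the type ids come from the GGUF spec; the helper names are illustrative, not the PR's actual code):

```python
import struct

# A few metadata value type ids from the GGUF spec
GGUF_TYPE_UINT32 = 4
GGUF_TYPE_STRING = 8
GGUF_TYPE_ARRAY = 9

def read_string(f):
    # GGUFv2/v3 string: uint64 byte length, then UTF-8 data
    (length,) = struct.unpack("=Q", f.read(8))
    return f.read(length).decode("utf-8")

def read_array(f, read_scalar):
    # GGUFv2/v3 array: uint32 element type, uint64 element count,
    # then the elements themselves. `read_scalar(f, type_id)` is an
    # assumed helper that reads one non-array value of a given type.
    (elem_type,) = struct.unpack("=I", f.read(4))
    (count,) = struct.unpack("=Q", f.read(8))
    return [read_scalar(f, elem_type) for _ in range(count)]
```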
The API is unchanged - just call the `_LlamaModel.metadata()` method the same as before. I also changed the way metadata is displayed when loading a model with `verbose=True`, because some arrays in metadata can be hundreds of thousands of items long (vocabulary, etc.). So now each key and value is printed on its own line, and any value over 60 characters is truncated with `...`.
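A minimal sketch of that display rule (names and exact formatting are assumptions, not the PR's actual code):

```python
def format_metadata_line(key, value, limit=60):
    # One key/value pair per line; long values (e.g. the vocabulary
    # array) are cut at `limit` characters and marked with "...".
    text = str(value)
    if len(text) > limit:
        text = text[:limit] + "..."
    return f"{key}: {text}"
```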
cc: @abetlen @CISC - I would appreciate any feedback :)
Example output when loading a model: (full output collapsed)