Argument out of range exception when running any prompt through DeepSeek-R1-Distill-Llama-8B-Q8_0 #1053
Comments
All the DeepSeek support was added to llama.cpp within the past week, and my understanding is that the bundled llama.cpp predates it. I tried the 0.20 release myself just to see what would happen, and I'm getting this error, which jibes with the updates I've seen land in llama.cpp around this.
I second this. I loaded up the same model and received the "unknown pre-tokenizer type" error. I assume we're just waiting for them to update it on their end.
Yeah, that looks like an issue with an outdated llama.cpp version. The 0.20 update required huge changes to the binary loading system, so by the time that was done and released we were already 3 weeks out of date! I'm already working on the next update :)
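If it helps anyone debugging this, a quick way to confirm which LLamaSharp build your process actually loaded is plain .NET reflection (no LLamaSharp-specific API assumed here), since a stale or mismatched package reference can produce exactly this kind of problem:

using System;
using LLama;

// Print the identity of the LLamaSharp assembly this process actually loaded,
// to rule out a stale or mismatched package reference.
var asm = typeof(LLamaWeights).Assembly.GetName();
Console.WriteLine($"LLamaSharp assembly: {asm.Name} {asm.Version}");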
I got it working on my end with the current version of LLamaSharp and the cuda12 backend, using the base chat tutorial from the documentation; I just changed the model name and it worked. I tested the Q2 and Q8 versions from Unsloth's Hugging Face repo.
Description
The model being used is https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/blob/main/DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf.
I tried feeding it several different prompts. Sentences that work flawlessly with regular Llama models make this one immediately throw an error that appears to come from the chat template.
Stacktrace:
StackTrace " at System.ThrowHelper.ThrowArgumentOutOfRangeException()\r\n at System.MemoryExtensions.AsSpan[T](T[] array, Int32 start, Int32 length)\r\n at LLama.LLamaTemplate.Apply()\r\n at LLama.Transformers.PromptTemplateTransformer.ToModelPrompt(LLamaTemplate template)\r\n at LLama.Transformers.PromptTemplateTransformer.HistoryToText(ChatHistory history)\r\n at LLama.ChatSession.d43.MoveNext()\r\n at LLama.ChatSession.d43.System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult(Int16 token)\r\n at AISpeechChatApp.ChatModelServer.d__13.MoveNext() in ..... the rest is my name and computer info
Code where the breakpoint is triggered:
string output = "";
(breakpoint) await foreach (var text in session.ChatAsync(new ChatHistory.Message(AuthorRole.User, transcribedMessage), inferenceParameters))
{
Console.Write(text);
output += text;
}
Console.WriteLine();
AddToPrompt(false, output);
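As a sanity check while waiting on the fix, the executor can be driven directly with a raw string, which skips ChatSession's history transform (and therefore LLamaTemplate.Apply()) entirely. If this streams tokens, the model itself loads and runs fine and only the template path is broken. A sketch using the executor and inferenceParameters from the setup code below:

// Bypass the chat template entirely: send the raw text to the executor.
// The output won't be chat-formatted, but it proves the model itself works.
await foreach (var text in executor.InferAsync(transcribedMessage, inferenceParameters))
{
    Console.Write(text);
}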
My model settings:
// initialize llm
var modelParameters = new ModelParams(modelPrePath + modelPath)
{
    ContextSize = 8096,
    GpuLayerCount = layercount // for 8b model
    //GpuLayerCount = 18 // for 70b model
};
model = null;
model = LLamaWeights.LoadFromFile(modelParameters);
context = model.CreateContext(modelParameters);
executor = new InteractiveExecutor(context);

if (Directory.Exists("Assets/chathistory"))
{
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine("Loading session from disk.");
    Console.ForegroundColor = ConsoleColor.White;
}

// initialize the inference parameters
inferenceParameters = new InferenceParams
{
    SamplingPipeline = new DefaultSamplingPipeline
    {
        Temperature = 0.8f
    },
};

// set system prompt
chatHistory = new ChatHistory();
chatHistory.AddMessage(AuthorRole.System, "You are Alex, an AI assistant tasked with helping the user with their project coded in C#. Answer any question they have and follow them through their ramblings about the project at hand.");

// set up session
session = new ChatSession(executor, chatHistory);
session.WithHistoryTransform(new PromptTemplateTransformer(model, withAssistant: true));
session.WithOutputTransform(new LLamaTransforms.KeywordTextOutputStreamTransform(
    new string[] { model.Tokens.EndOfTurnToken!, "�" },
    redundancyLength: 5));
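For what it's worth, a possible stopgap until the bundled llama.cpp is updated is to replace PromptTemplateTransformer with a hand-rolled IHistoryTransform that emits the DeepSeek-R1 chat layout directly. This is a sketch only: the special tokens below are my reading of the model card, not confirmed against this GGUF's tokenizer, and the interface members are assumed from what PromptTemplateTransformer implements.

// A hand-rolled history transform that formats DeepSeek-R1's chat layout,
// sidestepping the GGUF template metadata that LLamaTemplate.Apply() chokes on.
class DeepSeekHistoryTransform : LLama.Abstractions.IHistoryTransform
{
    public string HistoryToText(ChatHistory history)
    {
        var sb = new System.Text.StringBuilder();
        foreach (var message in history.Messages)
        {
            switch (message.AuthorRole)
            {
                case AuthorRole.System:
                    sb.Append(message.Content); // system text leads, untagged
                    break;
                case AuthorRole.User:
                    sb.Append("<｜User｜>").Append(message.Content);
                    break;
                case AuthorRole.Assistant:
                    sb.Append("<｜Assistant｜>").Append(message.Content);
                    break;
            }
        }
        sb.Append("<｜Assistant｜>"); // cue the model to answer
        return sb.ToString();
    }

    public ChatHistory TextToHistory(AuthorRole role, string text)
    {
        var history = new ChatHistory();
        history.AddMessage(role, text);
        return history;
    }

    public LLama.Abstractions.IHistoryTransform Clone() => new DeepSeekHistoryTransform();
}

It would then be swapped in with session.WithHistoryTransform(new DeepSeekHistoryTransform()); in place of the PromptTemplateTransformer line above.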