Description
The model being used is https://huggingface.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF/blob/main/DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf.
I tried feeding it various sentences; prompts that work flawlessly with regular Llama models make this one immediately throw an error about the template.
Stacktrace:
StackTrace " at System.ThrowHelper.ThrowArgumentOutOfRangeException()\r\n at System.MemoryExtensions.AsSpan[T](T[] array, Int32 start, Int32 length)\r\n at LLama.LLamaTemplate.Apply()\r\n at LLama.Transformers.PromptTemplateTransformer.ToModelPrompt(LLamaTemplate template)\r\n at LLama.Transformers.PromptTemplateTransformer.HistoryToText(ChatHistory history)\r\n at LLama.ChatSession.d43.MoveNext()\r\n at LLama.ChatSession.d43.System.Threading.Tasks.Sources.IValueTaskSource<System.Boolean>.GetResult(Int16 token)\r\n at AISpeechChatApp.ChatModelServer.d__13.MoveNext() in ..... the rest is my name and computer info
code where breakpoint is triggered:
string output = "";
// (breakpoint triggered on the next line)
await foreach (var text in session.ChatAsync(new ChatHistory.Message(AuthorRole.User, transcribedMessage), inferenceParameters))
{
    Console.Write(text);
    output += text;
}
Console.WriteLine();
AddToPrompt(false, output);
My model settings:
//initialize llm
var modelParameters = new ModelParams(modelPrePath + modelPath)
{
    ContextSize = 8096,
    GpuLayerCount = layercount // for 8b model
    //GpuLayerCount = 18 // for 70b model
};
model = null;
model = LLamaWeights.LoadFromFile(modelParameters);
context = model.CreateContext(modelParameters);
executor = new InteractiveExecutor(context);

if (Directory.Exists("Assets/chathistory"))
{
    Console.ForegroundColor = ConsoleColor.Yellow;
    Console.WriteLine("Loading session from disk.");
    Console.ForegroundColor = ConsoleColor.White;
    session = new ChatSession(executor);
}

//initialize the inference parameters
inferenceParameters = new InferenceParams
{
    SamplingPipeline = new DefaultSamplingPipeline
    {
        Temperature = 0.8f
    },
    MaxTokens = -1, // keep generating tokens until an anti-prompt is encountered
    AntiPrompts = new List<string> { model.Tokens.EndOfTurnToken!, "<|im_end|>" }, // stop generation once anti-prompts appear
};

//set system prompt
chatHistory = new ChatHistory();
chatHistory.AddMessage(AuthorRole.System, "You are Alex, an AI assistant tasked with helping the user with their project coded in C#. Answer any question they have and follow them through their ramblings about the project at hand.");

//set up session
session = new ChatSession(executor, chatHistory);
session.WithHistoryTransform(new PromptTemplateTransformer(model, withAssistant: true));
session.WithOutputTransform(new LLamaTransforms.KeywordTextOutputStreamTransform(
    new string[] { model.Tokens.EndOfTurnToken!, "�" },
    redundancyLength: 5));
Activity
phil-scott-78 commented on Jan 21, 2025
All the DeepSeek support was added to llama.cpp within the past week, and my understanding is that the bundled llama.cpp predates it. I tried with the 0.20 release myself just to see what would happen, and I'm getting this, which jibes with the updates I've seen on llama.cpp around this.
AgentSmithers commented on Jan 24, 2025
I second this. I loaded up the same model and received the "unknown pre-tokenizer type" error. I assume we're just waiting for them to update it on their end.
martindevans commented on Jan 24, 2025
Yeah that looks like an issue with an outdated llama.cpp version. The 0.20 update required huge changes to the binary loading system, so by the time that was done and released we were already 3 weeks out of date! I'm already working on the next update :)
vltmedia commented on Jan 31, 2025
I got it working on my end with the current version of LLamaSharp and the cuda12 backend, using the base chat tutorial from the documentation; I just changed the model name and it worked.
Check my implementation here.
I tested the Q2 and Q8 versions from Unsloth on HuggingFace.
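For reference, this is roughly what the base chat pattern looks like (a minimal sketch; the model path, context size, and layer count below are illustrative, not taken from the tutorial verbatim):

using LLama;
using LLama.Common;
using LLama.Sampling;

var parameters = new ModelParams("DeepSeek-R1-Distill-Llama-8B-Q8_0.gguf") // illustrative path
{
    ContextSize = 4096,
    GpuLayerCount = 32
};

using var model = LLamaWeights.LoadFromFile(parameters);
using var context = model.CreateContext(parameters);
var executor = new InteractiveExecutor(context);

var history = new ChatHistory();
history.AddMessage(AuthorRole.System, "You are a helpful assistant.");
var session = new ChatSession(executor, history);

var inferenceParams = new InferenceParams
{
    MaxTokens = 256,
    SamplingPipeline = new DefaultSamplingPipeline { Temperature = 0.8f }
};

// Stream the reply for a single user message.
await foreach (var text in session.ChatAsync(
    new ChatHistory.Message(AuthorRole.User, "Hello!"), inferenceParams))
{
    Console.Write(text);
}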
weirdyang commented on Mar 4, 2025
Does anyone know what the replacement for this is?
model.Tokens.EndOfTurnToken
I can't seem to find this property on the LlamaWeights class.
martindevans commented on Mar 4, 2025
If you're trying to use it in the anti-prompts it shouldn't be needed any more - the executors internally check whether they're about to return the EndOfTurnToken and, if so, stop inference.
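In other words, inference parameters along these lines should be enough (a minimal sketch, assuming a recent LLamaSharp version; usings are implied):

// assuming: using LLama.Common; using LLama.Sampling; using System.Collections.Generic;
var inferenceParameters = new InferenceParams
{
    SamplingPipeline = new DefaultSamplingPipeline { Temperature = 0.8f },
    MaxTokens = -1,
    // No model.Tokens.EndOfTurnToken entry is needed here any more - the executor
    // detects the end-of-turn token itself and stops. Extra strings can still be
    // listed if a particular model needs them.
    AntiPrompts = new List<string>()
};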
weirdyang commented on Mar 4, 2025
@martindevans I see, thanks. I occasionally run into an issue where the LLM keeps generating without stopping even though the
EndOfTurnToken
is present, so I had to manually add it to the anti-prompts. Here's an example:
#1121 (comment)
github-actions commented on May 12, 2025
This issue has been automatically marked as stale due to inactivity. If no further activity occurs, it will be closed in 7 days.