anysphere/tiktoken-rs

 
 


⏳ tiktoken-rs

tiktoken-rs is based on openai/tiktoken, rewritten to work as a Rust crate. It is unstable, experimental, and only half-implemented at the moment, but usable enough to count tokens in some cases.

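// Assumption: both types are exported from the crate root; adjust the path if they live elsewhere.
use tiktoken::{SpecialTokenAction, SpecialTokenHandling};

// Load the cl100k_base encoding.
let enc = tiktoken::EncodingFactory::cl100k_base().unwrap();
// Encode the text, treating special tokens as forbidden by default.
let tokens = enc.encode(
    "hello world",
    &SpecialTokenHandling {
        default: SpecialTokenAction::Forbidden,
        ..Default::default()
    }
).unwrap();
println!("Number of tokens: {}", tokens.len());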

Which tokenizer to use?

GPT-3 (text-davinci-001 and earlier, such as davinci) uses r50k_base.

Codex and GPT-3.5 (code-davinci-002, text-davinci-002, and text-davinci-003) use p50k_base.

Embeddings (text-embedding-ada-002) use cl100k_base.
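
The mapping above can be written as a small lookup. This is only a sketch: `encoding_name_for_model` is a hypothetical helper, not part of tiktoken-rs, and it returns the encoding name as a string rather than a loaded encoding. The `davinci` and `text-davinci-001` entries are examples of earlier GPT-3 models.

```rust
/// Hypothetical helper (not part of tiktoken-rs): returns the name of the
/// encoding for a model, or `None` if the model is not in this short table.
fn encoding_name_for_model(model: &str) -> Option<&'static str> {
    match model {
        // Earlier GPT-3 models.
        "davinci" | "text-davinci-001" => Some("r50k_base"),
        // Codex and GPT-3.5.
        "code-davinci-002" | "text-davinci-003" => Some("p50k_base"),
        // Embeddings.
        "text-embedding-ada-002" => Some("cl100k_base"),
        // Unknown model: let the caller decide how to handle it.
        _ => None,
    }
}
```

Returning `Option` rather than panicking leaves it to the caller to pick a fallback encoding or report an unsupported model.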

About

A pure-Rust port of tiktoken.
