@thammegowda thammegowda commented Nov 21, 2025

Adding in bindings for two more languages!

  • bindings/cpp
  • bindings/c

C is an intermediate step to bind C++ and Rust: i.e., C++ <--> C <--> Rust.
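
The layering above can be sketched in Rust as follows. This is a minimal illustration under stated assumptions, not the code in this PR: `ToyTokenizer` and the `toy_tokenizer_*` functions are hypothetical names, and the whitespace-style splitting stands in for the real tokenizer. The idea is that Rust exports plain C-ABI functions around an opaque pointer, and a C header (and a C++ wrapper on top of it) would declare the same symbols.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for the real tokenizer: splits on one separator.
pub struct ToyTokenizer {
    separator: char,
}

impl ToyTokenizer {
    fn count_tokens(&self, text: &str) -> usize {
        text.split(self.separator)
            .filter(|piece| !piece.is_empty())
            .count()
    }
}

/// Allocates a tokenizer and hands ownership to the caller as an opaque pointer.
// `#[unsafe(no_mangle)]` is accepted since Rust 1.82; older code (like the
// diff in this PR) spells it `#[no_mangle]`.
#[unsafe(no_mangle)]
pub extern "C" fn toy_tokenizer_new() -> *mut ToyTokenizer {
    Box::into_raw(Box::new(ToyTokenizer { separator: ' ' }))
}

/// Returns the token count, or -1 on a null pointer or invalid UTF-8.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn toy_tokenizer_count(
    handle: *const ToyTokenizer,
    text: *const c_char,
) -> i64 {
    if handle.is_null() || text.is_null() {
        return -1;
    }
    let tokenizer = unsafe { &*handle };
    match unsafe { CStr::from_ptr(text) }.to_str() {
        Ok(s) => tokenizer.count_tokens(s) as i64,
        Err(_) => -1,
    }
}

/// Takes ownership back from the caller and drops the tokenizer.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn toy_tokenizer_free(handle: *mut ToyTokenizer) {
    if !handle.is_null() {
        drop(unsafe { Box::from_raw(handle) });
    }
}

fn main() {
    // Exercise the C-ABI surface from Rust, the way a C or C++ caller would.
    let text = CString::new("hello ffi world").unwrap();
    let handle = toy_tokenizer_new();
    let n = unsafe { toy_tokenizer_count(handle, text.as_ptr()) };
    unsafe { toy_tokenizer_free(handle) };
    println!("{n} tokens");
}
```

Keeping the handle opaque means the C and C++ sides never need to know the Rust struct layout; they only pass the pointer back into the exported functions.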

--

  • Added tests for the C++ bindings
  • Added benchmarks as a sanity check; the results are as expected.

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

Collaborator

@ArthurZucker ArthurZucker left a comment

Very open to this! Let's make sure the functions we bind are broadly compatible with what users expect.

@@ -0,0 +1,133 @@
use std::ffi::{CStr, CString};
Collaborator

ok ffi perfect

}

#[no_mangle]
pub extern "C" fn tokenizers_encode(
Collaborator

I think we need decode, batch encode, batch encode fast, etc., and async? I don't know how best to define all of these, but basically the same surface as Python!
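
As a rough sketch of what two of those entry points could look like across a C ABI, here is a batch encode and a decode. Everything here is an assumption for illustration: the `toy_*` names are hypothetical, and the byte-level "tokenizer" (one id per UTF-8 byte) replaces the real one. It shows the two FFI patterns such functions tend to need: arrays passed as pointer plus length, and a returned string that the caller must hand back to be freed.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;
use std::{ptr, slice};

/// Toy "encode": one id per UTF-8 byte. Hypothetical, not the real tokenizer.
fn toy_encode(text: &str) -> Vec<u32> {
    text.bytes().map(u32::from).collect()
}

/// Batch entry point: writes the token count of each input into `out_counts`.
/// Returns 0 on success, -1 on a null pointer or invalid UTF-8.
// `#[unsafe(no_mangle)]` needs Rust 1.82+; older code uses `#[no_mangle]`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn toy_encode_batch_counts(
    texts: *const *const c_char,
    n: usize,
    out_counts: *mut usize,
) -> i32 {
    if texts.is_null() || out_counts.is_null() {
        return -1;
    }
    let texts = unsafe { slice::from_raw_parts(texts, n) };
    let out = unsafe { slice::from_raw_parts_mut(out_counts, n) };
    for (slot, &p) in out.iter_mut().zip(texts) {
        if p.is_null() {
            return -1;
        }
        match unsafe { CStr::from_ptr(p) }.to_str() {
            Ok(s) => *slot = toy_encode(s).len(),
            Err(_) => return -1,
        }
    }
    0
}

/// Decode: returns a newly allocated C string, or null on bad input.
/// The caller must release it with `toy_string_free`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn toy_decode(ids: *const u32, n: usize) -> *mut c_char {
    if ids.is_null() {
        return ptr::null_mut();
    }
    let ids = unsafe { slice::from_raw_parts(ids, n) };
    let mut bytes = Vec::with_capacity(n);
    for &id in ids {
        if id > 0xFF {
            return ptr::null_mut(); // not a valid byte-level id
        }
        bytes.push(id as u8);
    }
    match CString::new(bytes) {
        Ok(s) => s.into_raw(),
        Err(_) => ptr::null_mut(), // interior NUL byte
    }
}

/// Frees a string previously returned by `toy_decode`.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn toy_string_free(s: *mut c_char) {
    if !s.is_null() {
        drop(unsafe { CString::from_raw(s) });
    }
}

fn main() {
    let a = CString::new("hi").unwrap();
    let b = CString::new("hello").unwrap();
    let ptrs = [a.as_ptr(), b.as_ptr()];
    let mut counts = [0usize; 2];
    let rc = unsafe { toy_encode_batch_counts(ptrs.as_ptr(), ptrs.len(), counts.as_mut_ptr()) };
    println!("rc={rc} counts={counts:?}");

    let ids = toy_encode("hi");
    let s = unsafe { toy_decode(ids.as_ptr(), ids.len()) };
    println!("{}", unsafe { CStr::from_ptr(s) }.to_str().unwrap());
    unsafe { toy_string_free(s) };
}
```

An async surface does not map directly onto a C ABI; how to expose it (callbacks, handles that are polled, or blocking calls only) is a separate design question this sketch does not address.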

Author

Thanks for the quick review! Happy to know you're interested/open to this effort.
I will get back after surfacing more functionality to C++.

@thammegowda thammegowda marked this pull request as draft November 22, 2025 18:51
CPP bindings coverage improvement