-
Notifications
You must be signed in to change notification settings - Fork 1k
DOCS: add add_prefix_space to processors.ByteLevel
#1878
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOCS: add add_prefix_space to processors.ByteLevel
#1878
Conversation
add_prefix_space to processors.ByteLevel
ArthurZucker
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM but lets update to proper doc
bindings/python/src/processors.rs
Outdated
| /// | ||
| /// add_prefix_space (:obj:`bool`, `optional`, defaults to :obj:`True`): | ||
| /// Whether the add_prefix_space option was enabled during pre-tokenization. This |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the doc is wrong in the sense that this does not remove the space in the post processor nor does it add it. It just accounts for it in the offset computation if trim_offsets.
// If we are processing the first pair of offsets, with `add_prefix_space`,
// then we shouldn't remove anything we added. If there are more than one
// leading spaces though, it means we didn't add them, and they should be
// removed.try to incorporate this please!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review, I updated the docs. I somehow misunderstood the meaning of add_prefix_space for the postprocessor, but it simply refers to the preprocessor and has more of a technical meaning here. To actually understand the meaning I added some code to the tests and thought that might be helpful to keep. Let me know if you want further changes.
|
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
|
just missing make style |
…okenizers into docs-add-prefix-space-1819
done |
😉 |
|
That was weird, nothing happened when I ran |
|
you need to cargo build or |
|
Ty! |
closes #1819
We already support
add_prefix_spaceforprocessors.ByteLevel, it is just not documented since this class is defined via pyo3'spyclass, see here. See further down the signature for thenewfunction, which mirrors python's__new__method: https://github.com/huggingface/tokenizers/blob/main/bindings/python/src/processors.rs#L492.