diff --git a/bindings/python/CHANGELOG.md b/bindings/python/CHANGELOG.md
index 344e8df87..ea8623928 100644
--- a/bindings/python/CHANGELOG.md
+++ b/bindings/python/CHANGELOG.md
@@ -1,6 +1,9 @@
 # v0.6.0 (not published yet)
 
-Fixes:
+## Changes:
+- Big improvements in speed for BPE (Both training and tokenization) ([#165](https://github.com/huggingface/tokenizers/pull/165))
+
+## Fixes:
 - Some default tokens were missing from `BertWordPieceTokenizer` (cf [#160](https://github.com/huggingface/tokenizers/issues/160))
 - There was a bug in ByteLevel PreTokenizer that caused offsets to be wrong if a char got split up
 in multiple bytes. (cf [#156](https://github.com/huggingface/tokenizers/pull/156))
diff --git a/tokenizers/CHANGELOG.md b/tokenizers/CHANGELOG.md
index b0f29a848..31469f36f 100644
--- a/tokenizers/CHANGELOG.md
+++ b/tokenizers/CHANGELOG.md
@@ -1,5 +1,8 @@
 # v0.8.0 (not released yet)
 
+## Changes:
+- Big improvements in speed for BPE (Both training and tokenization) ([#165](https://github.com/huggingface/tokenizers/pull/165))
+
 ## Fixes:
 - Do not open all files directly while training ([#163](https://github.com/huggingface/tokenizers/issues/163))
 - There was a bug in ByteLevel PreTokenizer that caused offsets to be wrong if a char got split up