Open
Labels
enhancement: New feature or request
Description & Motivation
We would like to allow chunking on boundaries made of two graphemes, such as ?! (a question mark followed by an exclamation mark).
Pitch
Currently, the chunk_text algorithm in textsplit.py assumes that every split boundary is a single grapheme (! or . for instance). We would like to remove that assumption.
Alternatives
No response
Additional context
To do this, textsplit.py and text_config.py would need to be refactored to accept strong and weak boundaries as lists. They are currently stored as strings.
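As a rough illustration of the proposed refactor, here is a minimal sketch of splitting on boundaries held in lists. It is not the existing chunk_text implementation: the names STRONG_BOUNDARIES, WEAK_BOUNDARIES and split_on_boundaries, and the boundary values themselves, are hypothetical and chosen only for this example. It also matches on code points via the re module, so it treats each boundary as a plain string rather than as a sequence of grapheme clusters.

```python
import re

# Hypothetical configuration: each boundary is its own list entry, so an
# entry may be longer than one character (for example "?!").
STRONG_BOUNDARIES = ["?!", "!", "?", "."]
WEAK_BOUNDARIES = [",", ";", ":"]


def split_on_boundaries(text, boundaries):
    """Split text after each boundary, keeping the boundary with its chunk."""
    if not boundaries:
        return [text.strip()] if text.strip() else []
    # Try longer boundaries first so "?!" is matched as one unit instead of
    # "?" followed by a separate "!".
    ordered = sorted(boundaries, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(b) for b in ordered))

    chunks = []
    start = 0
    for match in pattern.finditer(text):
        chunk = text[start:match.end()].strip()
        if chunk:
            chunks.append(chunk)
        start = match.end()

    # Keep any trailing text that has no closing boundary.
    tail = text[start:].strip()
    if tail:
        chunks.append(tail)
    return chunks


print(split_on_boundaries("Really?! Yes. Fine", STRONG_BOUNDARIES))
print(split_on_boundaries("Really?! Yes. Fine", ["?", "!", "."]))
```

In this sketch the first call keeps "Really?!" together as one chunk, while the second call, which only knows single-character boundaries, leaves a stray "!" chunk. Sorting by length before building the pattern is one simple way to make the longest boundary win; the real refactor may choose a different matching strategy.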