Allow chunking on double-graphemes #730

@jordanxlau

Description & Motivation

We would like to allow chunking on double graphemes, that is, boundaries made up of two characters, such as "?!".

Pitch

Currently, the chunk_text algorithm in textsplit.py assumes that every split happens on a single grapheme ("!" or "." for instance). We would like to remove this assumption so that a boundary can be longer than one character.

Alternatives

No response

Additional context

To do this, textsplit.py and text_config.py would need to be refactored to accept strong and weak boundaries as lists. They are currently stored as strings.
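As a rough illustration of the list-based approach, here is a minimal sketch. It is not the existing chunk_text implementation: the names STRONG_BOUNDARIES, WEAK_BOUNDARIES, build_boundary_pattern, and split_on_boundaries are hypothetical, and the boundary values are assumed for the example. The key idea is to try longer boundaries first so that "?!" is matched as one unit rather than as "?" followed by "!".

```python
import re
from typing import List

# Hypothetical config values for illustration only; per the issue, the real
# text_config.py currently stores boundaries as strings, not lists.
STRONG_BOUNDARIES: List[str] = ["?!", "!", "?", "."]
WEAK_BOUNDARIES: List[str] = [",", ";"]


def build_boundary_pattern(boundaries: List[str]) -> "re.Pattern[str]":
    """Compile an alternation that prefers the longest boundary."""
    # Regex alternation takes the first alternative that matches, so sort
    # longest first to make "?!" win over "?" and "!".
    ordered = sorted(boundaries, key=len, reverse=True)
    return re.compile("|".join(re.escape(b) for b in ordered))


def split_on_boundaries(text: str, boundaries: List[str]) -> List[str]:
    """Split text after each boundary, keeping the boundary with its chunk."""
    if not boundaries:
        # Nothing to split on: return the whole text as a single chunk.
        stripped = text.strip()
        return [stripped] if stripped else []

    pattern = build_boundary_pattern(boundaries)
    chunks: List[str] = []
    start = 0
    for match in pattern.finditer(text):
        chunk = text[start:match.end()].strip()
        if chunk:
            chunks.append(chunk)
        start = match.end()

    # Keep any trailing text that has no closing boundary.
    tail = text[start:].strip()
    if tail:
        chunks.append(tail)
    return chunks


print(split_on_boundaries("Really?! Yes. Fine", STRONG_BOUNDARIES))
# ['Really?!', 'Yes.', 'Fine']
```

With only single-character boundaries (["!", "?", "."]), the same input yields a stray "!" chunk, which is the behaviour this issue aims to avoid.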

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request)
