Commit
Fix indexing when looking into words within a margin
This comes up in two different branches: we want to split a word into two segments, keeping a minimum margin of characters at the beginning and end of the word. For example, with a margin of 1 we might split "10th" as "1" and "0th", and also as "10" and "th" and as "10t" and "h".

The surprising behavior is that an inclusive range on the right-hand side (`start..=end`) is treated as an exclusive range whose `end` is `+ 1`. That breaks down for byte indexes within a UTF-8 encoded string: `end + 1` might fall inside a multi-byte codepoint. For example, `&"áoóáoó"[5..=5]` panics. So we need to switch to a regular Rust range (exclusive on the right). This means using the word's `str::len` when the margin is `1` and the `nth - 2` codepoint's byte index otherwise.

This commit refactors all of this into a helper function (which seems very complex, actually) that also covers adjusting the indices, reducing some boilerplate between the callers. (It ends up being more code in absolute terms though. I wonder if just returning the range is wiser, or eliminating the shared function.)
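To illustrate the boundary problem, here is a minimal sketch of a margin-based split that only ever slices at character boundaries. It is not the helper from this commit: `margin_splits` is a hypothetical name, and it takes split points from `str::char_indices` rather than using the `str::len` / `nth - 2` approach described above.

```rust
/// Hypothetical helper (not the one from this commit): returns every way to
/// split `word` into two segments that each keep at least `margin` characters.
fn margin_splits(word: &str, margin: usize) -> Vec<(&str, &str)> {
    // Byte offset at which each character starts. These offsets are always
    // valid char boundaries, so slicing at them cannot panic.
    let starts: Vec<usize> = word.char_indices().map(|(i, _)| i).collect();
    let n = starts.len();

    // Assumption: a margin of 0 or a word that is too short yields no splits.
    if margin == 0 || n < 2 * margin {
        return Vec::new();
    }

    // Splitting before character `k` leaves `k` characters on the left and
    // `n - k` on the right. `split_at` takes an exclusive byte index.
    (margin..=n - margin)
        .map(|k| word.split_at(starts[k]))
        .collect()
}
```

With this sketch, `margin_splits("10th", 1)` gives the three splits from the example above. For the panicking case, `"áoóáoó".get(5..=5)` returns `None` because byte 6 lies inside the second `á`, while the exclusive range `5..7` covers that character in full.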