Thanks for the great library! It's a lot more robust and easier to work with than some alternatives.
Sequence.length() uses jquast/wcwidth internally. Unfortunately, it is not accurate for all Unicode characters, including LRI (U+2066) and PDI (U+2069). For both, wcwidth returns 1 even though these characters have width zero: they are not printed in the terminal (I'm using GNOME Terminal). This corresponds to jquast/wcwidth#26. One possible fix would be to replace wcwidth with cwcwidth, which is used by curtsies (and bpython) and, as a bonus, has a much faster implementation.
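A minimal reproduction of the discrepancy, assuming both packages are installed from PyPI (cwcwidth mirrors wcwidth's `wcwidth()`/`wcswidth()` API):

```python
# Compare reported widths for the bidi isolate characters.
import wcwidth
import cwcwidth

for name, ch in [("LRI", "\u2066"), ("PDI", "\u2069")]:
    # wcwidth reports 1; cwcwidth reports 0, matching what the terminal prints
    print(f"{name}: wcwidth={wcwidth.wcwidth(ch)} cwcwidth={cwcwidth.wcwidth(ch)}")
```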
Context
It's possible some terminals render these as width 1, but that would be incorrect behavior: LRI and PDI are only supposed to affect directionality (for LTR and RTL scripts), not be displayed. For example, if you want to display individual Hebrew characters not for their meaning but as a binary decoding (which is my strange use case), you want to print each character as 'א' (with an LRI and a PDI on the left and right side of the character, respectively), so that when you combine several, the string is displayed in memory order, e.g. 'אל'. If you printed it normally, you'd get 'אל'. As you can see in this text, the characters themselves are invisible.
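For illustration, the wrapping described above could look like this (a sketch; the `isolate()` helper is a hypothetical name, not part of any library):

```python
LRI, PDI = "\u2066", "\u2069"  # LEFT-TO-RIGHT ISOLATE / POP DIRECTIONAL ISOLATE

def isolate(ch: str) -> str:
    """Wrap a single character in LRI ... PDI so the bidi algorithm
    treats it in isolation and renders it left-to-right."""
    return f"{LRI}{ch}{PDI}"

# Isolating each Hebrew letter keeps the combined string in memory order
# instead of letting the terminal's bidi reordering reverse it.
print("".join(isolate(c) for c in "אל"))
```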
Of course, some editors (e.g. IntelliJ) do display these characters, since they can be sneaked in to alter source code (see, for example, the security issue that prompted the recent Rust 1.56.1 release), and people who view code in a terminal might likewise want a way to reveal their presence. But that should not be the default: in ordinary display strings, the characters should be invisible.
I see cwcwidth uses category 'Cf' in the zero-width table and wcwidth does not; that is the problem. I will try to address it in wcwidth in the coming weeks, thanks!
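For reference, the category distinction is easy to verify with Python's standard library:

```python
import unicodedata

# LRI and PDI are in general category 'Cf' (format): invisible,
# zero-width characters that only influence bidi processing.
print(unicodedata.category("\u2066"))  # 'Cf'
print(unicodedata.category("\u2069"))  # 'Cf'
```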