Replies: 6 comments
-
|
I have since found a few Python bug reports, patches, and proposals for wcwidth in the standard library, linked below with a small number of choice quotes. The last issue (56777) got the closest, but it shows a lot of disagreement about how to interpret the Unicode specification, as well as the fundamental problem that wrapping any OS-provided wcwidth(3) or wcswidth(3) would be inconsistent. Some people fundamentally misunderstand the difference between fixed-width and variable-width fonts, and others the need for wcswidth() instead of, or in addition to, wcwidth(). Anyway, this wcwidth library is now used in many applications, and we have authored a clear specification and a terminal compliance assessment utility that were not previously available. I think these offerings would overcome any of the contrary arguments given previously. There is no need to be perfectly correct for all terminals, but to be mostly correct for most languages in the most popular terminals is preferable! Some people agree,
And from, python/cpython#51004
|
-
|
@jquast There's some new interest in python/cpython#56777 and a discussion started: https://discuss.python.org/t/text-segmentation-api-design/105371 Also see python/cpython#143076
-
|
This is pretty great, thanks for commenting about it. I'll close this issue. It looks pretty much like I had imagined it, and I really hope it succeeds. I only regret that I cannot afford more time to help, but I will write in support of the need to have this built in to Python. As @serhiy-storchaka wrote here,
I couldn't agree more. We can recognize that the solution will always be imperfect. For this library, we've mostly whittled the remaining problems down to a few languages that are otherwise just too difficult to display legibly in monospace format on any terminal, so whether their measurement is correct is debatable. We should probably seek out other solutions for those languages, as written under the heading "Beyond Fixed Widths" in https://www.jeffquast.com/post/state-of-terminal-emulation-2025/ As the ucs-detect tool results make evident, we cannot make correct measurements for all terminals and all languages because there is no consensus on Unicode support among terminals: not by Unicode release version, feature (ZWJ, VS-16, etc.), language, or DEC Mode 2027 support. We can only do our best to match those terminals with active Unicode support (foot, ghostty, kitty) and ignore reports of mismatches for those without it (gnome-terminal, PuTTY). I only regret that it can bring negative attention and arguments to the Python issue tracker. For example, Variation Selector-16 is not supported by a majority of terminal emulators and was even vehemently argued against by GNOME Terminal (now under reconsideration), despite affecting some of the most common and earliest-released emoji like "❤️". Having also been declined for support in glibc, it can be expected to cause issues that become argumentative.
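To make the VS-16 disagreement concrete, here is a small sketch using only the standard `unicodedata` module. It prints the two code points behind "❤️". The base character is not East Asian Wide on its own, so a terminal that ignores the selector would be expected to draw one cell, while one that honors it as emoji presentation would be expected to draw two. The variable name `heart` is only for illustration.

```python
import unicodedata

# "❤️" is two code points: the base heart plus VARIATION SELECTOR-16.
heart = "\u2764\uFE0F"

for ch in heart:
    # Show code point, name, East Asian Width class, and general category.
    print(
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.east_asian_width(ch),
        unicodedata.category(ch),
    )
```

Nothing in the per-character properties alone says "two cells" here, which is one reason implementations that only consult a width table per code point disagree with terminals that render the sequence as a wide emoji.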
-
|
After reviewing the current progress I've decided to keep this issue "open" because so far even grapheme cluster boundary support is not yet accepted into standard Python. I'm going to implement a pure-Python version of iter_graphemes() for our own purposes in #165, to aid in the resolution of open issues, #93 especially.
-
|
This issue encouraged me to better propose what I think "wcwidth should look like in standard Python" as PRs #165, #166, #168, and #169, which implement width, iter_graphemes, iter_sequences, ljust, rjust, center, and wrap, derived from blessed. I believe this implementation of [...]. I hope that this project becomes deprecated by any level of similar grapheme and sequence-aware support in the standard Python language, and I really encourage it for language accessibility and performance.
-
|
All of the PRs mentioned are merged for the next release (0.3.0).
-
Like P1868R2, "🦄 width: clarifying units of width and precision in std::format", Published Proposal, 2020-02-11 https://fmt.dev/papers/p1868.html
Why can't Python just do the right thing? For example, here it gets it wrong,
This emoji is measured as a width of 1, but it is actually a width of 2, causing rjust() to pad it incorrectly. Python also fails to account correctly for zero-width characters, ZWJ sequences, and variation selectors. It fails to get this measurement "right" for any kind of display device at all, but I think it goes without saying that the only purpose of this function is for monospace character displays such as terminals.
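As a rough illustration of the mismatch described above (a sketch, not the original example from this post), the following uses only the standard library. The unicorn emoji and the woman + ZWJ + girl sequence are stand-ins chosen for illustration.

```python
import unicodedata

unicorn = "\U0001F984"  # 🦄 UNICORN FACE
padded = unicorn.rjust(4)

# str methods count code points, not terminal cells.
print(len(unicorn))                           # prints 1
# The East Asian Width class marks it as Wide (two cells).
print(unicodedata.east_asian_width(unicorn))  # prints W
# rjust(4) adds three spaces, so a terminal draws about five cells.
print(len(padded))                            # prints 4

# A ZWJ sequence is several code points, although terminals that
# support it draw a single glyph.
family = "\U0001F469\u200D\U0001F467"  # woman + ZWJ + girl
print(len(family))                            # prints 3
```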
I believe the built-in format string alignment functions, str.rjust, str.ljust, str.center, and textwrap.wrap, should measure these Unicode characters by their printable width, and not just by the "number of codepoints".
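A minimal sketch of what width-aware justification could look like, using only `unicodedata`. The names `cell_width` and `rjust_cells` are hypothetical helpers, not standard library or wcwidth APIs, and the width rule is a simplification: it assumes a one-cell fill character, does not collapse ZWJ emoji sequences into one glyph, and does not widen characters followed by VS-16.

```python
import unicodedata


def cell_width(text: str) -> int:
    """Estimate terminal cells for text (hypothetical helper)."""
    total = 0
    for ch in text:
        # Combining marks, variation selectors, and format characters
        # such as ZWJ are treated as occupying no cell of their own.
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        # East Asian Wide and Fullwidth characters occupy two cells.
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def rjust_cells(text: str, width: int, fillchar: str = " ") -> str:
    """Right-justify to `width` estimated cells instead of code points."""
    padding = max(0, width - cell_width(text))
    return fillchar * padding + text


print(repr("\U0001F984".rjust(4)))         # padded by code points
print(repr(rjust_cells("\U0001F984", 4)))  # padded by estimated cells
```

The same padding calculation would apply to ljust and center; a complete treatment needs grapheme and sequence awareness, which is what the wcwidth library aims to provide.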
The built-in REPL also gets this wrong in its readline-like input library. It becomes impossible to edit strings containing these characters; the cursor position and the resulting input are unpredictable and disorienting.
IPython, which uses wcwidth, does a better job and should fare better with #91 closed, but using a large project like IPython as a REPL should not be the required solution.
It would be good to experiment with the source code of Python, to see which parts of the codebase need changing. See #93 for the basic high-level functions.
And, it would be better to draft and submit a PEP.