What would wcwidth look like if it were built-in to Python? #201

jquast · 2023-10-21T17:05:27Z · Maintainer

Something like P1868R2, "🦄 width: clarifying units of width and precision in std::format" (Published Proposal, 2020-02-11): https://fmt.dev/papers/p1868.html

Why can't Python just do the right thing? For example, here it gets it wrong:

>>> watch = "\u231a"
>>> print(f'|{watch:x<5s}|\n'
...       f'|{"watch":x<5s}|')
|⌚xxxx|
|watch|

Python measures this emoji as 1 cell wide (it is one codepoint), but it actually occupies 2 cells, so the left alignment (the same logic as str.ljust()) pads it with one fill character too many. Measurement also goes wrong when zero-width characters, the zero-width joiner (ZWJ), and variation selectors are used. Python doesn't get this measurement "right" for any kind of display device, but it goes without saying that the only purpose of such a function is monospace character displays such as terminals.
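To make the mismatch in the ⌚ example concrete, here is a minimal sketch that compares len() with an approximate cell width. It assumes a simplified rule built only on the standard-library unicodedata module (East Asian Width "W" or "F" counts as 2 cells; combining marks, format characters such as ZWJ, and variation selectors count as 0). It is not the wcwidth library's algorithm, and `char_width` / `cell_width` are hypothetical names.

```python
import unicodedata


def char_width(ch: str) -> int:
    """Approximate cell width of a single codepoint (simplified rule)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        # Combining marks, enclosing marks, and format characters (e.g. ZWJ);
        # variation selectors are category Mn, so they land here too.
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        # Wide and Fullwidth characters occupy two terminal cells
        return 2
    return 1


def cell_width(text: str) -> int:
    """Sum of per-codepoint widths; ignores multi-codepoint sequences."""
    return sum(char_width(ch) for ch in text)


watch = "\u231a"
print(len(watch), cell_width(watch))      # 1 2
print(len("watch"), cell_width("watch"))  # 5 5
```

Control characters are not handled specially here; a real implementation has to decide what they measure as.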

I believe the built-in alignment functions (format-string alignment, str.rjust, str.ljust, and str.center) and textwrap.wrap should measure these Unicode characters by their printable width, not just by their number of codepoints.
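For the alignment functions, a cell-aware variant only needs to compute the padding from the displayed width instead of len(). A minimal sketch, assuming the same simplified unicodedata-based width rule (not wcwidth's tables) and a fillchar that is one cell wide; `ljust_cells`, `rjust_cells`, and `center_cells` are hypothetical names, not a proposed API.

```python
import unicodedata


def cell_width(text: str) -> int:
    """Simplified cell width: wide/fullwidth = 2, marks and format chars = 0."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def ljust_cells(text: str, width: int, fillchar: str = " ") -> str:
    # Pad on the right until the *displayed* width reaches `width`
    return text + fillchar * max(0, width - cell_width(text))


def rjust_cells(text: str, width: int, fillchar: str = " ") -> str:
    # Pad on the left until the displayed width reaches `width`
    return fillchar * max(0, width - cell_width(text)) + text


def center_cells(text: str, width: int, fillchar: str = " ") -> str:
    # Split the padding; any odd extra cell goes on the right in this sketch
    pad = max(0, width - cell_width(text))
    left = pad // 2
    return fillchar * left + text + fillchar * (pad - left)


print("|" + ljust_cells("\u231a", 5, "x") + "|")  # |⌚xxx|
print("|" + ljust_cells("watch", 5, "x") + "|")   # |watch|
```

Both rows now occupy the same 7 cells on a terminal that draws ⌚ two cells wide, which is exactly the assumption the whole approach depends on.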

The built-in REPL also gets this wrong in its readline-like line editing: it becomes impossible to edit strings containing these characters, because the cursor position and the resulting input are unpredictable and disorienting.

IPython, which uses wcwidth, does a better job and should fare even better once #91 is closed, but using a large project like IPython as a REPL should not be required as a solution.

It would be good to experiment with the Python source code to see which parts of the codebase need changing. See #93 for the basic high-level functions.

Better still would be to draft and submit a PEP.

jquast · 2024-01-06T15:27:39Z · Maintainer · Author

I have since found a few Python bug reports, patches, and proposals for wcwidth in the standard library, linked below with a small number of choice quotes. The last issue (56777) got the closest, but it shows a lot of disagreement about how to interpret the Unicode Specification, as well as the fundamental problem that wrapping any OS-provided wcwidth(3) or wcswidth(3) would give inconsistent results across platforms. Some participants misunderstand the difference between fixed-width and variable-width fonts, and others the need for wcswidth() instead of, or in addition to, wcwidth().

Anyway, this wcwidth library is now used in many applications, and we have authored a clear specification and a terminal compliance assessment utility that were not previously available. I think these would answer most of the objections raised previously.

There is no need to be perfectly correct for all terminals; being mostly correct for most languages in the most popular terminals is preferable to nothing!
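The need for wcswidth() in addition to wcwidth() comes from the fact that a string's width is not always the sum of its codepoints' widths. A minimal sketch, using a simplified unicodedata-based per-codepoint rule plus one assumed extra rule (a codepoint joined to its predecessor by a ZWJ adds no width of its own). `wcwidth_sketch` and `wcswidth_sketch` are hypothetical names, neither the C library functions nor this library's implementation, and whether a terminal really draws the ZWJ sequence in 2 cells depends on the terminal.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def wcwidth_sketch(ch: str) -> int:
    """Width of one codepoint in isolation (simplified rule)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def wcswidth_sketch(text: str) -> int:
    """Width of a whole string, using context a per-codepoint function lacks."""
    total = 0
    for i, ch in enumerate(text):
        if i > 0 and text[i - 1] == ZWJ:
            continue  # joined to the preceding emoji: no extra cells
        total += wcwidth_sketch(ch)
    return total


family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # man ZWJ woman ZWJ girl
print(sum(wcwidth_sketch(ch) for ch in family))  # 6
print(wcswidth_sketch(family))                    # 2
```

A per-character function has no way to see the ZWJ that precedes a character, which is why the string-level function cannot simply be defined as a sum.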

python/cpython#56708

Some people agree:

Bad wrapping of CJK chars is a bug. I don't understand why Python2 should be broken forever!

CJK people are not subhumans, so don't support CJK is something called, wait... a bug ! And it's a shame that it was not fixed earlier.

And from python/cpython#51004:

Other functions I miss a lot are wcwidth() and wcswidth(). These functions return the real width (read, cells length in screen) for unicode strings. [..] I think Python could benefit from having these functions in the standard library.

Judging by your post your English probably is good enough to write a PEP [..] However, I doubt a PEP would be necessary.

And from python/cpython#56777:

Can't we expose wcswidth() as locale.strwidth() with a recipe explaining how to use unicodedata to get a "correct" result? At least until everyone implements correctly Unicode and Unicode stops evolving? :-)

I think this function would be very useful in many parts of interpreter core and standard library. From displaying tracebacks to formatting helps. Otherwise we are doomed to implement imperfect variants in multiple places.

Since we failed to agree on this feature, I close the issue.
I close the issue as WONTFIX.


grayjk · 2025-12-21T14:56:28Z

@jquast There's some new interest in python/cpython#56777, and a discussion has started: https://discuss.python.org/t/text-segmentation-api-design/105371

Also see python/cpython#143076


jquast · 2026-01-10T20:19:18Z · Maintainer · Author

This is pretty great; thanks for commenting about it. I'll close this issue.

It looks pretty much as I had imagined it, and I really hope it succeeds. I only regret that I cannot afford more time to help, but I will write in support of the need to have this built into Python.

As @serhiy-storchaka wrote here:

I think that even imperfect solution is better than no solution

I couldn't agree more. We can recognize that the solution will always be imperfect. For this library, we've mostly whittled the remaining problems down to a small number of languages that are just too difficult to display legibly in monospace on any terminal, so whether their measurement is correct is debatable. We should probably seek other solutions for those languages, as described under the heading "Beyond Fixed Widths" in https://www.jeffquast.com/post/state-of-terminal-emulation-2025/

As the ucs-detect results make evident, we cannot measure correctly for all terminals and all languages, because terminals have no consensus on Unicode support: not by Unicode release version, feature (ZWJ, VS-16, etc.), language, or DEC Mode 2027 support.
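DEC private mode 2027 (grapheme cluster processing) is one of the axes that a program can probe at runtime. The sketch below only builds the standard DECRQM query (CSI ? Pd $ p) and parses the DECRPM reply (CSI ? Pd ; Ps $ y); the terminal I/O (raw mode, read timeout) is left out, and `decrqm_query` / `parse_decrpm` are hypothetical names, not functions from wcwidth or ucs-detect.

```python
import re
from typing import Optional

GRAPHEME_CLUSTER_MODE = 2027  # DEC private mode for grapheme cluster processing

# DECRPM reply to a DECRQM query: CSI ? <mode> ; <state> $ y
_DECRPM_REPLY = re.compile(r"\x1b\[\?(\d+);(\d+)\$y")

# Meaning of the <state> parameter in a DECRPM reply
_STATES = {
    0: "not recognized",
    1: "set",
    2: "reset",
    3: "permanently set",
    4: "permanently reset",
}


def decrqm_query(mode: int = GRAPHEME_CLUSTER_MODE) -> str:
    """Build a DECRQM request for a DEC private mode: CSI ? <mode> $ p."""
    return f"\x1b[?{mode}$p"


def parse_decrpm(reply: str, mode: int = GRAPHEME_CLUSTER_MODE) -> Optional[str]:
    """Interpret a DECRPM reply; None if no reply for `mode` was found."""
    match = _DECRPM_REPLY.search(reply)
    if match is None or int(match.group(1)) != mode:
        return None
    return _STATES.get(int(match.group(2)), "unknown")
```

In practice a program would write `decrqm_query()` to the tty, read with a timeout, and pass what it gets to `parse_decrpm()`; a terminal that does not implement DECRQM may not answer at all, which is itself part of the inconsistency described above.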

We just do our best to match terminals with active Unicode support (foot, ghostty, kitty) and ignore mismatch reports for those without it (gnome-terminal, PuTTY).

My only regret is that this may bring negative attention and arguments to the Python issue tracker. For example, Variation Selector-16 is not supported by a majority of terminal emulators and was even vehemently argued against by GNOME Terminal (now being reconsidered), despite affecting some of the most common and earliest emojis, like "❤️". Having also been declined for support in glibc, it can be expected to cause issues that become argumentative.
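To illustrate the VS-16 case: "❤️" is U+2764 (not East Asian Wide, so normally one cell) followed by U+FE0F, which requests emoji presentation, typically drawn two cells wide by terminals that support it. A minimal sketch of that widening rule, assuming a simplified unicodedata-based base width; unlike a real implementation it does not check that the base character is listed in Unicode's emoji-variation-sequences.txt, and `base_width` / `width_with_vs16` are hypothetical names.

```python
import unicodedata

VS16 = "\ufe0f"  # VARIATION SELECTOR-16: requests emoji presentation


def base_width(ch: str) -> int:
    """Width of one codepoint in isolation (simplified rule)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def width_with_vs16(text: str) -> int:
    """Like summing base_width(), but a 1-cell character followed by VS-16
    is counted as 2 cells (emoji presentation)."""
    total = 0
    for i, ch in enumerate(text):
        w = base_width(ch)
        if w == 1 and text[i + 1:i + 2] == VS16:
            w = 2
        total += w
    return total


heart = "\u2764\ufe0f"  # "❤️"
print(sum(base_width(ch) for ch in heart))  # 1
print(width_with_vs16(heart))               # 2
```

A terminal that ignores VS-16 draws the heart in one cell, so whichever answer a library picks, some terminals will disagree with it.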


jquast · 2026-01-13T18:50:51Z · Maintainer · Author

After reviewing the current progress, I've decided to keep this issue "open", because so far not even grapheme cluster boundary support has been accepted into standard Python.

I'm going to implement a pure-Python version of iter_graphemes() for our own purposes (#165), to aid in resolving open issues, especially #93.
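A full grapheme cluster segmenter implements the rules of Unicode Standard Annex #29. The sketch below is far smaller: it only keeps combining marks, variation selectors, and ZWJ-joined characters attached to the preceding character, and it ignores regional-indicator flags, Hangul syllable sequences, CR LF, and the other UAX #29 rules. `iter_graphemes_sketch` is a hypothetical name, not this library's iter_graphemes().

```python
import unicodedata
from typing import Iterator

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def _extends(ch: str) -> bool:
    # Combining marks (variation selectors are category Mn) and ZWJ
    # attach to the preceding character instead of starting a new cluster.
    return ch == ZWJ or unicodedata.category(ch) in ("Mn", "Me", "Mc")


def iter_graphemes_sketch(text: str) -> Iterator[str]:
    """Yield approximate grapheme clusters (a small subset of UAX #29)."""
    cluster = ""
    for ch in text:
        if cluster and (_extends(ch) or cluster.endswith(ZWJ)):
            cluster += ch  # extend the current cluster
        else:
            if cluster:
                yield cluster
            cluster = ch  # start a new cluster
    if cluster:
        yield cluster
```

With this, "e" followed by U+0301 COMBINING ACUTE ACCENT comes out as one cluster, and a man-ZWJ-woman-ZWJ-girl sequence comes out as a single family emoji; a width function can then measure per cluster rather than per codepoint.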


jquast · 2026-01-15T18:52:11Z · Maintainer · Author

This issue encouraged me to better propose what I think "wcwidth in standard Python" should look like, as PRs #165, #166, #168, and #169, which implement width, iter_graphemes, iter_sequences, ljust, rjust, center, and wrap, derived from blessed.
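Of the functions listed, wrap shows why cell width has to be used at every layer: a line-breaking decision made on len() will overflow for CJK text. A minimal greedy sketch, assuming the same simplified unicodedata-based width rule, break opportunities only at whitespace, and no splitting of over-long words; `wrap_cells` is a hypothetical name, not the merged wrap().

```python
import unicodedata
from typing import List


def cell_width(text: str) -> int:
    """Simplified cell width: wide/fullwidth = 2, marks and format chars = 0."""
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def wrap_cells(text: str, width: int) -> List[str]:
    """Greedy word wrap measured in terminal cells rather than codepoints.
    Words wider than `width` are left on their own line, not split."""
    lines: List[str] = []
    current = ""
    for word in text.split():
        candidate = f"{current} {word}" if current else word
        if cell_width(candidate) <= width or not current:
            current = candidate  # the word still fits on this line
        else:
            lines.append(current)  # line is full: start a new one
            current = word
    if current:
        lines.append(current)
    return lines
```

For example, `wrap_cells("日本語 テキスト", 10)` returns `["日本語", "テキスト"]`, because the joined line would need 15 cells, whereas `textwrap.wrap("日本語 テキスト", 10)` keeps both words on one line since the string is only 8 codepoints long.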

I believe this implementation of control_codes='parse' (default), 'strict', and 'ignore' is the most balanced: with the default option, sequence processing adds minimal overhead but significantly helps TUI/CLI end users, while 'strict' allows more careful use and 'ignore' is very fast with trusted input.
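The three modes can be pictured as three policies for escape sequences embedded in the text being measured. The sketch below is hypothetical and only illustrates that trade-off; its `width_sketch` function, its simplified CSI/OSC pattern, and the exact behaviour of each mode are assumptions, not the semantics of this library's control_codes parameter.

```python
import re
import unicodedata

# Simplified pattern: CSI sequences (ESC [ params intermediates final) and
# OSC sequences terminated by BEL or ESC \.  Terminals recognise many more.
_ESCAPE_SEQ = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b\].*?(?:\x07|\x1b\\)")


def _plain_width(text: str) -> int:
    # Marks, format characters, and C0/C1 controls count as 0 cells
    total = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf", "Cc"):
            continue
        total += 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
    return total


def width_sketch(text: str, control_codes: str = "parse") -> int:
    if control_codes == "parse":
        # Remove recognised escape sequences; they occupy no cells
        return _plain_width(_ESCAPE_SEQ.sub("", text))
    if control_codes == "strict":
        # Refuse any control character (including ESC) rather than guess
        if any(unicodedata.category(ch) == "Cc" for ch in text):
            raise ValueError("control character in text")
        return _plain_width(text)
    if control_codes == "ignore":
        # Trusted input: skip the sequence scan entirely
        return _plain_width(text)
    raise ValueError(f"unknown control_codes: {control_codes!r}")
```

With `text = "\x1b[31mred\x1b[0m"`, 'parse' gives 3, 'ignore' gives 10 (the characters after each ESC are counted as printable), and 'strict' raises ValueError, which is why the fast path is only appropriate for input known to contain no sequences.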

I hope this project is eventually made obsolete by any similar level of grapheme- and sequence-aware support in standard Python, and I strongly encourage it for language accessibility and performance.


jquast · 2026-01-17T19:45:09Z · Maintainer · Author

All of the PRs mentioned have been merged for the next release (0.3.0).
