Emoji: How to measure Non-Recommended Emoji sequences? #203

janlelis · 2024-11-13T11:48:57Z

janlelis
Nov 13, 2024

Hi Jeff,

imho, one of the many problems of getting Emoji widths right is that there are multiple definitions of what a sequence must look like to be considered an Emoji. The standard defines a recommended set here:

https://www.unicode.org/reports/tr51/#def_rgi_set

but it also defines variations of it (in emoji-test.txt, listed here) and more importantly, allows arbitrary Emoji sequences which are still considered valid.

However, the probability is low that non-recommended Emoji will ever become popular enough that terminals need to display them as a single width-2 Emoji, so there will always be Emoji sequences that should be displayed as an (actual) sequence of separate basic Emoji. That is why I made Emoji handling configurable in the latest release of unicode-display_width.

I did some manual testing and was surprised to learn that most terminals (gnome-terminal, vscode terminal etc.) would not display most Emoji sequences (the exception being popular terminals on macOS). I also noted that it is very unpleasant to work with software that measures RGI Emoji sequences correctly while the terminal displays them as separate Emoji. I am trying to automatically detect the terminal in use to provide a good out-of-the-box experience, but a run-time check (like ucs-detect does) would probably be more reliable.

Condensing the above remarks into a single question/issue, it would be:

Should wcwidth() / the wcwidth spec support different string width mechanisms for RGI and non-RGI Emoji?

jquast · 2024-11-13T15:37:10Z

jquast
Nov 13, 2024
Maintainer

I totally understand and agree. The results of my ucs-detect library so far (and preliminary results from a more detailed WIP branch) show that terminal support varies more wildly than expected.

I expected each terminal to mostly match a specific Unicode release, and implemented UNICODE_VERSION for that. But there are countless examples where the Emoji, zero-width, and combining character tables of a terminal are incomplete, and the Unicode version of each table is different!

I have proposed a complex solution in another issue, here, #123 (comment)

Maybe there could be an environment variable that ucs-detect could export that wcwidth could make use of to more accurately determine any widths outside of specification of a given terminal

And at the very bottom of this issue, #104 (comment)

However, with tools like 'ucs-detect', we can programmatically determine, with black-box testing, which wide and zero-width characters are supported and whether ZWJ and VS-16 are supported, right down to exactly which ones. By expressing this as a delta from the expected terminal support, using ranges of codepoints, maybe it is possible to describe it with a complex environment variable.

Just spitballing an idea of what it might look like,

UNICODE_SUPPORT="zero[8.0:!category:Mc,Mn,!1001-1002,!1003],wide[15.1:!zwj,!vs16,!9009-9010]"

It's pretty advanced. However, it would allow us to predict or measure the width of all possible sequences on each individual terminal far more accurately.
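To make the spitballed format a little more concrete, here is a minimal Python sketch of how such a value might be tokenized. Everything in it is an assumption: the `UNICODE_SUPPORT` variable is only an idea from this thread, no tool exports or reads it, and `parse_unicode_support` is a hypothetical helper. The grammar is also ambiguous (for example, whether `Mn` belongs to the `category:` list), so the sketch keeps the raw tokens rather than interpreting them.

```python
import re

# One group looks like: name[version:token,token,...]
GROUP = re.compile(r"(\w+)\[([^:\]]+):([^\]]*)\]")


def parse_unicode_support(value):
    """Split a hypothetical UNICODE_SUPPORT value into per-table entries.

    Returns {table_name: {"version": str, "tokens": [str, ...]}}.
    Tokens are kept verbatim (including any leading "!"), because the
    proposed format does not yet define how they should be interpreted.
    """
    result = {}
    for name, version, body in GROUP.findall(value):
        tokens = [token for token in body.split(",") if token]
        result[name] = {"version": version, "tokens": tokens}
    return result


example = (
    "zero[8.0:!category:Mc,Mn,!1001-1002,!1003],"
    "wide[15.1:!zwj,!vs16,!9009-9010]"
)
parsed = parse_unicode_support(example)
```

With the example value above, `parsed["wide"]` holds the version `"15.1"` and the tokens `!zwj`, `!vs16`, and `!9009-9010`.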


jquast · 2026-01-12T19:33:22Z

jquast
Jan 12, 2026
Maintainer

Just coming back to say that the results of testing terminals for RGI (Recommended for General Interchange) support are at https://ucs-detect.readthedocs.io/results.html -- fewer than half of the terminals support RGI emojis.

It is necessary to interact with a terminal using CPR (Cursor Position Report) to determine its support for RGI / Emoji with ZWJ, in the way that the ucs-detect tool does.
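As a rough illustration of the CPR approach, here is a minimal Python sketch. It is not the ucs-detect implementation; the function names are hypothetical, and it assumes a POSIX terminal that answers the standard `ESC [ 6 n` query with `ESC [ row ; col R`. It writes a string, asks the terminal where the cursor ended up, and reports how many cells the cursor advanced.

```python
import os
import re
import sys

CPR_QUERY = "\x1b[6n"  # request a Cursor Position Report
CPR_REPLY = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply):
    """Return (row, column) from a reply such as '\\x1b[12;5R'."""
    match = CPR_REPLY.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def read_cursor_column():
    """Query the terminal for the cursor column (POSIX terminals only)."""
    import termios
    import tty

    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # read the reply unbuffered and without echo
        sys.stdout.write(CPR_QUERY)
        sys.stdout.flush()
        reply = b""
        while not reply.endswith(b"R"):
            chunk = os.read(fd, 1)
            if not chunk:  # EOF: the terminal never answered
                break
            reply += chunk
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    return parse_cpr(reply.decode("ascii", "replace"))[1]


def measure_displayed_width(text):
    """Number of cells the cursor advances when the terminal prints text."""
    sys.stdout.write("\r")
    sys.stdout.flush()
    start = read_cursor_column()
    sys.stdout.write(text)
    sys.stdout.flush()
    end = read_cursor_column()
    sys.stdout.write("\r\x1b[K")  # return to column 1 and erase the line
    sys.stdout.flush()
    return end - start
```

Run in an interactive terminal, `measure_displayed_width("\U0001f333\u200d\u2744\ufe0f")` would show whether that terminal advances the cursor as for one joined Emoji or as for two separate ones. The sketch has no timeout, so it will block on a terminal that never replies.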


jquast · 2026-01-29T06:23:02Z

jquast
Jan 29, 2026
Maintainer

pretty much finished, pretty sure i found a related glitch in ghostty that distracted me for a while.

$ python -c "print('|\U0001f333\u200d\u2744\ufe0f|X\n|    |X\npipes should align')"
|🌳‍❄|X
|    |X

in ghostty, the pipe meant to follow the snowflake follows the tree instead, and the snowflake appears 10 or so cells away from where it belongs. An incredible glitch in HEAD of their main branch.


jquast · 2026-01-29T18:36:20Z

jquast
Jan 29, 2026
Maintainer

I experimented with, at the least, validating sequences in which ASCII/CJK is followed by ZWJ and an Emoji, and I decided against even that.

My final decision is that non-recommended Emoji sequences, like TREE+ZWJ+SNOWFLAKE, will measure as width 2 in python wcswidth, even though they will likely display as two non-joined emojis side-by-side on most TE's like Konsole.

But I consider this to be in the area of "glitch emoji" or undefined behavior.

If wcwidth were never to make another release, but a new ZWJ emoji became popular in future years, the current implementation is forward-compatible and requires no change, so that's a nice bonus.
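To illustrate the difference between the two measurements discussed here, the following is a simplified, self-contained Python sketch. It is not wcwidth's actual code: `cell_width`, `width_joined`, and `width_separate` are hypothetical helpers built on the standard `unicodedata` module, and the rules are deliberately reduced to what this example needs.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
VS16 = "\ufe0f"  # VARIATION SELECTOR-16 (emoji presentation)


def cell_width(ch):
    """Very rough per-codepoint width: 0, 1, or 2 cells."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as ZWJ
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def width_joined(text):
    """Treat any X + ZWJ + Y as joined: Y adds no width of its own."""
    total = 0
    skip_next = False
    for ch in text:
        if skip_next:
            skip_next = False
            continue
        if ch == ZWJ:
            skip_next = True
            continue
        total += cell_width(ch)
    return total


def width_separate(text):
    """Treat ZWJ as invisible and measure every Emoji on its own."""
    total = 0
    last = 0  # width of the most recent visible character
    for ch in text:
        if ch == VS16:
            if last == 1:
                total += 1  # simplification: VS-16 widens any narrow character
            continue
        width = cell_width(ch)
        if width:
            last = width
        total += width
    return total


tree_snowflake = "\U0001f333\u200d\u2744\ufe0f"  # TREE + ZWJ + SNOWFLAKE + VS-16
joined = width_joined(tree_snowflake)
separate = width_separate(tree_snowflake)
```

Here `joined` is 2, matching the width-2 decision above, while `separate` is 4, which is what a terminal that draws the two Emoji side by side would occupy.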

