Multi-codepoint emojis #39
Comments
I think wcwidth/wcswidth should help somehow, yes. These didn't exist in the first release of wcwidth.c that this code is based upon, and since updating for new specs, I just failed to parse them from the data files or otherwise take them into account. This is a bug/missing feature, thanks!
That's great, thanks. Hope this doesn't complicate things too much. I've been learning about how these emoji are encoded, and all I can say is yuck. You might know this already... there is a skin tone modifier which changes the skin tone of the preceding emoji and would have zero width. But it can also appear by itself and is rendered as a colored box if not preceded by an emoji taking up 2 cell widths (at least on iterm). That can be followed by a "zero width joiner" character which attaches another codepoint. In my first example that would be a wrench symbol, which makes the emoji a mechanic. All this was gleaned from https://emojipedia.org/ |
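To make the encoding described above concrete, here is a minimal sketch that lists the codepoints of the "woman mechanic: dark skin tone" sequence using only Python's standard `unicodedata` module. The helper name `describe` is made up for illustration and is not part of wcwidth.

```python
import unicodedata

# "woman mechanic: dark skin tone" as a sequence of four codepoints:
# base emoji, skin tone modifier, zero width joiner, wrench.
woman_mechanic = "\U0001F469\U0001F3FF\u200D\U0001F527"


def describe(text):
    """Return (codepoint, Unicode name) pairs for each character in text."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


for code, name in describe(woman_mechanic):
    print(code, name)
```

Only the first codepoint should occupy terminal cells when the sequence is rendered as a single glyph; the modifier, the joiner, and the joined wrench all attach to it.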
I began to draft some code for this purpose a bit ago and pushed the branch https://github.com/jquast/wcwidth/tree/emoji-zwj. I think the hardest parts are done (parsing the Unicode data files for emoji ZWJ), but it is still a WIP.
@jquast any update on this WIP? I was going to see if I could move the ball forward, but when I try your branch, I get:
Looks like the file containing the table wasn't checked in. |
Try running tox. The tables are made by code generation, and I think that is documented. I do hope to resume this issue in the next month or so, thanks for your interest.
bin/update-tables.py |
I just pulled wcwidth for the first time today when using tabulate in python. |
I think that wcswidth returning -1 for any non-printable or indeterminate character has caused folks to rely on cheats, like sum(max(0, wcwidth(u)) for u in unicode_string). The problem with that is we wouldn't be able to determine the width of multi-codepoint emoji. The -1 return value is probably not a good idea for Python; it is simply an API compatible with all other wcswidth implementations. This WIP branch proposes a new API function, wcswidth.width, that just does its best to return the width of a full string, with no -1 return value. If a control character like \n or \t is in there, we just ignore it; downstream libraries will have to do their own checks and measures for that. As a new function, we remain API compatible, but downstream libraries will want to use the new function for this feature, which I'll probably also try to submit to the top 10 or so.
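As a rough illustration of why per-codepoint summing overcounts, here is a stdlib-only sketch. It does not use wcwidth and is not the code from the WIP branch: `cell_width`, `naive_width`, and `zwj_aware_width` are hypothetical helpers, and the width rules (East Asian Wide/Fullwidth counts as 2; categories Cf, Mn, Me count as 0) are simplifying assumptions.

```python
import unicodedata

ZWJ = "\u200d"


def cell_width(ch):
    """Approximate width of one codepoint (assumption, not wcwidth's tables)."""
    if unicodedata.category(ch) in ("Cf", "Mn", "Me"):
        return 0
    if unicodedata.east_asian_width(ch) in ("W", "F"):
        return 2
    return 1


def naive_width(text):
    """Sum per-codepoint widths, as the 'cheat' does."""
    return sum(cell_width(ch) for ch in text)


def zwj_aware_width(text):
    """Best-effort width that folds ZWJ sequences and skin tone modifiers."""
    total = 0
    prev_wide = False
    skip_next = False
    for ch in text:
        if skip_next:
            # Codepoint joined by a preceding ZWJ: part of the same glyph.
            skip_next = False
            continue
        if ch == ZWJ:
            skip_next = True
            continue
        if prev_wide and "\U0001F3FB" <= ch <= "\U0001F3FF":
            # Skin tone modifier attached to the preceding wide emoji.
            continue
        width = cell_width(ch)
        total += width
        if width:
            prev_wide = width == 2
    return total


woman_mechanic = "\U0001F469\U0001F3FF\u200D\U0001F527"
print(naive_width(woman_mechanic), zwj_aware_width(woman_mechanic))
```

For the issue's example the naive sum counts the woman, the modifier, and the wrench as separate wide characters, while the second function counts only the base emoji. A real implementation would also need to validate that the sequence is a recognized emoji ZWJ sequence, which this sketch skips.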
I'm a little confused. Are you saying there is a fix for the issue I linked above or that this is still a WIP? |
Any updates here? A lot of downstream projects are looking for a fix.
Major
-----

Bugfix zero-width characters, closes #57, #47, #45, #39, #26, #25, #24, #22, #8, wow! This is mostly achieved by replacing `ZERO_WIDTH_CF` with dynamic parsing by Category codes in bin/update-tables.py and putting those in the zero-wide tables.

Tests
-----

- `verify-table-integrity.py` exercises a "bug" of duplicated tables that has no effect, because wcswidth() first checks for zero-width, and that is preferred in cases of conflict. This PR also resolves that error of duplication.
- new automatic tests for balinese, kr jamo, zero-width emoji, devanagari, tamil, kannada.
- added pytest-benchmark plugin, example use:

      # baseline
      tox -epy312 -- --verbose --benchmark-save=original
      # compare
      tox -epy312 -- --verbose --benchmark-compare=.benchmarks/Linux-CPython-3.12-64bit/0001_original.json
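The idea of deriving zero-width entries from category codes, mentioned in the PR description above, can be sketched with the standard library. This is only an illustration: the category set below (Mn, Me, Cf) is an assumption, the function name `zero_width_ranges` is hypothetical, and the actual logic in bin/update-tables.py may differ.

```python
import unicodedata

# Assumed set of general categories treated as zero width.
ZERO_WIDTH_CATEGORIES = {"Mn", "Me", "Cf"}


def zero_width_ranges(start=0, stop=0x110000):
    """Return inclusive (first, last) codepoint ranges in zero-width categories."""
    ranges = []
    run_start = None
    for cp in range(start, stop):
        hit = unicodedata.category(chr(cp)) in ZERO_WIDTH_CATEGORIES
        if hit and run_start is None:
            run_start = cp
        elif not hit and run_start is not None:
            ranges.append((run_start, cp - 1))
            run_start = None
    if run_start is not None:
        ranges.append((run_start, stop - 1))
    return ranges


# Combining Diacritical Marks block and the ZWSP..RLM format characters.
for first, last in zero_width_ranges(0x0300, 0x0370) + zero_width_ranges(0x2000, 0x2011):
    print(f"U+{first:04X}..U+{last:04X}")
```

Generating ranges from the Unicode data this way avoids maintaining a hand-written list of zero-width codepoints that can fall out of date with each Unicode release.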
Fixed by #91 in today's release. I also wrote a tool to test terminals for Emoji ZWJ for anyone interested, https://pypi.org/project/ucs-detect/ |
Hi,
Can wcwidth help me with multi-codepoint emojis?
For instance, here I want to get the cell width for a "woman_mechanic_dark_skin_tone" emoji, which renders in the terminal as 2 cells, but wcswidth reports a width of 6 because it is adding up all the modifiers.
I've found support for this kind of emoji to be inconsistent across terminals, so maybe this is a lost cause, but is there some kind of standard for these emoji modifiers?