Display errors with emoji of various widths #758

lxcode · 2020-10-28T18:13:12Z

Small description

I apologize for even wading into the waters of Unicode and terminal character width, but I haven't found a workaround that sanitizes text appropriately. When viewing data that contains smatterings of emoji and other Unicode oddities, inconsistent character widths cause the column separators to jump around and fall out of alignment. This is especially apparent with flag emoji, and particularly severe when many of them appear in sequence. Is there anything VisiData could do to mitigate this, perhaps using jquast/wcwidth?

Expected result

Everything displays cleanly and columns line up.

Actual result with screenshot

Additional context
VisiData v2.0.1
Tested with both kitty and iTerm2 on macOS, local machine (no tmux or anything).

saulpw · 2020-10-28T18:18:20Z

Try setting disp_ambig_width to 2 (it defaults to 1) and let me know if that helps.
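For context, here is a minimal sketch of what an "ambiguous width" setting does, assuming width is decided one code point at a time from Unicode's East Asian Width property. `char_width` and `str_width` are hypothetical helpers, not VisiData's `dispwidth()`, and `ambig_width` merely stands in for `disp_ambig_width`. Characters in the Ambiguous (`A`) class are drawn one cell wide by some terminals and two by others, which is why such an option exists; it only changes how that class is counted.

```python
import unicodedata


def char_width(ch: str, ambig_width: int = 1) -> int:
    """Approximate terminal cell width of one code point (simplified sketch)."""
    # Nonspacing/enclosing marks and format characters (e.g. U+FE0F, U+200D) take no cell.
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0
    eaw = unicodedata.east_asian_width(ch)
    if eaw in ("W", "F"):  # Wide / Fullwidth, e.g. CJK ideographs
        return 2
    if eaw == "A":         # Ambiguous: terminal-dependent, so make it configurable
        return ambig_width
    return 1


def str_width(s: str, ambig_width: int = 1) -> int:
    return sum(char_width(ch, ambig_width) for ch in s)


print(str_width("§±", ambig_width=1))  # 2: both are East Asian Width "A"
print(str_width("§±", ambig_width=2))  # 4
print(str_width("漢字"))                # 4: wide regardless of the setting
```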

lxcode · 2020-10-28T18:26:09Z

Unfortunately, that doesn't appear to make a difference. :-/ I've attached a tsv (had to rename it to txt to upload) that should work for repro.

chartest.txt

saulpw · 2020-10-29T00:50:27Z

Thanks for the test data. I use urxvt on Linux, and I also tried it with lxterminal:

My font apparently doesn't include country flags, but you can see that a) the non-emoji full-width characters display properly, and b) the columns are aligned correctly. I'd love to get Mac terminals to do the right thing, especially (b), but I'm not sure it's possible. I'm guessing this has to do with combining characters (a flag is actually two characters that combine into the country-flag icon): VisiData allocates two full-width spaces for them, but the combined glyph takes up only one when drawn.
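To see what the terminal actually receives for a flag, it helps to list the code points. A minimal sketch using only the standard library's `unicodedata`. Strictly speaking, the two code points in a flag are regional indicator symbols (paired by Unicode's grapheme-cluster rules) rather than combining marks, which is likely why width tables that look at one code point at a time struggle with them.

```python
import unicodedata

flag = "\U0001F1FA\U0001F1F8"  # the US flag: REGIONAL INDICATOR U + REGIONAL INDICATOR S

for ch in flag:
    # Print each code point with its name and East Asian Width class.
    print(f"U+{ord(ch):04X}  {unicodedata.name(ch)}  eaw={unicodedata.east_asian_width(ch)}")

print(len(flag))  # 2 code points, drawn as one flag glyph when the font supports it
```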

ajkerrigan · 2020-10-29T02:29:42Z

This is an interesting issue. I've had similar trouble, but oddly not on my Mac. When I use VisiData locally (generally inside Alacritty+tmux) everything is fine. If I record and play back an asciicast, everything still looks fine. But when I upload a cast to asciinema.org, the column separators go out of alignment for rows with Unicode values (sample).

I suspected I'd be able to work around this with some combination of font/encoding/terminal tweaks, but haven't found the right magic mix yet. I also tried messing with the disp_ambig_width option thanks to this thread. I'm not sure if this is the same core issue or a subtly different one, but it seems worth noting in case it's a useful data point.

lxcode · 2020-10-29T02:45:06Z

I just tested on Linux in kitty, and I do get misalignments when I expand rows to full width (row 3). gnome-terminal keeps the alignment right, but mangles the emoji pretty badly.

My guess is this is primarily going to be an issue with flags and things like 👪 that are composite emoji of skin tone/gender.
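A quick way to check which emoji are multi-code-point sequences is to decompose them. Note that the standalone 👪 is a single code point (U+1F46A FAMILY); it is the skin-tone and ZWJ family variants that are composite. The `codepoints` helper below is illustrative only.

```python
import unicodedata


def codepoints(s: str) -> list:
    """List 'U+XXXX NAME' for every code point in s."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in s]


thumbs_medium = "\U0001F44D\U0001F3FD"                     # thumbs up + skin-tone modifier
family_zwj = "\U0001F468\u200D\U0001F469\u200D\U0001F467"  # man ZWJ woman ZWJ girl

for s in ("\U0001F46A", thumbs_medium, family_zwj):
    print(len(s), codepoints(s))
```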

lxcode · 2020-11-24T20:40:32Z

Digging a bit further into this: it looks like dispwidth() could be replaced with calls to wcwidth/wcswidth (in cliptext.py and column.py), which is probably a more future-proof and comprehensive way to deal with Unicode string widths. General len() calls on strings like trunch and sepchars could use wcwidth too.
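The core of the alignment problem is padding by code-point count (`len()`) rather than by display width. Below is a stdlib-only sketch, in the spirit of not adding a dependency; `display_width` is a simplified stand-in for what `wcwidth.wcswidth()` computes more thoroughly, and both padding helpers are hypothetical names.

```python
import unicodedata


def display_width(s: str) -> int:
    """Simplified cell width: marks/format chars 0, Wide/Fullwidth 2, everything else 1."""
    width = 0
    for c in s:
        if unicodedata.category(c) in ("Mn", "Me", "Cf"):
            continue
        width += 2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
    return width


def pad_by_len(s: str, width: int) -> str:
    return s.ljust(width)  # counts code points, not terminal cells


def pad_by_display(s: str, width: int) -> str:
    return s + " " * max(0, width - display_width(s))


cells = ["abc", "漢字", "e\u0301"]  # ASCII, two wide CJK chars, e + COMBINING ACUTE ACCENT
for c in cells:
    print(pad_by_len(c, 6) + "|", pad_by_display(c, 6) + "|")
```

In a terminal that follows these widths, only the separators in the second printed column line up.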

Unfortunately, replacing those doesn't seem to fix the issue in kitty, iTerm or Terminal. I'm assuming the answer is somewhere in drawRow, but I'm not seeing it.

saulpw · 2020-12-02T04:11:44Z

Maybe there's a way to work around it by directly placing the column separators instead of appending them to the value string. I thought it was already doing this, but apparently not. The wcwidth library looks great (and I have been frustrated myself by the same problem, as evidenced by dispwidth), but for the time being I'd rather not take on another dependency if it doesn't fix a problem.
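One way to read "directly placing the column separators": compute each separator's absolute x position from the column widths alone, then move the cursor there before drawing it, so a cell the terminal draws wider or narrower than expected cannot shift later separators. This is a minimal sketch using the ANSI "Cursor Horizontal Absolute" escape (`ESC [ n G`); in curses the equivalent is passing explicit coordinates to `window.addstr(y, x, text)`. `render_row` is hypothetical and assumes each value has already been truncated to fit its column.

```python
def render_row(values, widths, sep="|"):
    """Build one row, positioning every cell and separator at an absolute column."""
    out = []
    x = 0  # 0-based column where the current cell starts
    for value, width in zip(values, widths):
        out.append(f"\x1b[{x + 1}G{value}")  # CHA columns are 1-based
        x += width
        out.append(f"\x1b[{x + 1}G{sep}")    # separator lands here however the value rendered
        x += len(sep)
    return "".join(out)


print(render_row(["\U0001F1FA\U0001F1F8", "\u2764\ufe0f", "ok"], [4, 4, 4]))
```

The trade-off is that an over-wide glyph gets partly overdrawn by the next separator instead of pushing the rest of the row to the right.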

saulpw · 2021-06-26T01:26:30Z

The primary issue here is with "❤️", which is actually U+2764 ("❤") followed by U+FE0F (VARIATION SELECTOR-16). The base heart is width 1, and U+FE0F is width 0 (it's categorized as a "nonspacing mark"). But the combined emoji is width 2, and there's no way to detect this from the characters themselves. The wcwidth library doesn't handle this either.

polm · 2021-06-26T04:58:57Z

It seems that wcwidth doesn't handle this yet but they're working on it in jquast/wcwidth#39.
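The heart case can be checked directly with `unicodedata`. Below, `naive_width` sums per-code-point widths and gets 1, while terminals that honor emoji presentation draw 2 cells. `vs16_aware_width` is a hypothetical heuristic, one plausible rule a width library could apply: widen a narrow base character when U+FE0F follows it. A real implementation should only do this for the sequences listed in Unicode's emoji-variation-sequences.txt, and terminals still disagree on how they render these.

```python
import unicodedata

heart = "\u2764\ufe0f"  # HEAVY BLACK HEART + VARIATION SELECTOR-16

for ch in heart:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))


def naive_width(s: str) -> int:
    """Per-code-point sum: nonspacing marks 0, Wide/Fullwidth 2, otherwise 1."""
    total = 0
    for c in s:
        if unicodedata.category(c) == "Mn":
            continue
        total += 2 if unicodedata.east_asian_width(c) in ("W", "F") else 1
    return total


def vs16_aware_width(s: str) -> int:
    """Heuristic (assumption): VS16 after a 1-cell base requests a 2-cell emoji glyph."""
    total = 0
    for i, c in enumerate(s):
        if c == "\ufe0f":
            if i > 0 and naive_width(s[i - 1]) == 1:
                total += 1  # widen the preceding base from 1 cell to 2
            continue
        total += naive_width(c)
    return total


print(naive_width(heart), vs16_aware_width(heart))  # 1 2
```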

saulpw · 2021-06-26T08:27:37Z

Thanks @polm, good find. I noticed in my comment above that GitHub rendered both hearts identically (with images! Unicode be damned), which got me thinking that we can filter out these tricky combining/variant characters and replace them with indicators like ◌ for display purposes. So I implemented that, and it improves things:

By default the behavior will stay the same, but with options.visibility set to 1 you can see the deconstructed form which has a deterministic layout. (This option can't be changed within vd at the moment for some reason; may be a caching issue.)
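A minimal sketch of the filtering idea, assuming "combining and variant characters" means code points in the Unicode nonspacing-mark, enclosing-mark, and format categories (Mn, Me, Cf), which covers U+FE0F and ZERO WIDTH JOINER. `visible_form` is a hypothetical name and this is not the code in commit 6a14a4c. Each replaced code point becomes a single ordinary, non-combining ◌, so the drawn layout no longer depends on how a terminal composes the sequence.

```python
import unicodedata

PLACEHOLDER = "\u25cc"  # DOTTED CIRCLE


def visible_form(s: str, visibility: int = 1) -> str:
    """With visibility off, return s unchanged; otherwise replace marks/format chars with ◌."""
    if not visibility:
        return s
    return "".join(
        PLACEHOLDER if unicodedata.category(c) in ("Mn", "Me", "Cf") else c
        for c in s
    )


print(visible_form("\u2764\ufe0f"))                                  # ❤◌
print(visible_form("\U0001F468\u200D\U0001F469\u200D\U0001F467"))    # 👨◌👩◌👧
print(visible_form("\u2764\ufe0f", visibility=0) == "\u2764\ufe0f")  # True
```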

In any event, I think this is the best we can do, especially since different terminals render combinations differently. Hope this helps a bit with complicated unicode data. Thanks again for filing, @lxcode.

Oh and we also have a new sample_data/test-unicode-display.tsv which has minimum tests for some of these tricky cases (this is what's shown in the images above). Please submit other test cases that don't work (it's good to collect them all, even if we can't fix them all).


lxcode added the bug label Oct 28, 2020

saulpw closed this as completed Jun 26, 2021

saulpw added a commit that referenced this issue Jun 26, 2021

[cliptext] better support for combining and variant chars #758

6a14a4c

- add sample_data/test-unicode-display.tsv
- filter/replace combining and variant chars if options.visibility non-zero

geekscrapy mentioned this issue Jun 28, 2021

^ symbol is rendered as ◦ on OSX #1034

Closed

saulpw added a commit that referenced this issue Jun 29, 2021

[cliptext-] do not replace printable ascii with modchar #1034 #758'

d993c6a
