Should wcwidth have a "Treat ambiguous-width as wide" option? #123
Comments
Maybe wcwidth focuses only on terminal fonts? I'm using PrettyTable, which depends on wcwidth, to generate a table, and I want to display the table text in a browser. For example, I'm using Chrome, and I found that monospace fonts work fine most of the time. But some Unicode characters are displayed with a different width.
It would be helpful if I could provide the font family and get a more general result. Is that possible?
wcwidth is primarily focused on terminals: if browsers and terminals disagree, we would rather match terminals. I would expect a JavaScript or browser-based library that is more focused on browser width to exist, but I cannot find one at the moment; please suggest one if you do. Browsers are able to communicate directly with the font engine of the operating system, while wcwidth in Python and other languages cannot, so we generally take a more naive approach. This is probably why most terminals are also wrong in this case while browsers are not. In this case, the problem with ① (https://codepoints.net/U+2460) is that it is of Ambiguous width (https://unicode.org/reports/tr11/#Ambiguous) and,
In the following code blocks I use the same character, one with English letters on the same line,
and another from your example, with your Mandarin Chinese "hello",
Although they render at different sizes, at least in my browser (Firefox 120.0.1), they have approximately the same width. I will say that monospace fonts do not always align vertically in browsers (note how the number '5' does not align in the first example), while they always do in terminals. Screenshot of the above, It would require more experimentation, but maybe a page in a Chinese locale would render differently, as in your original screenshot; I'm not really sure. In any case, many terminals have an option to display ambiguous-width characters as 2 cells,
I'm not certain, but maybe this option is used more frequently by users of East Asian languages in terminals? It is very problematic, though: the entire software stack needs to agree to "treat ambiguous width as wide". For example, here is an "$LD_PRELOAD-able library and a wrapper script" that patches POSIX wcwidth for this option, and references many issues and bugs about it: https://github.com/fumiyas/wcwidth-cjk The "Terminal Working Group" tried to come to a consensus about this and other issues, https://gitlab.freedesktop.org/terminal-wg/specifications/-/issues/9#note_406682 There was a great deal of discussion, but this "Working Group" specifications project has failed to come to a consensus on any single issue (the "accepted" folder is empty, with 31 open issues). Maybe this library could also provide such an option, to "treat ambiguous width as wide". I will rewrite this GitHub issue to match that request.
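To make the request concrete, here is a minimal sketch of what a "treat ambiguous width as wide" switch could look like. It is not wcwidth's API: the function name `cell_width` and the `ambiguous_is_wide` parameter are hypothetical, and the width is derived only from the East Asian Width property in Python's standard `unicodedata` module, so control characters, emoji sequences and other special cases that a real implementation handles are ignored.

```python
import unicodedata


def cell_width(text: str, ambiguous_is_wide: bool = False) -> int:
    """Approximate the number of terminal cells needed to display `text`.

    Hypothetical helper for illustration only; it looks at the East Asian
    Width property and nothing else.
    """
    total = 0
    for ch in text:
        if unicodedata.combining(ch):
            # Combining marks are drawn over the previous character.
            continue
        eaw = unicodedata.east_asian_width(ch)
        if eaw in ("W", "F"):
            # Wide and Fullwidth characters occupy two cells.
            total += 2
        elif eaw == "A" and ambiguous_is_wide:
            # Ambiguous characters are widened only when asked to.
            total += 2
        else:
            total += 1
    return total


if __name__ == "__main__":
    for wide in (False, True):
        print(wide, cell_width("\u2460", ambiguous_is_wide=wide))
```

Such a switch only helps when the terminal (or browser font) is configured the same way, which is the "entire software stack needs to agree" problem described above.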
It's also rendered with a width of 1 in Windows Terminal. What's more, it's rendered with a width of 1 in my web browser (Chromium). I personally agree that "①" should be East Asian Wide, but unfortunately it is East Asian Ambiguous (and a similar character, U+2780, is East Asian Neutral). In my opinion, it may need to be addressed in Unicode, but I'm not sure. Unicode is a bit chaotic. ¯\_(ツ)_/¯
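The categories mentioned here can be checked with Python's standard `unicodedata` module. This is only a quick lookup sketch; the helper name `eaw_category` is made up for illustration, and the result depends on the Unicode database bundled with the Python version in use.

```python
import unicodedata


def eaw_category(ch: str) -> str:
    """Return the East Asian Width category code of a single character."""
    return unicodedata.east_asian_width(ch)


if __name__ == "__main__":
    # U+2460 CIRCLED DIGIT ONE and U+2780 DINGBAT CIRCLED SANS-SERIF DIGIT ONE
    for ch in ("\u2460", "\u2780"):
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {eaw_category(ch)}")
```

The code "A" stands for Ambiguous and "N" for Neutral, as defined in UAX #11.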
Thank you for so much work! You are very helpful. I understand that this is because the ① character is an East Asian Ambiguous character, which is treated as a different size in different contexts. I agree that there could be a "treat ambiguous width as wide" option, because in most cases it displays at the same size as an East Asian character in my locale.
I ran into a similar issue when displaying a Unicode filled square character U+25A0 in Windows console (command line prompt) using Since Windows doesn’t provide a
Thanks for chiming in @fancidev. I did actually write a tool that does just what you describe, "could render the text and compute the width from cursor advancement", so that is definitely possible! https://github.com/jquast/ucs-detect Maybe ucs-detect could export an environment variable that wcwidth could use to more accurately determine any widths where a given terminal deviates from the specification.
Thanks for the info. Good to know there is already such a facility! The demo on the homepage of ucs-detect runs through the characters on screen. I wonder if the width detection can be performed without echo? If that's possible, a solution could be to run through the ambiguous characters on application start-up (as well as upon special console events such as a font change) and remember the results. (If the delay is small, the end user would not notice it.)
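The "compute the width from cursor advancement and remember the results" idea could be sketched roughly as follows. This is not how ucs-detect is implemented; it is an illustration that assumes a POSIX terminal which answers the standard cursor position query (`ESC [ 6 n`). The names `parse_cpr` and `measure_width` are hypothetical. The character still has to be written to the terminal to be measured, so the sketch erases it again afterwards rather than avoiding output entirely.

```python
import functools
import os
import re
import sys
from typing import Tuple

# A cursor position report looks like ESC [ row ; col R
_CPR = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cpr(reply: bytes) -> Tuple[int, int]:
    """Extract (row, column) from a cursor position report."""
    match = _CPR.search(reply)
    if match is None:
        raise ValueError(f"not a cursor position report: {reply!r}")
    return int(match.group(1)), int(match.group(2))


def _query_column(fd: int) -> int:
    """Ask the terminal where the cursor is and return its column."""
    sys.stdout.write("\x1b[6n")
    sys.stdout.flush()
    reply = b""
    # Note: this blocks forever if the terminal never answers.
    while not reply.endswith(b"R"):
        reply += os.read(fd, 1)
    return parse_cpr(reply)[1]


@functools.lru_cache(maxsize=None)  # remember results, measure each char once
def measure_width(ch: str) -> int:
    """Measure how many cells the terminal advances when drawing `ch`."""
    import termios  # POSIX only, imported lazily
    import tty

    fd = sys.stdin.fileno()
    saved = termios.tcgetattr(fd)
    try:
        tty.setcbreak(fd)  # keep the terminal's reply from being echoed
        sys.stdout.write("\r")
        start = _query_column(fd)
        sys.stdout.write(ch)
        end = _query_column(fd)
        # Overwrite the measured character so nothing stays on screen.
        sys.stdout.write("\r" + " " * (end - start) + "\r")
        sys.stdout.flush()
    finally:
        termios.tcsetattr(fd, termios.TCSADRAIN, saved)
    return end - start
```

An application could call `measure_width` for the ambiguous characters it cares about at start-up, and call `measure_width.cache_clear()` when it learns that the font or terminal settings have changed.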
results in:
But it displays 2 character width in monospace font: