Wrong character width in full-width symbol #7
Comments
I don't think that's true. At least, on my terminal the first two take up one space and the third two spaces. And the same is true as it displays above in the code block. In fact this works just fine!
Check it on try pandoc. Edit: There is something a bit odd here. In the code block above (as in yours), the pipes on the last line aren't fully lined up. However, they do appear exactly lined up in my text editor. I don't know how to explain that, but what we're aiming for is proper alignment in a text editor. Let's see what happens if we add an extra space in that last line:
That's definitely not lined up. So the slight misalignment in the code block as rendered in the browser seems to be a browser rendering bug of some kind. The browser definitely isn't treating the character as single-wide, but it's not giving it full double width either. Upshot: not a bug, as far as I can see.
OK. That explains it. In https://www.unicode.org/Public/UNIDATA/EastAsianWidth.txt we see
The "A" means "ambiguous." "Ambiguous characters behave like wide or narrow characters depending on the context (language tag, script identification, associated font, source of data, or explicit markup; all can provide the context). If the context cannot be established reliably, they should be treated as narrow characters by default." So in your locale it is wide. doclayout is the library we use to compute "real widths" for layout. It currently just treats all ambiguous characters as narrow. I'll move this issue to doclayout as a suggestion for further improvement. (It would require some way to make doclayout's functions locale-sensitive, not a small change.)
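A minimal sketch of what a context-sensitive width function could look like. Everything here is hypothetical and for illustration only: `AmbiguousContext`, `charWidthIn`, and `realLengthIn` are not part of doclayout, and the two lookup functions cover only a hand-picked sliver of EastAsianWidth.txt (U+25EF LARGE CIRCLE as class A, the Hiragana letters as class W).

```haskell
import Data.Char (ord)

-- | How to resolve East Asian "ambiguous" (class A) characters.
-- Hypothetical type, not doclayout's API.
data AmbiguousContext = TreatNarrow | TreatWide
  deriving (Eq, Show)

-- | Illustrative subset only: U+25EF LARGE CIRCLE is class A.
isAmbiguous :: Char -> Bool
isAmbiguous c = ord c == 0x25EF

-- | Illustrative subset only: Hiragana letters U+3041..U+3096 are class W.
isWide :: Char -> Bool
isWide c = let n = ord c in n >= 0x3041 && n <= 0x3096

-- | Width of one character; only ambiguous characters depend on the context.
charWidthIn :: AmbiguousContext -> Char -> Int
charWidthIn ctx c
  | isWide c      = 2
  | isAmbiguous c = if ctx == TreatWide then 2 else 1
  | otherwise     = 1

-- | Width of a whole string under the given context.
realLengthIn :: AmbiguousContext -> String -> Int
realLengthIn ctx = sum . map (charWidthIn ctx)
```

With these definitions, `realLengthIn TreatNarrow "\x25EF\x3042"` is 3 and `realLengthIn TreatWide "\x25EF\x3042"` is 4: the Hiragana letter is always two columns, while the circle follows the context.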
@Xitian9 - I believe you mentioned the possibility that this issue would arise!
Ha! That was fast. I guess the question is how do we accurately and reliably determine the width. If there is surrounding context then it should be straightforward: we can add a context specifier to the
One approach would be to add a function that allows you to locally set the context, such as
Pandoc could then put the whole document in
Good idea. Next problem: there are a lot of ambiguous characters in the Unicode spec. There are 198 separate entries (which include ranges) in It is error-prone and tedious to define these ourselves. Maybe we should teach
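To make the "generate rather than hand-write" idea concrete, here is a rough sketch of extracting the ambiguous ranges from the text of EastAsianWidth.txt. It assumes the usual line shape of that file, a code point or `lo..hi` range, a semicolon, the width class, then a `#` comment, and tolerates optional spaces around the semicolon. The names `parseLine` and `ambiguousRanges` are made up for this example and are not existing doclayout functions.

```haskell
import Data.Char (isSpace)
import Numeric (readHex)

-- | Inclusive code point range tagged with its East Asian Width class.
type Entry = (Int, Int, String)

-- | Parse one line such as "25EF;A # So LARGE CIRCLE" or "2460..24E9;A # ...".
-- Returns Nothing for comments, blank lines, and anything malformed.
parseLine :: String -> Maybe Entry
parseLine raw =
  case break (== ';') (takeWhile (/= '#') raw) of
    (cps, ';' : cls) -> do
      let (loTxt, rest) = break (== '.') (trim cps)
      lo <- hex loTxt
      hi <- case rest of
              ""            -> Just lo      -- single code point
              '.' : '.' : h -> hex h        -- "lo..hi" range
              _             -> Nothing
      Just (lo, hi, trim cls)
    _ -> Nothing
  where
    trim = dropWhile isSpace . reverse . dropWhile isSpace . reverse

    hex :: String -> Maybe Int
    hex s = case readHex s of
              [(n, "")] -> Just n
              _         -> Nothing

-- | All ranges whose class is "A" (ambiguous), in file order.
ambiguousRanges :: String -> [(Int, Int)]
ambiguousRanges contents =
  [ (lo, hi) | Just (lo, hi, "A") <- map parseLine (lines contents) ]
```

A generator script could run something like `ambiguousRanges` over the downloaded file and emit a lookup table as Haskell source, so the parsing code stays outside the library itself.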
Makes sense to me. (We should use the approach in emojis, where the parsing code isn't part of the library and thus doesn't add dependencies.)
@Xitian9 has now provided a context-aware Now it remains to figure out how to modify the rest of the library so that it can be used. It's not as easy as I'd originally thought. For example, we have a One approach would be to change the
The Reader approach would require a lot of changes. Maybe we could do something simpler, e.g. just adding EDIT: The problem with this approach is that we sometimes use
To be clearer, the central problem is this: we have

data Doc a = Text Int a     -- ^ Text with specified width.
           | Block Int [a]  -- ^ A block with a width and lines.
           | VFill Int a    -- ^ A vertically expandable block;
                            -- when concatenated with a block, expands to height

and the constructors for Block and VFill take an The problem is, even if we introduced something like

-- | Like 'lblock' but aligned to the right.
rblock :: HasChars a => Int -> Doc a -> Doc a
rblock w = block (\s -> replicateChar (w - realLength s) ' ' <> s) w

which makes the block left-padded with spaces depending on the real lengths of the rendered lines. So we'd need some kind of large-scale design change in order to introduce a way of changing the context from "wide" to "narrow" for part of the rendered document. Probably the most straightforward approach is to change the type of Block and VFill so they take
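A small standalone illustration of why the padding is the sticking point. This is a simplification, not doclayout code: `padLeft` mirrors the shape of the lambda in `rblock` but works on plain `String`, and `narrowLen` and `wideLen` are hypothetical stand-ins for a width function under the two contexts (only U+25EF is treated as ambiguous here).

```haskell
-- | Left-pad a line to a target width, measuring it with the supplied function.
padLeft :: (String -> Int) -> Int -> String -> String
padLeft measure w s = replicate (w - measure s) ' ' ++ s

-- | Stand-in for "ambiguous characters are narrow": every character is 1 column.
narrowLen :: String -> Int
narrowLen = length

-- | Stand-in for "ambiguous characters are wide": U+25EF counts as 2 columns.
wideLen :: String -> Int
wideLen = sum . map (\c -> if c == '\x25EF' then 2 else 1)
```

Padding the single character U+25EF to width 4 gives three spaces under `narrowLen` and two under `wideLen`. Once the spaces are part of the stored lines, a later change of context cannot undo them, which is why the width decision has to be available at the point where the block is built or rendered.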
This is my source markdown.
I got the following result:
There is a problem on the next line.
and
These results include the | character.
I can modify the source markdown to get the expected result as follows.
However, it is not beautiful.
I think it's a half-width and full-width misjudgment.
◯ and ✕ are full-width characters, as is あ.
Command line
Version