-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add basic Unicode support #169
Conversation
Thanks for making this, I'll review it properly when I have time (I'm a bit busy at the moment). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Finally I got around to making the review. Sorry about the wait.
I have a lot of comments about style here. I have quite strong opinions about code style, unfortunately. Usually when someone makes a pull request I will make the code style changes myself, but I am too busy to do that at the moment.
I think it's best if before the UI_UNICODE
changes are merged in, you could address some of the things you have mentioned as things to be improved, like cursor positioning with the mouse.
Thank you.
luigi2.h
Outdated
#error "Unicode support requires Freetype" | ||
#endif | ||
|
||
#define UNICODE_MAX_CODEPOINT 0x10FFFF |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All definitions internal to the UI library should be prefixed with _UI
, since it is intended to be used as a single-file-header library.
luigi2.h
Outdated
#define UNICODE_MAX_CODEPOINT 0x10FFFF | ||
|
||
bool _Utf8ApplyExtendedByte(int byte, int *codePoint, int shift) | ||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style: The bracket should be placed on the same line as the function arguments.
luigi2.h
Outdated
return -1; | ||
} | ||
|
||
int codePoint = int(first & (0x3F >> numExtraBytes)) << (6 * numExtraBytes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style: Use (int) x
to cast.
luigi2.h
Outdated
|
||
int codePoint = int(first & (0x3F >> numExtraBytes)) << (6 * numExtraBytes); | ||
for (ptrdiff_t idx = 1; idx < numExtraBytes + 1; idx++) { | ||
if (!_Utf8ApplyExtendedByte(cString[idx], &codePoint, numExtraBytes - idx)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_Utf8ApplyExtendedByte
really doesn't need to be a separate function here, and this doesn't need to be done in a loop. Just write out the bit-operations.
luigi2.h
Outdated
int Utf8GetCodePoint2(const char *cString, ptrdiff_t bytesLength) | ||
{ | ||
ptrdiff_t bytesConsumed; | ||
return Utf8GetCodePoint(cString, bytesLength, &bytesConsumed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of adding a second function, change the semantics of Utf8GetCodePoint
so that if NULL
is passed for the bytesConsumed
pointer, then it does not write to it.
luigi2.h
Outdated
} | ||
|
||
ptrdiff_t length = 0; | ||
ptrdiff_t byteIdx = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style: rename to byteIndex
.
luigi2.h
Outdated
ti++; | ||
if (code->content[byte + code->lines[line].offset] == '\t') while (ti % code->tabSize) ti++; | ||
if (column < ti) break; | ||
|
||
#ifdef UI_UNICODE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All these #ifdef
s get quite messy. It would make more sense to have a few macros that handle the basic string operations and the definition of those macros depend on whether UI_UNICODE
is defined.
luigi2.h
Outdated
|
||
last <<= 8; | ||
last |= c; | ||
last |= (char)c; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style: add a whitespace between the (cast)
type and the expression.
luigi2.h
Outdated
@@ -3041,6 +3200,48 @@ int _UICodeMessage(UIElement *element, UIMessage message, int di, void *dp) { | |||
return 0; | |||
} | |||
|
|||
#if UI_UNICODE | |||
void UICodeMoveCaret(UICode *code, bool backward, bool word) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, if you have separate defines for the string operations I think you won't need to duplicate this entire function's logic?
Thanks for reviewing the PR. Don't worry about the coding style change requests, it's your code and your rules. I'll see what I can do in the upcoming days as I'm a bit busy myself ATM. |
My apologies for taking so long to get back to this. I addressed the coding style issues and factored out some parts of functions to macros to avoid the convoluted ifdefs. I still need to investigate the cursor positioning issue. It appears that I can repro it even with unicode handling switched off. Hopefully I should have more time to work on this now... |
placement after a TAB character
I think I figured out the cursor placement issue. Let me know if you'd like me to revise the code further or do some additional tests. |
Thanks, I'll try to take a look at the weekend. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please see the attached comments. The algorithms are looking good, but there is a lot of duplicated code at the moment which needs to be simplified before these changes can be merged. Thank you!
luigi2.h
Outdated
ti++; | ||
if (code->content[i + code->lines[line].offset] == '\t') while (ti % code->tabSize) ti++; | ||
_UI_ADVANCE_COLUMN(i, code, byte); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs to take into accounts tab characters, as the old code did.
luigi2.h
Outdated
|
||
last <<= 8; | ||
last |= c; | ||
last |= (char) c; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Writing out the & 0xFF
would make this a bit clearer.
while (bytes) { | ||
#ifdef UI_UNICODE | ||
ptrdiff_t bytesConsumed; | ||
int c = Utf8GetCodePoint(string, bytes, &bytesConsumed); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of having #ifdef UI_UNICODE
everywhere, please merge this logic into a single macro which can be used here, in place of _UI_ADVANCE_BYTE
and in place of _UI_ADVANCE_COLUMN
.
luigi2.h
Outdated
#ifdef UI_UNICODE | ||
|
||
#define _UI_TEXTBOX_MOVE_CARET_BACKWARD(textbox) do { \ | ||
char *prev = Utf8GetPreviousChar(textbox->string, textbox->string + textbox->carets[0]); \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This logic should not be duplicated with the _UI_CODE_MOVE_
functions. There should be a common macro for doing all of this.
from column to byte for UITextbox
Thanks for the comments. I've generalized cursor positioning and byte<->column translation functions. Hopefully I've also fixed an issue where the code would assert on an attempt to get a code point from an empty string. As far as I could test things seem to behave correctly. |
I'm sorry for the ping... is there anything else you'd like me to revise? |
Sorry, I have it on my list of things to-do to review these changes. But I have been busy/tired lately, so I haven't done it yet. |
Thanks for submitting these changes! |
This patch adds basic Unicode support to the UI. It relies on Freetype font renderer and is hidden by default behind UI_UNICODE flag. Encoding support is currently limited to UTF-8.
What works:
?
without any other adverse effects.What could be improved:
IsCharAlpha()
function would have to be much more sophisticated.What does not work:
Notes:
carets
inUIStringSelection
to by how many glyphs the caret is offset from the beginning rather than bytes.UITextbox
contains tabs.Comments welcome :)