Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add basic Unicode support #169

Merged
merged 10 commits into from
Aug 21, 2024
Merged

Add basic Unicode support #169

merged 10 commits into from
Aug 21, 2024

Conversation

MadCatX
Copy link
Contributor

@MadCatX MadCatX commented Feb 25, 2024

This patch adds basic Unicode support to the UI. It relies on Freetype font renderer and is hidden by default behind UI_UNICODE flag. Encoding support is currently limited to UTF-8.

What works:

What could be improved:

  • Word-skipping in text with Unicode characters is a bit flaky. To fix this, I suppose that the IsCharAlpha() function would have to be much more sophisticated.
  • Cursor position selection with mouse and invalid UTF-8 input. Sometimes the cursor is positioned a few characters behind the expected position if the expected position lies inside an invalid UTF-8 sequence. Navigation with keyboard seems to work fine.

What does not work:

  • Textboxes currently do not accept Unicode input from keyboard keystrokes. A workaround is to copy-paste it in.

Notes:

  • This patch changes the meaning of carets in UIStringSelection to by how many glyphs the caret is offset from the beginning rather than bytes.
  • This patch also fixes a minor rendering bug when a text selection in UITextbox contains tabs.

Comments welcome :)

@nakst
Copy link
Owner

nakst commented Feb 27, 2024

Thanks for making this, I'll review it properly when I have time (I'm a bit busy at the moment).

Copy link
Owner

@nakst nakst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Finally I got around to making the review. Sorry about the wait.
I have a lot of comments about style here. I have quite strong opinions about code style, unfortunately. Usually when someone makes a pull request I will make the code style changes myself, but I am too busy to do that at the moment.
I think it's best if before the UI_UNICODE changes are merged in, you could address some of the things you have mentioned as things to be improved, like cursor positioning with the mouse.
Thank you.

luigi2.h Outdated
#error "Unicode support requires Freetype"
#endif

#define UNICODE_MAX_CODEPOINT 0x10FFFF
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All definitions internal to the UI library should be prefixed with _UI, since it is intended to be used as a single-file-header library.

luigi2.h Outdated
#define UNICODE_MAX_CODEPOINT 0x10FFFF

bool _Utf8ApplyExtendedByte(int byte, int *codePoint, int shift)
{
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code style: The bracket should be placed on the same line as the function arguments.

luigi2.h Outdated
return -1;
}

int codePoint = int(first & (0x3F >> numExtraBytes)) << (6 * numExtraBytes);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code style: Use (int) x to cast.

luigi2.h Outdated

int codePoint = int(first & (0x3F >> numExtraBytes)) << (6 * numExtraBytes);
for (ptrdiff_t idx = 1; idx < numExtraBytes + 1; idx++) {
if (!_Utf8ApplyExtendedByte(cString[idx], &codePoint, numExtraBytes - idx)) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_Utf8ApplyExtendedByte really doesn't need to be a separate function here, and this doesn't need to be done in a loop. Just write out the bit-operations.

luigi2.h Outdated
int Utf8GetCodePoint2(const char *cString, ptrdiff_t bytesLength)
{
ptrdiff_t bytesConsumed;
return Utf8GetCodePoint(cString, bytesLength, &bytesConsumed);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of adding a second function, change the semantics of Utf8GetCodePoint so that if NULL is passed for the bytesConsumed pointer, then it does not write to it.

luigi2.h Outdated
}

ptrdiff_t length = 0;
ptrdiff_t byteIdx = 0;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code style: rename to byteIndex.

luigi2.h Outdated
ti++;
if (code->content[byte + code->lines[line].offset] == '\t') while (ti % code->tabSize) ti++;
if (column < ti) break;

#ifdef UI_UNICODE
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All these #ifdefs get quite messy. It would make more sense to have a few macros that handle the basic string operations and the definition of those macros depend on whether UI_UNICODE is defined.

luigi2.h Outdated

last <<= 8;
last |= c;
last |= (char)c;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code style: add a whitespace between the (cast) type and the expression.

luigi2.h Outdated
@@ -3041,6 +3200,48 @@ int _UICodeMessage(UIElement *element, UIMessage message, int di, void *dp) {
return 0;
}

#if UI_UNICODE
void UICodeMoveCaret(UICode *code, bool backward, bool word) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, if you have separate defines for the string operations I think you won't need to duplicate this entire function's logic?

@MadCatX
Copy link
Contributor Author

MadCatX commented Mar 14, 2024

Thanks for reviewing the PR. Don't worry about the coding style change requests, it's your code and your rules. I'll see what I can do in the upcoming days as I'm a bit busy myself ATM.

@MadCatX
Copy link
Contributor Author

MadCatX commented Apr 18, 2024

My apologies for taking so long to get back to this. I addressed the coding style issues and factored out some parts of functions to macros to avoid the convoluted ifdefs.

I still need to investigate the cursor positioning issue. It appears that I can repro it even with unicode handling switched off. Hopefully I should have more time to work on this now...

@MadCatX
Copy link
Contributor Author

MadCatX commented Apr 23, 2024

I think I figured out the cursor placement issue. Let me know if you'd like me to revise the code further or do some additional tests.

@nakst
Copy link
Owner

nakst commented May 2, 2024

Thanks, I'll try to take a look at the weekend.

Copy link
Owner

@nakst nakst left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see the attached comments. The algorithms are looking good, but there is a lot of duplicated code at the moment which needs to be simplified before these changes can be merged. Thank you!

luigi2.h Outdated
ti++;
if (code->content[i + code->lines[line].offset] == '\t') while (ti % code->tabSize) ti++;
_UI_ADVANCE_COLUMN(i, code, byte);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs to take into accounts tab characters, as the old code did.

luigi2.h Outdated

last <<= 8;
last |= c;
last |= (char) c;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Writing out the & 0xFF would make this a bit clearer.

while (bytes) {
#ifdef UI_UNICODE
ptrdiff_t bytesConsumed;
int c = Utf8GetCodePoint(string, bytes, &bytesConsumed);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of having #ifdef UI_UNICODE everywhere, please merge this logic into a single macro which can be used here, in place of _UI_ADVANCE_BYTE and in place of _UI_ADVANCE_COLUMN.

luigi2.h Outdated
#ifdef UI_UNICODE

#define _UI_TEXTBOX_MOVE_CARET_BACKWARD(textbox) do { \
char *prev = Utf8GetPreviousChar(textbox->string, textbox->string + textbox->carets[0]); \
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic should not be duplicated with the _UI_CODE_MOVE_ functions. There should be a common macro for doing all of this.

@MadCatX
Copy link
Contributor Author

MadCatX commented May 8, 2024

Thanks for the comments. I've generalized cursor positioning and byte<->column translation functions. Hopefully I've also fixed an issue where the code would assert on an attempt to get a code point from an empty string. As far as I could test things seem to behave correctly.

@MadCatX
Copy link
Contributor Author

MadCatX commented Jun 15, 2024

I'm sorry for the ping... is there anything else you'd like me to revise?

@nakst
Copy link
Owner

nakst commented Jun 16, 2024

Sorry, I have it on my list of things to-do to review these changes. But I have been busy/tired lately, so I haven't done it yet.

@nakst nakst merged commit 583c3b7 into nakst:master Aug 21, 2024
@nakst
Copy link
Owner

nakst commented Aug 21, 2024

Thanks for submitting these changes!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants