Handling text that implements navigation of character sequences properly #4

ultrasound1372 · 2021-07-20T01:29:08Z

Some edit fields, like Notepad, do not handle combining characters as Unicode expects them to: one can navigate through each character in the sequence individually. This is especially obvious for emoji. This add-on assumes that all edit fields behave like that. So when it encounters a text edit or display that properly handles navigation of character sequences that are canonically one character, one receives N/A when attempting to get info.
Some examples of such characters:

o͡l̪ (combining markings that cannot be precomposed)
ö (a character with a combining modifier that has a precomposed form [ö])
👨‍🎤 (an emoji ZWJ sequence)
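
To make the difference between these three cases concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The escape sequences are one reading of the examples above, and the helper names `code_points` and `nfc_length` are made up for illustration; nothing here is taken from the add-on's code.

```python
import unicodedata

# Assumed code points for the three examples listed above.
SAMPLES = {
    "combining marks, no precomposed form": "o\u0361l\u032A",
    "o + combining diaeresis": "o\u0308",
    "emoji ZWJ sequence": "\U0001F468\u200D\U0001F3A4",
}


def code_points(text):
    """List the code points of text in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]


def nfc_length(text):
    """Number of code points left after canonical composition (NFC)."""
    return len(unicodedata.normalize("NFC", text))


for label, text in SAMPLES.items():
    print(label, code_points(text), "NFC length:", nfc_length(text))
```

Only the second sample collapses to a single code point under NFC, so normalization alone cannot turn the other two into "one character"; they stay multi-code-point sequences even though they are navigated as one unit.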

Please try to do something about this if you can. I'd suggest comma-delimiting such instances in the table, so the second example could be "LATIN SMALL LETTER O, COMBINING DIAERESIS". If this is hard to do, or if you would prefer, you could also create multiple tables, one for each character.
As for the emoji, I suppose presenting the query to the CLDR would return a valid name, so if you receive a sequence, I guess you should try the entire sequence first when retrieving the CLDR name.
I have encountered such text within the Windows 10 app Unigram, both within the edit field and when using the review cursor on messages, so it is also a behavior of NVDA. As I said, the Unicode annex actually defines that to be the proper way to navigate such sequences.
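
The two ideas in this comment (try the whole sequence first, then fall back to a comma-delimited list of per-code-point names) could be sketched as follows. This is only an illustration: `CLDR_NAMES` is a hypothetical one-entry stand-in for real CLDR annotation data, and `describe` is a made-up name, not a function of the add-on or of NVDA.

```python
import unicodedata

# Hypothetical stand-in for CLDR annotation data, keyed by the full sequence.
CLDR_NAMES = {
    "\U0001F468\u200D\U0001F3A4": "man singer",
}


def describe(sequence, cldr_names=CLDR_NAMES):
    """Describe a character sequence.

    First look up the entire sequence (useful for emoji ZWJ sequences);
    if that fails, join the Unicode name of each code point with commas.
    """
    whole = cldr_names.get(sequence)
    if whole is not None:
        return whole
    return ", ".join(unicodedata.name(ch, "N/A") for ch in sequence)


print(describe("o\u0308"))
print(describe("\U0001F468\u200D\U0001F3A4"))
```

With this approach, "N/A" would only appear for an individual code point that has no Unicode name, rather than for the whole sequence.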

CyrilleB79 · 2021-07-22T20:39:53Z

Having N/A's is surely an issue, and I will try to fix that first.

Afterwards, I will have to look into the issue in NVDA itself.
Have you opened an issue against NVDA regarding this topic? If yes, could you give the reference? If not, could you do it? Thanks.

CyrilleB79 · 2023-01-27T14:37:54Z

Hi @ultrasound1372

I guess the majority of the combined character issues have been addressed in c631f9f.

The only remaining problem is the emoji ZWJ sequence. However, it seems to be 3 characters (also visually). I'd prefer the issue fixed in NVDA if possible.
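
For illustration, the gap between "3 characters" and "one character" can be reproduced with a very rough grouping of code points into user-perceived characters. The sketch below is a deliberate simplification and not the full segmentation algorithm from the Unicode annex mentioned earlier in this thread (presumably UAX #29): it only attaches combining marks and zero-width-joiner links to the preceding code point. The function name `rough_clusters` is hypothetical.

```python
import unicodedata

ZWJ = "\u200D"  # ZERO WIDTH JOINER


def rough_clusters(text):
    """Group code points into approximate user-perceived characters.

    Simplification: a code point joins the previous cluster if it is a
    combining mark (general category M*), if it is a ZWJ, or if the
    previous code point was a ZWJ.
    """
    clusters = []
    join_next = False
    for ch in text:
        extends = unicodedata.category(ch).startswith("M") or ch == ZWJ
        if clusters and (extends or join_next):
            clusters[-1] += ch
        else:
            clusters.append(ch)
        join_next = ch == ZWJ
    return clusters


zwj_emoji = "\U0001F468\u200D\U0001F3A4"
print(len(zwj_emoji), "code points,", len(rough_clusters(zwj_emoji)), "cluster")
```

A field that navigates by code point would stop three times on this emoji, while a field that navigates by cluster would stop once, which matches the two behaviors discussed here.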

Would you be able to test the add-on version in the master branch and confirm? I have not yet done a release.

ultrasound1372 · 2023-01-27T19:56:38Z

I'm trying to build it with scons, but you require the use of scripts.generateDataFR in your build script, and I can't seem to get this to run. First I get module not found, and upon fudging an init file in scripts to create a package so the import works, I get module not found on globalPlugin. How does your build workflow work and can you make it easier to use?

CyrilleB79 · 2023-01-27T21:01:30Z

Thanks for your trial.

It was working (and is still working) on my setup just launching the scons command. But there was something strange regarding the import and I do not fully understand why it was working this way. Anyway, I have fixed the package issue in master branch adding the __init__.py and changing the imports in the sconstruct file.

Could you try again? Is there still an error with globalPlugin? If yes, could you copy here the traceback so that I can understand the cause and fix it? Thanks.

ultrasound1372 · 2023-01-29T17:27:05Z

Yes, the handling of multiple combining characters (such as a̋̇) also functions correctly. The only thing I'd say should be changed is in the second table: when you list info from the symbols dictionary, your sub-columns are not themselves marked as headers. So when navigating around, I hear "character 0" read multiple times, but not "replacement", "preserve", and "level"; just "value". I think you should get rid of the value column altogether and just make it look something like this. ~~Although keep your row headers as row headers, GFM just doesn't let me make row headers.~~

| Attribute | Replacement | Level | Preserve |
| --- | --- | --- | --- |
| Symbol description | space | character | never |
| Symbol description in user file (en_US) | [Not defined] | [Not defined] | [Not defined] |
| Symbol description in English file | space | character | [Not defined] |
| Symbol description in English CLDR file | [Not defined] | [Not defined] | [Not defined] |

CyrilleB79 · 2023-01-29T20:39:58Z

Good catch for the headers reading.

I'm not sure I understand your suggestion, though. For a multi-compound character such as a̋̇, do you think that the paragraph "Symbol description in NVDA" should contain 3 tables (one per character) instead of one big table? This makes sense, but again, I do not know if that was your suggestion.

As an additional question: did you succeed in running scons command?

ultrasound1372 · 2023-01-30T17:30:17Z

No, my suggestion was just to fix the reading of the headers, and possibly get rid of the Value column span and simply duplicate the replacement/level/preserve columns if necessary. I don't think they should be split into three separate tables; I like the way this is done as one big table. You might make it so that if nothing is defined in a source, such as the user symbols dictionary, you just don't include that row, but that would be all I'd change. I'm unsure if having a row for the description of the entire combined character would make sense; it would probably only come into play for emojis.
And yes I was able to run scons just fine.
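
As a sketch of what this suggestion amounts to, assuming the report is rendered as an HTML table (the thread does not say how the add-on builds it), each column and row header can be marked with `<th scope=...>` so that table navigation announces "Replacement", "Level", and "Preserve" instead of a generic "Value". The names `build_table`, `ROWS`, and `skip_undefined` are hypothetical, and the data mirrors the example table from earlier in this thread.

```python
from html import escape

COLUMNS = ("Replacement", "Level", "Preserve")
NOT_DEFINED = "[Not defined]"

# Example data copied from the table suggested earlier in this thread.
ROWS = [
    ("Symbol description",
     {"Replacement": "space", "Level": "character", "Preserve": "never"}),
    ("Symbol description in user file (en_US)", {}),
    ("Symbol description in English file",
     {"Replacement": "space", "Level": "character"}),
    ("Symbol description in English CLDR file", {}),
]


def build_table(rows, skip_undefined=False):
    """Build an HTML table with real column and row headers.

    With skip_undefined=True, sources that define nothing are omitted.
    """
    header = "".join(
        f'<th scope="col">{escape(c)}</th>' for c in ("Attribute",) + COLUMNS
    )
    lines = ["<table>", f"<tr>{header}</tr>"]
    for attribute, values in rows:
        if skip_undefined and not any(values.get(c) for c in COLUMNS):
            continue
        cells = "".join(
            f"<td>{escape(values.get(c) or NOT_DEFINED)}</td>" for c in COLUMNS
        )
        lines.append(f'<tr><th scope="row">{escape(attribute)}</th>{cells}</tr>')
    lines.append("</table>")
    return "\n".join(lines)


print(build_table(ROWS, skip_undefined=True))
```

Whether to drop the undefined rows is a design choice: keeping them shows every source that was consulted, while dropping them shortens the table.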

CyrilleB79 · 2023-01-31T16:42:53Z

Hi @ultrasound1372 ,

You can check the latest work in 46f5369 (latest master). I have simplified the symbol description table, and I have removed the double header row to reduce verbosity.

You also write:

> Might make it so that if nothing is defined in that source, such as the user symbols dictionary, you just don't include that row, but that would be all I'd think.

I'd prefer to keep these rows present so that the user knows where NVDA looks to get the symbol/character description. It also allows me to check that the add-on looks at the correct locations and that I have not forgotten any of them.
Anyway, this request is going a bit off-topic with respect to this issue. You may comment in #5 if suitable, or open a new issue if you want to discuss this topic further.

For the emoji ZWJ sequence such as 👨‍🎤:

- It is seen by NVDA as one character in some situations (e.g. GitHub's new comment field in the Chrome browser) and as many characters in other situations (e.g. Notepad).
- Have you opened an issue in NVDA for this character being seen as many? Could you link it here? If not yet, could you open one?

Also you write:

> I'm unsure if having a row for the description of the entire combined character would make sense, it would probably only come into play for emojis.

This should be discussed separately in a new issue when the multi-char problem of NVDA is solved.

ultrasound1372 · 2023-02-02T19:28:22Z

The NVDA multi-character thing is likely an implementation detail of the different accessibility frameworks. But yes, that commit looks good, and in regards to this issue I think we're done.

ultrasound1372 closed this as completed Feb 2, 2023
