fix(js/string_util): u2b(): convert non-Big5 chars to A1BC (□) instead of FFFD (non-Big5) #126

Open
wants to merge 1 commit into
base: dev

@IepIweidieng commented Dec 8, 2023

Related issues: ptt/pttbbs#95

FFFD is the Unicode replacement character (U+FFFD), not a valid Big5-UAO code. In addition, if the FF is not escaped as `IAC IAC` (a doubled 0xFF), the Telnet protocol reads the bytes FF FD as `IAC DO <lacking option id>`.
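The escaping rule mentioned above can be sketched as follows. This is a minimal illustration and not PttChrome's actual code; `escapeIac` is a hypothetical helper name introduced here. It doubles every 0xFF data byte so that a Telnet peer reads it as data rather than as the start of a command.

```javascript
// Hypothetical helper (not from PttChrome): double every 0xFF data byte
// so a Telnet peer treats it as data instead of the start of a command.
const IAC = 0xff;

function escapeIac(bytes) {
  const out = [];
  for (const b of bytes) {
    out.push(b);
    if (b === IAC) out.push(IAC); // IAC IAC stands for one literal 0xFF
  }
  return Uint8Array.from(out);
}

// FF FD sent raw reads as IAC DO; escaped, it becomes three data-safe bytes.
const escaped = escapeIac([0xff, 0xfd]);
console.log(Array.from(escaped, (b) => b.toString(16).toUpperCase()).join(" "));
// prints "FF FF FD"
```

Escaping alone would still send FFFD, which is not a Big5-UAO code, so it does not replace the conversion fix described in this PR.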

To fix the issue, u2b() now converts every non-Big5-UAO char into A1BC (Big5 '□'). It also ignores UTF-16 high surrogates, so a char encoded as a surrogate pair yields one replacement instead of two and the char count stays consistent.
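A rough sketch of the behavior described above, under stated assumptions: `u2bSketch` and `u2bTable` are hypothetical names, the one-entry table stands in for the real Unicode to Big5-UAO lookup (not shown in this PR description), and passing ASCII through unchanged is assumed rather than confirmed by the source.

```javascript
// Sketch only: `u2bTable` stands in for the real Unicode -> Big5-UAO
// lookup, which is not shown in this PR description.
const u2bTable = new Map([
  [0x4e2d, [0xa4, 0xa4]], // '中' (U+4E2D) is A4A4 in Big5
]);
const REPLACEMENT = [0xa1, 0xbc]; // A1BC, Big5 '□'

function u2bSketch(str) {
  const out = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i); // one UTF-16 code unit
    // Ignore high surrogates so a surrogate pair yields one replacement.
    if (unit >= 0xd800 && unit <= 0xdbff) continue;
    // Assumption: ASCII is passed through unchanged.
    if (unit < 0x80) {
      out.push(unit);
      continue;
    }
    // Unmapped units (including low surrogates) become A1BC.
    out.push(...(u2bTable.get(unit) ?? REPLACEMENT));
  }
  return out;
}

console.log(u2bSketch("孒")); // one replacement: [0xA1, 0xBC]
console.log(u2bSketch("𡤼")); // also one replacement, not two
```

Because only the low surrogate of a pair reaches the lookup, a char outside the Basic Multilingual Plane produces the same two bytes as any other unmapped char.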

A1BC (Big5 '□') was chosen for two reasons:

  • It gives visual feedback that the input was received and processed.
  • It is a non-ASCII code, so it cannot be mistaken for an ASCII key press and trigger an unwanted operation.

With this change, each of the following cases converts to a single A1BC (□).

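| String        | UTF-16    | prev u2b()  | Telnet meaning                     |
| ------------- | --------- | ----------- | ---------------------------------- |
| '孒' (U+5B52)  | 5B52      | FF FD       | IAC DO                             |
| '𡤼' (U+2193C) | D846 DD3C | FF FD FF FD | IAC DO Extended-Options-List 'FD'  |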
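To make the "Telnet meaning" column concrete, here is a deliberately small reader that labels a byte sequence the way a Telnet peer would. It is an illustrative sketch: `describeTelnet` and `hex` are hypothetical names, and only the `IAC IAC` escape and the DO command are handled, whereas a real Telnet parser covers many more cases.

```javascript
// Illustrative reader covering only the IAC escape and the DO command.
const IAC = 0xff;
const DO = 0xfd;

function hex(b) {
  return b.toString(16).toUpperCase().padStart(2, "0");
}

function describeTelnet(bytes) {
  const parts = [];
  for (let i = 0; i < bytes.length; i++) {
    const b = bytes[i];
    if (b !== IAC) {
      parts.push(`data ${hex(b)}`);
      continue;
    }
    const cmd = bytes[++i]; // byte following IAC
    if (cmd === IAC) {
      parts.push("data FF"); // escaped literal 0xFF
    } else if (cmd === DO) {
      const opt = bytes[++i]; // DO expects an option id
      parts.push(opt === undefined ? "IAC DO <missing option>" : `IAC DO option ${hex(opt)}`);
    } else {
      parts.push(cmd === undefined ? "IAC <incomplete>" : `IAC command ${hex(cmd)}`);
    }
  }
  return parts;
}

console.log(describeTelnet([0xff, 0xfd]));             // previous output for '孒'
console.log(describeTelnet([0xff, 0xfd, 0xff, 0xfd])); // previous output for '𡤼'
console.log(describeTelnet([0xa1, 0xbc]));             // new output: plain data
```

In the second case the option id is FF, which is the Extended-Options-List option, and the trailing FD is left over as data. The new A1BC output contains no 0xFF byte, so it is read as data only.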

※ The choice of A1BC (□) is subject to change.

Sample Text

《枇踼踼鉻》妤像𣎴太龍夠芷常虛埋韭尢五嗎宇元旳逈滤?
找左《萝之尢地》朾䒩—萹文䈇朾刭—羋旳畤侯㑹—宜斲鰁,㭙别昰𣎴少心輪人孒韭尢五嗎宇元旳畤侯。
—閧紿𣎴知𨑬昰 BBS 桯弍旳間㼵遝昰 PttChrome 旳間㼵。
𣎴逈硑宄之𢓭,發垷左 PttChrome 輪人韭尢五嗎宇元旳畤侯,送㞢旳貣科㑹怶解譊戍挌弍鍇洖旳 Telnet 命今。
垷左己鋞知𨑬昰㖿—段桯弍旳間㼵孒。令夭虑絯㞊龍夠修妤。

previous u2b()

《枇踼踼鉻》妤像太龍夠芷常虛埋韭尢五嗎宇元旳逈滤?
找左《萝之尢地》朾X萹文??滿X羋旳畤侯X宜斲Ag昰少心輪人握q五嗎宇元旳畤侯。
—閧紿知昰 BBS 桯弍旳間??? PttChrome 旳間C
逈硑宄之,發? PttChrome 輪人韭尢五嗎宇元旳畤侯,送筬漎鼽栺囍形??軋眣d旳 Telnet 命今。
炊v鋞知昰X段桯弍旳間。令夭虑絯s夠修妤。

new u2b()

《枇踼踼鉻》妤像□太龍夠芷常虛埋韭尢五嗎宇元旳逈滤?
找左《萝之尢地》朾□—萹文□朾刭—羋旳畤侯□—宜斲□,□别昰□少心輪人□韭尢五嗎宇元旳畤侯。
—閧紿□知□昰 BBS 桯弍旳間□遝昰 PttChrome 旳間□。
□逈硑宄之□,發□左 PttChrome 輪人韭尢五嗎宇元旳畤侯,送□旳貣科□□解譊戍挌弍鍇洖旳 Telnet 命今。
□左己鋞知□昰□—段桯弍旳間□□。令夭虑絯□龍夠修妤。

@IepIweidieng changed the title from "fix(js/string_util): convert non-Big5-UAO chars to A1BC (□) instead of FFFD (non-Big5-UAO)" to "fix(js/string_util): u2b(): convert non-Big5-UAO chars to A1BC (□) instead of FFFD (non-Big5-UAO)" on Dec 8, 2023
@IepIweidieng changed the title from "fix(js/string_util): u2b(): convert non-Big5-UAO chars to A1BC (□) instead of FFFD (non-Big5-UAO)" to "fix(js/string_util): u2b(): convert non-Big5 chars to A1BC (□) instead of FFFD (non-Big5)" on Dec 8, 2023