
Support Unicode in chat & frag #14

Merged: 29 commits merged into WoWs-Builder-Team:master on Jul 14, 2024


@HenryQuan (Contributor) commented Oct 9, 2023

This PR addresses #6 by adding support for Chinese, Japanese and Korean in chat. The font is now chosen based on the language of each message.

Two new dependencies are added: langdetect and hanzidentifier. Unfortunately, langdetect can misdetect Chinese as Korean, so the second library is required.

However, there are several issues with this implementation.

  • Huge font files (11 MB for Chinese, 5 MB for Korean and 6 MB for Japanese)
  • Higher memory usage (23 MB in total)
  • Language detection can be inaccurate
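The PR itself detects the language with langdetect and uses hanzidentifier to catch Chinese that langdetect reports as Korean. Purely as a point of comparison, here is a dependency-free sketch that picks a font from the Unicode script of the characters instead; it is not the PR's implementation, and the font file names are hypothetical. Because Hangul and kana only appear in Korean and Japanese respectively, this heuristic cannot mistake Chinese for Korean, but Japanese written only in kanji falls through to the Chinese font.

```python
from enum import Enum
from typing import List, Tuple

# Unicode code point ranges (inclusive) used as a rough script heuristic.
HANGUL: List[Tuple[int, int]] = [(0x1100, 0x11FF), (0x3130, 0x318F), (0xAC00, 0xD7A3)]
KANA: List[Tuple[int, int]] = [(0x3040, 0x309F), (0x30A0, 0x30FF), (0x31F0, 0x31FF)]
HAN: List[Tuple[int, int]] = [(0x3400, 0x4DBF), (0x4E00, 0x9FFF), (0xF900, 0xFAFF)]


class Script(Enum):
    DEFAULT = "default"
    KOREAN = "ko"
    JAPANESE = "ja"
    CHINESE = "zh"


# Hypothetical font file names; the renderer's real resource paths may differ.
FONTS = {
    Script.DEFAULT: "default.ttf",
    Script.KOREAN: "korean.ttf",
    Script.JAPANESE: "japanese.ttf",
    Script.CHINESE: "chinese.ttf",
}


def _has_char_in(text: str, ranges: List[Tuple[int, int]]) -> bool:
    """True if any character of `text` falls inside one of `ranges`."""
    return any(lo <= ord(ch) <= hi for ch in text for lo, hi in ranges)


def detect_script(text: str) -> Script:
    # Hangul and kana are specific to Korean and Japanese, so check them before
    # Han ideographs, which Chinese, Japanese (kanji) and Korean (hanja) share.
    if _has_char_in(text, HANGUL):
        return Script.KOREAN
    if _has_char_in(text, KANA):
        return Script.JAPANESE
    if _has_char_in(text, HAN):
        return Script.CHINESE
    return Script.DEFAULT


def pick_font(text: str) -> str:
    """Return the (hypothetical) font file to use for one chat message."""
    return FONTS[detect_script(text)]
```

For example, `pick_font("日本語です")` returns `"japanese.ttf"` because of the kana, while a message written only in kanji would get `"chinese.ttf"`.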

@HenryQuan (Contributor, Author)

I have moved my function to resman instead of keeping it under chat. There is also a minor dependency issue after upgrading to Python 3.11: lxml==4.9.1 doesn't work for some reason, so I had to use lxml==4.9.4 locally. I didn't commit this change because I don't know whether it breaks anything.

This PR also includes one additional fix: I have added support for rendering Chinese player names in frag. This is specific to the CN realm, so other servers aren't affected by this change.

For Chinese names to render, I had to update battle_controller, adding the encoding fix at line 742 (12_11_1). I will also open an issue in replays_unpack.

name=player["name"].encode('ISO8859-1').decode('UTF-8'),
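The one-liner above reverses a classic mojibake: the name arrives as UTF-8 bytes that were decoded as ISO-8859-1 (Latin-1). Since Latin-1 maps every byte to exactly one character, encoding back to Latin-1 restores the original bytes, which can then be decoded as UTF-8. A defensive version might look like the sketch below; `fix_mojibake` is a hypothetical helper, not code from battle_controller or replays_unpack.

```python
def fix_mojibake(name: str) -> str:
    """Undo UTF-8 bytes that were mistakenly decoded as ISO-8859-1 (Latin-1).

    Latin-1 maps each byte 0x00-0xFF to exactly one character, so encoding the
    garbled string back to Latin-1 recovers the original bytes.
    """
    try:
        return name.encode("ISO8859-1").decode("UTF-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The name was not garbled this way (it contains characters outside
        # Latin-1, or its Latin-1 bytes are not valid UTF-8): keep it unchanged.
        return name


# Simulate the garbling, then reverse it.
garbled = "玩家".encode("UTF-8").decode("ISO8859-1")
print(garbled == "玩家")         # False
print(fix_mojibake(garbled))     # 玩家
print(fix_mojibake("Player_1"))  # Player_1 (plain ASCII round-trips unchanged)
```

The fallback is not foolproof: a genuine Latin-1 name whose bytes also form valid UTF-8 (such as `"Ã©"`, which becomes `"é"`) would still be altered, which is one reason to apply the fix only on the CN realm, as this PR does.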

@HenryQuan changed the title from "Support Unicode in chat" to "Support Unicode in chat & frag" on Dec 29, 2023
@wihn2021 commented Jan 4, 2024

Maybe there's no need to detect the language; just let users pass an argument, for example "python -m render xxxx.wowsreplay --lang chinese", with English as the default. The renderer then loads the corresponding font files and ship information files.
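The suggestion could be wired up with argparse roughly as follows. This is a sketch of the proposal, not the renderer's actual command line: the positional `replay` argument, the `--lang` choices and the resource file names are all assumptions.

```python
import argparse
from pathlib import Path
from typing import Dict, List, Optional, Tuple

# Hypothetical resource files per language: (font file, ship information file).
LANG_RESOURCES: Dict[str, Tuple[str, str]] = {
    "english": ("fonts/default.ttf", "ships_en.json"),
    "chinese": ("fonts/chinese.ttf", "ships_zh.json"),
    "japanese": ("fonts/japanese.ttf", "ships_ja.json"),
    "korean": ("fonts/korean.ttf", "ships_ko.json"),
}


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(prog="python -m render")
    parser.add_argument("replay", type=Path, help="path to a .wowsreplay file")
    parser.add_argument(
        "--lang",
        choices=sorted(LANG_RESOURCES),
        default="english",
        help="language of the fonts and ship information to load (default: english)",
    )
    return parser.parse_args(argv)


def resources_for(lang: str) -> Tuple[str, str]:
    """Look up the font and ship information files for the chosen language."""
    return LANG_RESOURCES[lang]
```

`parse_args(["xxxx.wowsreplay", "--lang", "chinese"])` yields `lang == "chinese"`, which `resources_for` maps to the Chinese font and ship information files; omitting `--lang` falls back to `"english"`.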

@HenryQuan (Contributor, Author) commented Jan 4, 2024

@wihn2021 That is a good idea, but the chat may contain Japanese or Korean at any moment. The Chinese font doesn't render all Japanese characters, or any Korean Hangul. Maybe the flag could be something like --unicode, to turn this feature on or off. The renderer can also determine the replay's realm, which this PR uses to render frags correctly.
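An opt-in flag such as the proposed --unicode could also address the memory concern from the PR description, provided the large CJK fonts are loaded lazily. A minimal sketch, assuming a hypothetical `LazyFontCache` class and a caller-supplied loader function (with Pillow, something like `lambda path: ImageFont.truetype(path, 16)`):

```python
from typing import Any, Callable, Dict, List


class LazyFontCache:
    """Load each font file at most once, and only when a message first needs it.

    `unicode_enabled` stands in for the proposed (hypothetical) --unicode flag.
    """

    def __init__(self, loader: Callable[[str], Any], default_path: str, unicode_enabled: bool) -> None:
        self._loader = loader
        self._default_path = default_path
        self._unicode_enabled = unicode_enabled
        self._cache: Dict[str, Any] = {}

    def get(self, path: str) -> Any:
        # With the flag off, every request resolves to the default font,
        # so the large CJK font files are never read into memory.
        if not self._unicode_enabled:
            path = self._default_path
        if path not in self._cache:
            self._cache[path] = self._loader(path)
        return self._cache[path]

    @property
    def loaded(self) -> List[str]:
        """Paths loaded so far, in load order."""
        return list(self._cache)
```

With the flag off, only the default font is ever loaded; with it on, each CJK font is read the first time a message needs it and reused afterwards.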

@MeowthTim commented Mar 11, 2024

> For Chinese names to be rendered, I have to update battle_controller where I add the encoding under line 742 (12_11_1). I will also open an issue under replays_unpack.
>
> name=player["name"].encode('ISO8859-1').decode('UTF-8'),

That is a very useful step for Chinese support. Especially for the CN server, where JP and KR support isn't needed, it only takes a small change. The other remaining piece, I think, is translating ship.json into Chinese, which I would try to solve using the game's zh_lang.json. Thanks for your work.
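Localising ship names from a game language file could be as simple as a dictionary lookup with an English fallback. The sketch below assumes a hypothetical schema for both ship.json and the translation file (including the `IDS_` key prefix); the real files may be structured differently.

```python
import json
from typing import Any, Dict


def load_json(path: str) -> Any:
    # Explicit UTF-8 so Chinese text is read correctly regardless of the OS locale.
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def localize_ships(ships: Dict[str, dict], translations: Dict[str, str]) -> Dict[str, dict]:
    """Return a copy of `ships` whose "name" fields are translated where possible.

    Hypothetical schema (not confirmed by the project):
      ships:        {"<ship id>": {"index": "<game index>", "name": "<English name>", ...}}
      translations: {"IDS_<game index>": "<localized name>"}
    Ships without a translation keep their English name.
    """
    localized: Dict[str, dict] = {}
    for ship_id, info in ships.items():
        entry = dict(info)  # shallow copy, so the English data is left untouched
        key = "IDS_" + str(info.get("index", ""))
        entry["name"] = translations.get(key, info.get("name", ""))
        localized[ship_id] = entry
    return localized
```

Usage would be along the lines of `localize_ships(load_json("ship.json"), load_json("zh_lang.json"))`, where both paths are placeholders.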

@HenryQuan (Contributor, Author) commented Mar 11, 2024

@MeowthTim I have the game's language files in all supported languages, so it is definitely possible to localise further and support the Chinese server better. This should probably be available in my custom lang.json.

@JustOneSummer

When will this be merged into mainline?

@padtrack (Member)

I put off addressing this because I was working on a rewrite that I burned out on. I'll merge this the next chance I have to work on personal projects.

@padtrack (Member)

I am reviewing this on the train. Can you upload some test replays? I had some mixed-language replays when I worked on this. The game does not load more than one font, but I think it would be nice to support all CJK characters, and more, together.
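Rendering all CJK scripts together, which the game itself does not do, usually means a font fallback chain: each character is drawn with the first font in a preference list that covers it, and consecutive characters sharing a font are grouped into runs. The sketch below shows one way to do that segmentation; the font names and coverage sets are hypothetical, and real coverage sets could be built from the keys of fontTools' `TTFont(path).getBestCmap()`.

```python
from typing import List, Sequence, Set, Tuple


def split_runs(text: str, fonts: Sequence[Tuple[str, Set[int]]]) -> List[Tuple[str, str]]:
    """Split `text` into (font name, substring) runs using a font fallback chain.

    `fonts` must be non-empty and ordered by preference; each entry pairs a font
    name with the set of code points that font covers. A character that no font
    covers is assigned to the first font, which will draw its missing-glyph box.
    """
    runs: List[Tuple[str, str]] = []
    for ch in text:
        cp = ord(ch)
        # First font in the chain that covers this code point, else the primary font.
        name = next((font for font, coverage in fonts if cp in coverage), fonts[0][0])
        if runs and runs[-1][0] == name:
            # Same font as the previous character: extend the current run.
            runs[-1] = (name, runs[-1][1] + ch)
        else:
            runs.append((name, ch))
    return runs


# Hypothetical coverage sets, for illustration only.
FALLBACK_CHAIN = [
    ("default", set(range(0x20, 0x7F))),  # printable ASCII
    ("chinese", {ord(c) for c in "你好"}),
    ("korean", {ord(c) for c in "안녕"}),
]
```

Here `split_runs("hi 你好 안녕", FALLBACK_CHAIN)` returns `[("default", "hi "), ("chinese", "你好"), ("default", " "), ("korean", "안녕")]`; each run could then be drawn with its own font object.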

@padtrack (Member)

> @wihn2021 That is a good idea, but the chat may also contain Japanese or Korean at any moment. The Chinese font doesn't render all Japanese character or any Korean alphabet. Maybe, the flag can be something like --unicode to enable this feature or not. The renderer can determine the realm of the replay which is used to render frags correctly in this PR.

Do you have an example of JP characters the Chinese font doesn't support? I checked my notes from May 2023, and the Chinese font supported all the random JP phrases I generated.

@HenryQuan (Contributor, Author) commented Jul 12, 2024

@padtrack While working on this, I found out that WoWs provides separate fonts for Korean, Japanese and Chinese.

I also suspect that the Chinese font supports most Japanese characters, but I am not sure this is a valid assumption. The Japanese font, however, doesn't support all Chinese characters, so if we want to remove a font, it could be the Japanese one. The Korean font, on the other hand, should be kept.
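Whether the Chinese font really covers the Japanese text seen in chat can be checked rather than assumed. A minimal sketch: collect the characters of some sample messages and report those missing from a font's coverage set. How that set is built (for example `set(TTFont("chinese.ttf").getBestCmap())` with fontTools, where the path is a placeholder) and which sample phrases to use are assumptions.

```python
from typing import Iterable, List, Set


def missing_chars(samples: Iterable[str], coverage: Set[int]) -> List[str]:
    """Distinct non-whitespace characters from `samples` whose code points are
    not in `coverage`, in the order they first appear."""
    seen: Set[str] = set()
    missing: List[str] = []
    for text in samples:
        for ch in text:
            if ch.isspace() or ch in seen:
                continue
            seen.add(ch)
            if ord(ch) not in coverage:
                missing.append(ch)
    return missing
```

An empty result for a large set of real Japanese chat messages would support dropping the Japanese font; any characters returned are concrete counterexamples.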

Upstream has also fixed some clan name encoding issues in Monstrofil/replays_unpack#30. Maybe the expert @JustOneSummer can provide some replays; I don't have any at hand right now. At some point we could record a training room replay that includes all three languages.

I had an old Chinese server replay, 20231214_203306_PFSC210-Marseille_46_Estuary.zip, if this helps.

@padtrack (Member)

It's alright; I can probably find my old replays when I'm home. It just would have been convenient if you had some on hand.

Screenshots from a discussion I had with a JP speaker:
[two screenshots, not preserved]

@padtrack self-assigned this on Jul 14, 2024
@padtrack (Member)

@HenryQuan Can you enable "Allow edits from maintainers"? I also sent a message to what I think is your Discord account.

@padtrack merged commit 23a8d62 into WoWs-Builder-Team:master on Jul 14, 2024
2 checks passed

5 participants