Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update resolve-leetspeak/dictionary.ts #51

Closed
wants to merge 1 commit into from

Conversation

HatScripts
Copy link
Contributor

@HatScripts HatScripts commented Jan 16, 2024

Type of change:

  • Refactor
  • Performance improvement
  • New feature
  • Bug fix
  • Other (please describe):

Please describe the changes this PR makes and why it should be merged:

Updated the leetspeak dictionary to include:

  • '6''b'
  • '8''b'
  • '9''g'
  • '!''i'
  • '5''s'
  • '7''t'
  • '2''z'

Some of these replacements might be overkill, so please let me know what you think.
Further reading: https://en.wikipedia.org/wiki/Leet#Table_of_leet-speak_substitutes_for_normal_letters

Status:

  • I've added/modified unit tests relevant to my change / not needed
  • This PR contains breaking changes
  • This PR doesn't include changes to the code

Copy link

codecov bot commented Jan 16, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (ad51d19) 100.00% compared to head (8c77c76) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main       #51   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           27        27           
  Lines          505       505           
  Branches        92        92           
=========================================
  Hits           505       505           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@HatScripts
Copy link
Contributor Author

HatScripts commented Jan 16, 2024

Also, I noticed that sh|t isn't being censored. Not sure why. Terms like f@g, fu(k, c0ck, etc. are being correctly censored.

Edit: This is happening because | (vertical bar) is being replaced by l (lowercase L) in resolve-confusables:

['l', 'ⓛlŀĺľḷḹļӀℓḽḻłレɭƚɫⱡ|Ɩ⒧ʅǀוןΙІ|ᶩӏ𝓘𝕀𝖨𝗜𝘐𝐥𝑙𝒍𝓁𝔩𝕝𝖑𝗅𝗹𝘭𝚕𝜤𝝞ı𝚤ɩι𝛊𝜄𝜾𝞲'],

@jo3-l
Copy link
Owner

jo3-l commented Jan 18, 2024

Sorry, but I'm not convinced these substitutions are clearly valuable, with the exception of perhaps ! -> i. When we add back whitespace stripping, for instance, 6 itch will be flagged as containing bitch which seems a rather egregious false positive. I'm sure there are many more examples in this vein, so would prefer to err on the side of caution here unless you have a compelling reason otherwise.

@HatScripts
Copy link
Contributor Author

6 itch will be flagged as containing bitch

I could be wrong, but to me, this seems like an argument for including .addWhitelistedTerm('b itch'), not for excluding the leetspeak.

However, I'm having trouble even finding a word ending in b that would appear before the word itch. The only thing I can come up with for now is scab itch.

@jo3-l
Copy link
Owner

jo3-l commented Jan 19, 2024

To be clear, my point is that the phrase 6 itch -- not including the b originally -- would be flagged as containing profanity, because 6 would be remapped to b. That seems a clear mistake to me.

@HatScripts
Copy link
Contributor Author

HatScripts commented Jan 19, 2024

Oh okay, that makes sense. Forgive my confusion.

@HatScripts
Copy link
Contributor Author

Just to clarify, assuming resolve-leetspeak is ran before the check for whitelisted terms, wouldn't .addWhitelistedTerm('b itch') still be enough in the case of 6 itch to ignore the false positive?

@jo3-l
Copy link
Owner

jo3-l commented Jan 19, 2024

assuming resolve-leetspeak is ran before the check for whitelisted terms

This is not the case; whitelisted term matching runs essentially on the original text by default (with only lowercasing applied.) I am not sure this is something we want to change.

@jo3-l
Copy link
Owner

jo3-l commented Feb 1, 2024

Per my previous comments I'm not sure this is worth doing, so closing for now. Happy to discuss if you disagree, as always.

@jo3-l jo3-l closed this Feb 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants