[RFC]: Improvements to @stdlib/nlp-expand-contractions #496
Comments
🎉 Welcome! 🎉 And thank you for opening your first issue! We will get back to you shortly. 🏃 💨
Doing a review and will submit a PR to Caught some interesting bugs like The other question I wanted to raise is that we should probably handle
Re: missing contractions. Some of the entries in your list are already present in the contractions file. E.g.,
@Planeshifter Is there a reason for the
Re: fancy apostrophe. That should be possible to handle in the
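To make the fancy-apostrophe point concrete, here is a minimal sketch of one possible approach: normalizing typographic apostrophes to the ASCII apostrophe before any contraction lookup. The helper name `normalizeApostrophes` and the exact set of code points are assumptions for illustration, not part of the library.

```javascript
// Hypothetical helper (not part of @stdlib): map typographic apostrophes to
// the ASCII apostrophe so that lookups against an ASCII-keyed contraction
// table succeed.
//
// Code points covered (an assumed, non-exhaustive set):
//   U+2018 left single quotation mark (sometimes mistyped as an apostrophe)
//   U+2019 right single quotation mark (the usual "fancy" apostrophe)
//   U+02BC modifier letter apostrophe
//   U+FF07 fullwidth apostrophe
var RE_FANCY_APOSTROPHE = /[\u2018\u2019\u02BC\uFF07]/g;

function normalizeApostrophes( str ) {
	return str.replace( RE_FANCY_APOSTROPHE, '\'' );
}

console.log( normalizeApostrophes( 'they\u2019re here' ) );
// => "they're here"
```

Note that this also rewrites U+2018 when it is used as an opening quotation mark, which may or may not be acceptable depending on the downstream tokenizer.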
I'm about to submit a PR, one moment @kgryte
@titanism One recent update: @Planeshifter added initial support for expanding acronyms (see https://github.com/stdlib-js/stdlib/tree/c624a5eb4bca8f4f3d45e01bcc4eeee41652e3ba/lib/node_modules/%40stdlib/nlp/expand-acronyms). This may help to avoid mixing contraction/acronym concerns.
Description
We're writing because we found your library to be the most thoroughly tested and the fastest for expanding contractions. For context, we're working on https://spamscanner.net and expand contractions before passing text to tokenizers for spam classification.
To clarify, this concerns the codebase at https://github.com/stdlib-js/nlp-expand-contractions, which is generated from the source at https://github.com/stdlib-js/stdlib/tree/develop/lib/node_modules/%40stdlib/nlp/expand-contractions.
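As a rough illustration of why expansion has to happen before tokenization, here is a minimal, self-contained sketch. It does not use the library itself: the three-entry `CONTRACTIONS` table and the `expand` and `tokenize` functions are hypothetical stand-ins, and the tokenizer is deliberately naive.

```javascript
// Illustrative subset only (an assumption); a real table is much larger.
var CONTRACTIONS = {
	'they\'re': 'they are',
	'won\'t': 'will not',
	'can\'t': 'cannot'
};

// Match any table key as a whole word, case-insensitively.
var RE_CONTRACTION = new RegExp( '\\b(' + Object.keys( CONTRACTIONS ).join( '|' ) + ')\\b', 'gi' );

// Replace each matched contraction with its expansion.
// Note: this sketch does not preserve the capitalization of the match.
function expand( str ) {
	return str.replace( RE_CONTRACTION, function onMatch( match ) {
		return CONTRACTIONS[ match.toLowerCase() ];
	});
}

// Naive tokenizer: lowercase, then split on anything that is not a-z.
function tokenize( str ) {
	return str.toLowerCase().split( /[^a-z]+/ ).filter( Boolean );
}

console.log( tokenize( 'They\'re sure you can\'t lose!' ) );
// => [ 'they', 're', 'sure', 'you', 'can', 't', 'lose' ]

console.log( tokenize( expand( 'They\'re sure you can\'t lose!' ) ) );
// => [ 'they', 'are', 'sure', 'you', 'cannot', 'lose' ]
```

Without the expansion step, the tokenizer emits fragments such as `re` and `t`, which is the failure mode we want to avoid when feeding a classifier.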
We noticed that your library is missing quite a few English contractions, and it could also benefit from contractions from other languages (perhaps behind an option).
While we can open a PR, we wanted to check what your thoughts are on this and what you would want the PR to look like (integration-wise; e.g., new options?).
Here is our current compiled list of research and findings:
they're to they are (instead of they and re): NaturalNode/natural#533
Related Issues
No response
Questions
No response
Other
No response
Checklist
RFC: