Ngrams destroy punctuation #444

Open
giorgio79 opened this issue May 28, 2018 · 1 comment

giorgio79 commented May 28, 2018

It would be nice to have an option that preserves punctuation. Example:

console.log(natural_NGrams.bigrams('Some, words here!!'));
[ [ 'Some', 'words' ], [ 'words', 'here' ] ]

I would have liked to see
[ [ 'Some,', 'words' ], [ 'words', 'here!!' ] ]

If command chaining is eventually implemented (#439), then one could simply strip punctuation beforehand, or pass the text through a tokenizer first.

giorgio79 (author) commented

Also, tokenizers already split the text in various ways, so I would just keep the splitting logic with the tokenizers...
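
To make the request concrete, here is a minimal sketch of the behaviour I have in mind, written in plain JavaScript so it does not depend on any library internals. The helper name `ngramsKeepingPunctuation` is hypothetical and not part of natural; it only assumes that splitting on whitespace is an acceptable tokenization for this use case.

```javascript
// Hypothetical helper, NOT part of the natural API.
// Splits on whitespace only, so punctuation stays attached to its word,
// then slides a window of size n over the resulting tokens.
function ngramsKeepingPunctuation(text, n) {
  // filter(Boolean) drops the empty token produced by an empty input string
  const tokens = text.trim().split(/\s+/).filter(Boolean);
  const result = [];
  for (let i = 0; i + n <= tokens.length; i++) {
    result.push(tokens.slice(i, i + n));
  }
  return result;
}

console.log(ngramsKeepingPunctuation('Some, words here!!', 2));
// [ [ 'Some,', 'words' ], [ 'words', 'here!!' ] ]
```

As far as I can tell from the README, the NGrams functions also accept an array of tokens instead of a string, so pre-tokenizing this way and passing the array in may already work as a workaround.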
