Added Hindi Aggressive Tokenizer #693

Merged: 5 commits merged into NaturalNode:master on Aug 26, 2023

MukeshSinghBisht (Contributor) commented on Jul 24, 2023

Hindi-language aggressive tokenizer.

  • Added a Hindi-language aggressive tokenizer that treats language-specific symbols as characters to be swallowed.
  • Added several test cases to cover it.

I request fellow maintainers and contributors to review and merge this PR, or to raise questions or suggest changes. I would highly appreciate your input. Thanks in advance.
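The PR description does not include the tokenizer's code. As a rough illustration only, the idea of an aggressive tokenizer that swallows Hindi-specific symbols can be sketched in plain JavaScript. The function name `tokenizeHindi` and the separator set (whitespace, the Devanagari danda `।` and double danda `॥`, and common ASCII punctuation) are assumptions for this sketch, not the implementation merged in #693.

```javascript
// Hypothetical sketch of an "aggressive" Hindi tokenizer; NOT the code merged in #693.
// Assumed separator set: whitespace, the Devanagari danda (U+0964) and
// double danda (U+0965), and common ASCII punctuation.
const HINDI_SEPARATORS = /[\s\u0964\u0965.,!?;:"'()\[\]{}-]+/;

// Split on any run of separators and drop the empty strings that appear
// when the text starts or ends with a separator.
function tokenizeHindi(text) {
  return text.split(HINDI_SEPARATORS).filter((token) => token.length > 0);
}

// A plain /\W+/ split would not work here: in JavaScript \w only matches
// [A-Za-z0-9_], so every Devanagari letter and vowel sign counts as a separator.
console.log(tokenizeHindi('मैं घर जा रहा हूँ।'));
// logs: [ 'मैं', 'घर', 'जा', 'रहा', 'हूँ' ]
```

The explicit separator class is the key design choice: an English-style `/\W+/` split would discard Devanagari words entirely, because letters, vowel signs (matras), and the virama all fall outside `\w`.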

Hugo-ter-Doest (Collaborator) left a comment

Please restrict the copyright notice to the file you authored. As written, it says that you authored the aggressive tokenizer as a whole. Also remove the line with "Aggressive Tokenization Open-Source License (Version 1.0)". That is not an existing license; we are using the MIT license.

Otherwise, your license text is compatible with the MIT license.

MukeshSinghBisht (Contributor, PR author) commented on Aug 26, 2023

Thanks for your input, @Hugo-ter-Doest. I made the changes to the license text; please see commit 735295c.

coveralls commented on Aug 26, 2023

Pull Request Test Coverage Report for Build 5983185211

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 40 of 40 (100.0%) changed or added relevant lines in 2 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.04%) to 87.331%

Totals
Change from base Build 5965872452: +0.04%
Covered Lines: 9661
Relevant Lines: 10691

💛 - Coveralls

Hugo-ter-Doest (Collaborator) commented

Thanks for your contribution! Very nice to have a Hindi tokenizer as part of the natural library.

Hugo-ter-Doest merged commit bd284d3 into NaturalNode:master on Aug 26, 2023 (6 checks passed).
MukeshSinghBisht (Contributor, PR author) commented on Aug 26, 2023

You're welcome! I will try to add more contributions in the future.

Hugo-ter-Doest (Collaborator) commented

FYI: I added the Hindi tokenizer to the API (the index.js and index.d.ts files) and to the documentation at https://naturalnode.github.io/natural/

4 participants