Implement CharNGram Based HashVectorizer #265

Open
wants to merge 4 commits into
base: develop

Conversation

dafajon
Contributor

@dafajon dafajon commented Apr 22, 2021

  • Receive ngram_range.
  • Calculate n_grams within a Doc object's lowercased tokens.
  • Send (n_gram, str) pair iterator to FeatureHasher.
  • Implement tests with tokenizer and ngram_range configurations.
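
The steps listed above can be sketched roughly as follows. This is a minimal, standard-library illustration and not the code in this pull request: `char_ngrams` and `hash_vector` are hypothetical names, a plain list of strings stands in for the Doc object's tokens, and a CRC32-based bucket count stands in for FeatureHasher, whose hash function and sign handling may differ.

```python
import zlib
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def char_ngrams(tokens: Iterable[str], ngram_range: Tuple[int, int]) -> Iterator[str]:
    """Yield character n-grams of each lowercased token.

    ngram_range is an inclusive (min_n, max_n) pair (assumed semantics).
    """
    min_n, max_n = ngram_range
    for token in tokens:
        token = token.lower()
        for n in range(min_n, max_n + 1):
            # Tokens shorter than n produce an empty range and yield nothing.
            for i in range(len(token) - n + 1):
                yield token[i:i + n]


def hash_vector(ngrams: Iterable[str], n_features: int = 16) -> List[int]:
    """Hashing trick: add each n-gram's count to the bucket chosen by a stable hash."""
    vec = [0] * n_features
    for gram, count in Counter(ngrams).items():
        bucket = zlib.crc32(gram.encode("utf-8")) % n_features
        vec[bucket] += count
    return vec


# Hypothetical usage: whitespace splitting stands in for the real tokenizer.
tokens = "Harika bir servis".split()
grams = list(char_ngrams(tokens, (2, 3)))
vector = hash_vector(grams, n_features=16)
```

Because the vector length is fixed by `n_features`, distinct n-grams can collide in the same bucket; a larger `n_features` reduces collisions at the cost of a wider feature matrix.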

@dafajon dafajon added the enhancement New feature or request label Apr 22, 2021
@dafajon dafajon requested a review from husnusensoy April 22, 2021 21:10
@husnusensoy
Contributor

Please add a model that was improved by using this Transformer.

@husnusensoy husnusensoy added pending Pending Merge Requests question Further information is requested labels Apr 22, 2021
@dafajon
Contributor Author

dafajon commented Apr 28, 2021

I improved telco_sentiment; it now reports Model test accuracy (accuracy): 0.7040786809372288. Moving on to the other prebuilts that use hash.

@husnusensoy
Contributor

Please add that to your pull request. We can cherry-pick if needed.

@ertugrul-dmr ertugrul-dmr linked an issue Jun 4, 2021 that may be closed by this pull request
Labels
enhancement New feature or request pending Pending Merge Requests question Further information is requested