734m/stopwords

 
 

STOPWORDS

REALLY JUST A LIST OF STOPWORDS WITH SOME HELPERS

Extracted from a larger project because it is worth reusing on its own.

USAGE


	
require 'stopwords'

# List all stop words
Stopwords::STOP_WORDS

# Test whether a token is a stop word
Stopwords.is?('and')
# => true

# Check that a token is both a 'word' and not a stop word
Stopwords.valid?('vector')
# => true
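
A common use of these helpers is filtering a token list down to content words. The sketch below is not part of the library: `content_tokens`, `DEMO_STOP_WORDS`, and `demo_valid` are hypothetical names. The stand-in predicate only approximates what `Stopwords.valid?` is described as doing, so that the example runs without the library loaded.

```ruby
# Hypothetical helper (not part of the library): lowercases the text,
# splits it on whitespace, and keeps only the tokens the predicate accepts.
def content_tokens(text, &valid)
  text.downcase.split(/\s+/).reject(&:empty?).select(&valid)
end

# With the library loaded, the predicate would be Stopwords.valid?:
#   content_tokens('The vector and the matrix') { |t| Stopwords.valid?(t) }

# Stand-in predicate (an assumption, for illustration only): a token counts
# as valid if it is purely alphabetic and not in a tiny demo stop list.
DEMO_STOP_WORDS = %w[a an and the of].freeze
demo_valid = ->(token) { token.match?(/\A[a-z]+\z/) && !DEMO_STOP_WORDS.include?(token) }

p content_tokens('The vector and the matrix', &demo_valid)
# => ["vector", "matrix"]
```

Passing the predicate as a block keeps the helper independent of any particular stop list, so the same code works with `Stopwords.valid?` or with a custom check.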

SPECS


$ rake specs

SANITIZE

Not part of the library, but you should probably sanitize tokens before using them (if your tokenizer doesn't already do so).


SANITIZE_REGEXP = /('|\"|‘|’|\/|\\)/
text.downcase.gsub(SANITIZE_REGEXP, '')
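
The two lines above can be wrapped in a small method so they are easy to call before tokenizing. This is only a sketch: `sanitize` is a hypothetical name, not something the library provides, and the regular expression is copied unchanged from the snippet above.

```ruby
# Same pattern as above: straight and curly single quotes, double quotes,
# forward slashes, and backslashes.
SANITIZE_REGEXP = /('|\"|‘|’|\/|\\)/

# Hypothetical helper (not part of the library): lowercases the text and
# deletes every character matched by SANITIZE_REGEXP.
def sanitize(text)
  text.downcase.gsub(SANITIZE_REGEXP, '')
end

p sanitize(%q{It's "A/B"})
# => "its ab"
```

Matched characters are deleted rather than replaced with a space, so `A/B` collapses into the single token `ab`. If slashes should separate tokens in your text, split on them before sanitizing.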

ENDAX

Software Services shop (primarily Ruby) in Brooklyn, NY.

