Contributing
Louis Mullie edited this page Apr 4, 2014 · 6 revisions
Here is a list of ideas for contributing to the project.
Top 10 Priority List
In no particular order.
- Link Parser - "The Link Grammar Parser is a syntactic parser of English, based on link grammar." (http://www.link.cs.cmu.edu/link/api/index.html)
- YOMU - "A library for extracting text and metadata from files and documents using the Apache Tika content analysis toolkit." (https://github.com/Erol/yomu).
- JAWS - "Java API for Wordnet Searching" (http://lyle.smu.edu/~tspell/jaws/index.html) or YAWNI - "Yawni is a pure Java object-oriented interface to the WordNet database of lexical relationships." (http://yawni.sourceforge.net/wiki/index.php).
- Wapiti - "Fast conditional random fields (CRFs) for Ruby" (https://github.com/inukshuk/wapiti-ruby).
- OpenNLP - "The Apache OpenNLP library is a machine learning based toolkit for the processing of natural language text." (http://incubator.apache.org/opennlp/).
- Spidr gem - "Spidr is a versatile Ruby web spidering library that can spider a site, multiple domains, certain links or infinitely." (http://spidr.rubyforge.org/).
- Ariel - "Ariel is a Ruby library that allows you to extract information from semi-structured documents (such as websites)." (http://ariel.rubyforge.org/).
- BoilerPipe - Readability-like HTML boilerplate removal (http://code.google.com/p/boilerpipe/).
- ABNER - "ABNER is a software tool for molecular biology text analysis." (http://pages.cs.wisc.edu/~bsettles/abner/).
- PLDA - "A parallel C++ implementation of fast Gibbs sampling of Latent Dirichlet Allocation" (http://code.google.com/p/plda/)
Other stuff
- Anemone, Sphinx
- RDF.rb - "A pure-Ruby library for working with Resource Description Framework (RDF) data." (http://rdf.rubyforge.org/)
- Text2rdf library – "A text mining application to extract terms and phrases from the text documents and annotate them with domain specific terminologies." (http://code.google.com/p/text2rdf/)
- Mark Watson has JRuby bindings for the PowerLoom AI reasoning and knowledge representation system.
- SRILM utilities - Many n-gram utilities (http://www.speech.sri.com/projects/srilm/manpages/)
- RSemantic gem - "A document vector search with flexible matrix transforms for Ruby." (https://github.com/josephwilk/rsemantic/wiki/)
- AI4R - "AI4R is a collection of Ruby algorithms implementations, covering several Artificial intelligence fields." (http://ai4r.rubyforge.org/)
- SVMlight - "SVMlight is an implementation of Support Vector Machines (SVMs) in C." (http://svmlight.joachims.org/)
- GHMM - "The General Hidden Markov Model library (GHMM) is a freely available C library implementing efficient data structures and algorithms for basic and extended HMMs with discrete and continous emissions." (http://ghmm.org/)
- PET Parser - "A platform for experimentation with efficient HPSG processing techniques." (http://heartofgold.dfki.de/PET.html)
- Berkeley Parser - "A natural language parser from UC Berkeley." (http://code.google.com/p/berkeleyparser/)
- Alpino Parser - "Alpino is a dependency parser for Dutch." (http://www.let.rug.nl/vannoord/alp/Alpino/)
- Tapas Kanungo has written a page chunking utility (http://www.kanungo.com/software/software.html).
- NLTK - "Open source Python modules for research and development in natural language processing and text analytics." (http://www.nltk.org/code)
- LingPipe - "LingPipe is tool kit for processing text using computational linguistics." (http://alias-i.com/lingpipe/)
- ACOPOST - "A free and open source collection of part-of-speech taggers." (http://acopost.sourceforge.net/)
- CLAWS tagger - Part-of-speech tagger for English. (http://ucrel.lancs.ac.uk/claws/)
- FastTag - "FastTag, a fast Java part of speech tagger." (http://www.markwatson.com/opensource/)
- Citar - "Citar is a simple part-of-speech tagger, based on a trigram Hidden Markov Model (HMM)." (https://github.com/danieldk/citar)