
a machine learning approach for processing mathematical language in scientific documents


rbzn/project-mlp


Mathematical Language Processing

Run

  • compile the maven project
  • adapt the paths in cluster-run.sh to your Stratosphere environment
  • set the parameters of the ranking algorithm, also in cluster-run.sh
  • execute the script

Notice

To start the processor, an additional model file is needed. Download the Stanford POS tagger from http://nlp.stanford.edu/software/tagger.shtml. The archive contains a directory called pos-tagger-models/ with a variety of model files for several languages.

If uncertain, the english-left3words-distsim.tagger model is a good starting point.

Tested with http://nlp.stanford.edu/software/stanford-postagger-2012-11-11.zip. The most recent version, http://nlp.stanford.edu/software/stanford-postagger-2014-01-04.zip, is currently being tested.
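
The download and lookup described above could be scripted roughly as follows. This is a minimal sketch, not part of the project: the function names fetch_tagger and find_model, the local file name postagger.zip and the default target directory are assumptions made for illustration, and wget and unzip are assumed to be installed. The model is searched for by file name because the model directory is called pos-tagger-models/ above but appears as models/ in the 2014-01-04 log entry.

```shell
#!/bin/sh
# Sketch only: function names and defaults are illustrative assumptions.

# Download and unpack a tagger release into a target directory.
# The default URL is the 2012-11-11 release named in this README.
fetch_tagger() {
    url=${1:-http://nlp.stanford.edu/software/stanford-postagger-2012-11-11.zip}
    dest=${2:-.}
    wget -q -O "$dest/postagger.zip" "$url" &&
        unzip -q -o "$dest/postagger.zip" -d "$dest"
}

# Print the path of the first matching model file below a directory.
# Searching by file name avoids depending on the model directory's name.
find_model() {
    search_dir=$1
    model_name=${2:-english-left3words-distsim.tagger}
    find "$search_dir" -type f -name "$model_name" | head -n 1
}

# Example (not run automatically):
#   fetch_tagger
#   model=$(find_model .)
#   cp "$model" ~
```

find_model prints nothing when no file with the requested name exists, so a caller can test for an empty result before copying the model.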

Log

To trace what was done on the MLP server:

  • install stratosphere via the Debian package
  • physikerwelt@mlp:~/stanford-postagger-2014-01-04/models$ cp english-left3words-distsim.tagger ~
