- Spark 1.5+
- Scala 2.10+
- Stanford CoreNLP 3.6.0 JAR
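
If the project is built with sbt, the requirements above could be declared roughly as follows. This is only a sketch: the project name, the exact patch versions (Spark 1.5.2, Scala 2.10.6), the use of `spark-mllib`, and the `provided` scope are assumptions, not settings taken from this repository. The Stanford CoreNLP JAR can also be added to the classpath manually instead.

```scala
// build.sbt: a minimal sketch, not the repository's actual build file
name := "document-classification"   // hypothetical project name

scalaVersion := "2.10.6"            // assumed; any 2.10.x release satisfies "Scala 2.10+"

libraryDependencies ++= Seq(
  // Spark 1.5.x (assumed patch version); "provided" assumes the cluster supplies Spark at runtime
  "org.apache.spark" %% "spark-core"  % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.5.2" % "provided", // assumed: MLlib for the classifier

  // Stanford CoreNLP 3.6.0: the code JAR plus the separate models JAR needed by the annotators
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models"
)
```

The models artifact is large, so it is worth including only if the annotators in use actually need it.
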
- TextUtilities/TextTools.scala - contains functions for annotating the text
- TextUtilities/TextCleaner.scala - contains functions for cleaning and preprocessing the text documents
- DocumentClassification/ModelArchitecture.scala - contains the complete classification architecture
http://analyticsindiamag.com/document-classification-using-apache-spark-scala/