shivam5992/classification_pipeline

Document Classification Pipeline using Apache Spark - Scala

Requirements

  • Spark 1.5+
  • Scala 2.10+
  • Stanford CoreNLP 3.6.0 JAR
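If the project is built with sbt, the requirements above might be declared roughly as follows. This is a minimal sketch under assumptions: the repository does not show a build file, so the project name, the exact patch versions (Spark 1.5.2, Scala 2.10.6), the `provided` scope and the use of `spark-mllib` for the classifier are all illustrative choices. Alternatively, a manually downloaded CoreNLP JAR can be dropped into sbt's default unmanaged `lib/` directory instead of being declared here.

```scala
// build.sbt: hypothetical build definition; versions and names are assumptions.
name := "classification_pipeline"

// Spark 1.5.x is published for Scala 2.10 by default.
scalaVersion := "2.10.6"

libraryDependencies ++= Seq(
  // Supplied by the cluster at runtime when launched with spark-submit.
  "org.apache.spark" %% "spark-core"  % "1.5.2" % "provided",
  "org.apache.spark" %% "spark-mllib" % "1.5.2" % "provided",
  // Stanford CoreNLP code plus its pretrained models.
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0",
  "edu.stanford.nlp" % "stanford-corenlp" % "3.6.0" classifier "models"
)
```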

Files Description

  • TextUtilities/TextTools.scala - contains functions for annotating the text

  • TextUtilities/TextCleaner.scala - contains functions for cleaning and preprocessing the text documents

  • DocumentClassification/ModelArchitecture.scala - contains the complete classification architecture
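To illustrate the kind of cleaning and preprocessing step a file like TextCleaner.scala performs before classification, here is a dependency-free sketch. It is not the repository's implementation: the object name `TextCleanerSketch`, the stop-word list and the specific cleaning rules (lower-casing, keeping only ASCII letters, dropping stop words and one-letter tokens) are assumptions chosen for illustration, and the term-frequency helper is just a simple bag-of-words feature that a classifier could consume.

```scala
// Hypothetical, dependency-free sketch of text cleaning; NOT the
// repository's TextCleaner implementation.
object TextCleanerSketch {
  // Illustrative stop-word list (an assumption; the real list may differ).
  val stopWords: Set[String] =
    Set("a", "an", "and", "in", "is", "of", "the", "to")

  /** Lower-cases the text, replaces every character that is not an ASCII
    * letter or whitespace with a space, splits on whitespace, and drops
    * stop words and single-character tokens. */
  def clean(text: String): Seq[String] =
    text.toLowerCase
      .replaceAll("[^a-z\\s]", " ")
      .split("\\s+")
      .toSeq
      .filter(token => token.length > 1 && !stopWords.contains(token))

  /** Counts how often each token occurs: a simple bag-of-words feature. */
  def termFrequencies(tokens: Seq[String]): Map[String, Int] =
    tokens.groupBy(identity).map { case (token, occurrences) => token -> occurrences.size }
}
```

For example, `TextCleanerSketch.clean("The Spark job, version 1.5, is FAST!")` returns `Seq("spark", "job", "version", "fast")`. Because `clean` is a pure function of a single document, in a Spark job it could be applied to an `RDD[String]` of documents with `rdd.map(TextCleanerSketch.clean)`.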

Online Article

http://analyticsindiamag.com/document-classification-using-apache-spark-scala/
