Thamme Gowda edited this page May 7, 2016 · 7 revisions

Welcome to the Auto-Extractor wiki!

This wiki collects information about Auto-Extractor.

Links

The current status

  • Clustering of web pages by their style and structure
  • Scales out on Apache Spark
  • Visualization of clusters (work in progress)
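This wiki does not say how pages are compared, so the following is only a minimal sketch of what "clustering by structure" can mean. It assumes a page's structure is the set of root-to-element tag paths (such as `html/body/div/p`), scores two pages with Jaccard similarity, and groups them with a greedy single-pass threshold rule. The names `TagPathCollector`, `tag_paths` and `cluster_by_structure`, and the default threshold of 0.6, are hypothetical illustrations, not Auto-Extractor's API or algorithm. Style similarity and distribution on Apache Spark are left out.

```python
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Hypothetical helper: collects root-to-element tag paths like 'html/body/div/p'."""

    # Void elements never get a closing tag, so they are not pushed on the stack.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        # Record the path from the root down to this element.
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; ignore stray closing tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the set of tag paths found in an HTML string."""
    parser = TagPathCollector()
    parser.feed(html)
    parser.close()
    return parser.paths


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cluster_by_structure(pages, threshold=0.6):
    """Greedy clustering (an assumed, illustrative scheme): each page joins the
    first cluster whose representative (its first member) is at least
    `threshold` similar; otherwise it starts a new cluster."""
    clusters = []  # list of (representative_paths, [page_ids])
    for page_id, html in pages.items():
        paths = tag_paths(html)
        for rep, members in clusters:
            if jaccard(paths, rep) >= threshold:
                members.append(page_id)
                break
        else:
            clusters.append((paths, [page_id]))
    return [members for _, members in clusters]
```

For example, with `pages = {"a": "<html><body><div><h1>T</h1><p>x</p></div></body></html>", "b": "<html><body><div><h1>U</h1><p>y</p><p>z</p></div></body></html>", "c": "<html><body><table><tr><td>1</td></tr></table></body></html>"}`, `cluster_by_structure(pages)` returns `[['a', 'b'], ['c']]`: pages `a` and `b` share all five tag paths, while `a` and `c` share only `html` and `html/body` (similarity 2/8). Comparing against a single representative keeps each page to one pass over the clusters, at the cost of depending on input order.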

Roadmap

  • Automatic extraction of content
  • Integration with Apache Tika and Apache Nutch

Screenshots

Visualization 1

Visualization 2

Developers

  • Thamme Gowda N.
  • Chris Mattmann