micahzev/web_page_classification


Web Page Classification System

This repo is a collection of code that makes up a web page classification system.

The problem is a multiclass classification problem.

The classification algorithm is hierarchical classification with Random Forest, with feature selection used for insight analysis.
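As a rough illustration of what hierarchical classification with Random Forest and feature selection can look like in scikit-learn, here is a minimal sketch. It is not the code from this repo: the two-level label hierarchy, the `TwoLevelClassifier` class, the `parent_of` mapping, the label names and the synthetic data are all assumptions made up for the example.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.pipeline import Pipeline


def make_model(seed=0):
    """Select features by forest importance, then classify with a Random Forest."""
    return Pipeline([
        ("select", SelectFromModel(
            RandomForestClassifier(n_estimators=50, random_state=seed))),
        ("forest", RandomForestClassifier(n_estimators=100, random_state=seed)),
    ])


class TwoLevelClassifier:
    """Hypothetical two-level hierarchy: predict a parent class first,
    then a child class with a model trained only on that parent's samples."""

    def __init__(self, parent_of):
        self.parent_of = parent_of  # maps child label -> parent label
        self.top = make_model()
        self.children = {}

    def fit(self, X, y):
        y = np.asarray(y)
        parents = np.array([self.parent_of[label] for label in y])
        self.top.fit(X, parents)
        for parent in np.unique(parents):
            mask = parents == parent
            labels = np.unique(y[mask])
            if len(labels) == 1:
                # Only one child under this parent: no model needed.
                self.children[parent] = labels[0]
            else:
                self.children[parent] = make_model().fit(X[mask], y[mask])
        return self

    def predict(self, X):
        parents = self.top.predict(X)
        out = np.empty(len(X), dtype=object)
        for parent in np.unique(parents):
            mask = parents == parent
            model = self.children[parent]
            if isinstance(model, Pipeline):
                out[mask] = model.predict(X[mask])
            else:
                out[mask] = model
        return out


# Synthetic data (assumption): feature 0 carries the class signal, the rest is noise.
parent_of = {"news": "media", "blog": "media",
             "shop": "commerce", "auction": "commerce"}
labels = list(parent_of)
rng = np.random.default_rng(0)
y = np.array([labels[i % 4] for i in range(200)])
X = rng.normal(size=(200, 10))
X[:, 0] += np.array([labels.index(label) for label in y]) * 3.0

clf = TwoLevelClassifier(parent_of).fit(X, y)
pred = clf.predict(X)
```

After fitting, `clf.top.named_steps["select"].get_support()` shows which features the top-level selector kept, which is one way feature selection can double as an insight tool.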

The system works as follows:

  • site data is scraped from labeled URLs (not included in the repo)
  • features are built from parsed site data
  • features are fed into different learning algorithms for classification
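The feature-building step above could be sketched as follows, using only the Python standard library. The feature names, the `PageParser` class and the `build_features` helper are hypothetical and chosen for illustration; the features this repo actually builds may differ.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class PageParser(HTMLParser):
    """Collects tag counts and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.text_parts = []
        self._skip = 0  # depth inside <script>/<style>, whose text is ignored

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_parts.append(data)


def build_features(html):
    """Turn raw HTML into a small dictionary of illustrative features."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    words = re.findall(r"[a-z]+", " ".join(parser.text_parts).lower())
    return {
        "n_words": len(words),
        "n_links": parser.tag_counts["a"],
        "n_images": parser.tag_counts["img"],
        "n_forms": parser.tag_counts["form"],
        "top_words": [word for word, _ in Counter(words).most_common(3)],
    }


sample_html = (
    "<html><body><h1>Shop shoes</h1>"
    '<a href="/x">shoes</a><img src="a.png">'
    "<script>var a = 1;</script></body></html>"
)
features = build_features(sample_html)
```

A dictionary per page like this can then be vectorized (for example with scikit-learn's `DictVectorizer` for the numeric fields) before being passed to a classifier.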

Built using Python, scikit-learn, PyMongo, Keras and Matplotlib.

License

MIT

About

End to end web page classification algorithm
