dstrodtman/wikipedia

Wikipedia Semantic Search

Synopsis: Applied Latent Semantic Analysis (LSA) to pages scraped from Wikipedia to classify each page as belonging to either the Business Software or the Machine Learning category.

Methods: LSA, MongoDB, TF-IDF, BeautifulSoup, TruncatedSVD
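
As a rough illustration of how TF-IDF and TruncatedSVD combine into LSA, here is a minimal sketch using scikit-learn. It is not the project's actual code: the four-sentence corpus, the choice of two components, the `Normalizer` step, and the `search` helper are all assumptions made for the example, and the scraping (BeautifulSoup) and storage (MongoDB) stages are left out.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import Normalizer

# Hypothetical toy corpus standing in for the scraped Wikipedia pages.
docs = [
    "accounting software manages invoices and payroll for a business",
    "enterprise resource planning software supports business operations",
    "a neural network is a machine learning model trained on data",
    "supervised machine learning trains a model on labeled data",
]

# LSA = TF-IDF term weighting followed by a truncated SVD.
# n_components=2 is an assumption suited to this tiny corpus only.
lsa = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    TruncatedSVD(n_components=2, random_state=0),
    Normalizer(copy=False),  # unit-length vectors for cosine comparison
)
doc_vectors = lsa.fit_transform(docs)


def search(query, top_k=2):
    """Return (document index, cosine similarity) pairs, best match first."""
    query_vector = lsa.transform([query])
    scores = cosine_similarity(query_vector, doc_vectors)[0]
    ranked = scores.argsort()[::-1][:top_k]
    return [(int(i), float(scores[i])) for i in ranked]


for index, score in search("machine learning model"):
    print(index, round(score, 3), docs[index])
```

The same reduced vectors could also feed a classifier for the two categories. A query with only a few terms produces a very sparse TF-IDF vector before projection, which may be one reason short searches are harder to match than full pages.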

Data size: 5,685 pages of varying length, with 83,729 unique terms

Findings: The final model was highly accurate at categorizing pages, but performed poorly at finding pages related to queries with only a few search terms.
