A PySpark implementation of chapter 6 of the book "Advanced Analytics with Spark: Patterns for Learning from Data at Scale" (Uri Laserson, Sean Owen, Sandy Ryza, Josh Wills), which was originally implemented in Scala. The goal is to apply Latent Semantic Analysis (LSA) to a corpus of Wikipedia articles, using the Wikipedia data dumps as the dataset.
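
For readers new to LSA, the core idea can be sketched on a toy corpus: build a TF-IDF document-term matrix, then take a truncated singular value decomposition (SVD) so that documents and terms are described by a few latent "concepts". The snippet below is a minimal single-machine illustration that uses NumPy instead of Spark; it is not the code from this repository. The toy documents and the helper names `tfidf_matrix`, `lsa`, and `cosine` are assumptions made up for the example, and the IDF formula (`log(N / df)`) is one common variant, not necessarily the one used in the book.

```python
import math
from collections import Counter

import numpy as np

# Toy stand-in for tokenized Wikipedia articles (hypothetical data).
docs = [
    ["spark", "cluster", "data", "spark"],
    ["data", "cluster", "compute"],
    ["music", "guitar", "band"],
    ["band", "music", "concert", "guitar"],
]


def tfidf_matrix(docs):
    """Return (matrix, vocab); rows are documents, columns are terms."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: i for i, term in enumerate(vocab)}
    n_docs = len(docs)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    matrix = np.zeros((n_docs, len(vocab)))
    for row, doc in enumerate(docs):
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[term])
            matrix[row, index[term]] = tf * idf
    return matrix, vocab


def lsa(matrix, k):
    """Truncated SVD: keep only the k largest singular values."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k], s[:k], vt[:k, :]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


matrix, vocab = tfidf_matrix(docs)
u, s, vt = lsa(matrix, k=2)

# Coordinates of each document in the 2-dimensional concept space.
doc_concepts = u * s

print("singular values:", s)
print("doc 2 vs doc 3:", cosine(doc_concepts[2], doc_concepts[3]))
print("doc 0 vs doc 2:", cosine(doc_concepts[0], doc_concepts[2]))
```

In this toy corpus the two music documents should land close together in concept space and far from the computing documents. At Wikipedia scale the same decomposition has to be computed in a distributed way, which is the reason for using Spark.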
dbaikova/Wikipedia_LSA
Wikipedia Latent Semantic Analysis with PySpark