Search Engine for Books (Java, Apache Lucene, crawler4j, Apache Spark)
- Crawled about 100,000 web pages with crawler4j and ran link analysis on the resulting web graph by implementing PageRank with Apache Spark’s GraphX.
- Indexed the crawled documents with Apache Lucene and ranked the results for each query by a combination of PageRank and TF-IDF scores.
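
The entry does not say how the two scores were combined, so the following is only a minimal sketch of one common option: a linear blend of a text-relevance score and a PageRank score. It assumes both scores were already normalized to [0, 1]. The class `CombinedRanker`, the `Hit` type, and the weight `alpha` are hypothetical names used for illustration, not part of Lucene, GraphX, or the original project.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: blends a text score with a link-analysis score. */
public class CombinedRanker {

    /** One search hit with its two (assumed pre-normalized) scores. */
    public static final class Hit {
        public final String url;
        public final double tfIdf;     // text relevance, assumed in [0, 1]
        public final double pageRank;  // link authority, assumed in [0, 1]

        public Hit(String url, double tfIdf, double pageRank) {
            this.url = url;
            this.tfIdf = tfIdf;
            this.pageRank = pageRank;
        }
    }

    /** Linear blend: alpha weights the text score, (1 - alpha) weights PageRank. */
    public static double combinedScore(double tfIdf, double pageRank, double alpha) {
        if (alpha < 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in [0, 1]");
        }
        return alpha * tfIdf + (1.0 - alpha) * pageRank;
    }

    /** Returns a new list sorted by combined score, highest first. */
    public static List<Hit> rank(List<Hit> hits, double alpha) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingDouble((Hit h) -> combinedScore(h.tfIdf, h.pageRank, alpha))
                .reversed());
        return sorted;
    }
}
```

With `alpha = 1.0` the ordering depends on the text score alone, and with `alpha = 0.0` on PageRank alone. Values in between trade query relevance against link authority.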