Search Engine is still in development.
- Install the required packages.
- Download the "punkt" dataset for NLTK:
$ python3
>>> import nltk
>>> nltk.download('punkt')
- User Management (login, signup, profile page, ...) powered by Django Boilerplate
- Offline Data Storage Models (Sites, Pages, Links)
- Simple HTML page scraper (powered by BeautifulSoup)
- Crawler with depth as an endpoint (simple page crawler powered by requests)
- Crawling Tasks
- Queues for Crawling Tasks (powered by Celery)
- Template to list active and queued tasks
- Backlink Counter
- Templates (search page, list of sites and their pages)
- Extract article content and title (remove extra HTML data such as sidebar, header, ...)
- Indexed Search (powered by Solr and Haystack)
- Sentiment Analysis
- Factoid Extraction and comparison
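
To illustrate how the depth-limited crawler and the backlink counter listed above might fit together, here is a minimal sketch. It is not the project's actual code: the names `LinkParser`, `extract_links`, `crawl`, and the `fetch` callable are hypothetical, and the standard library's `html.parser` stands in for BeautifulSoup so that the example runs without extra packages.

```python
from collections import Counter, deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect href values from <a> tags (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(base_url, html):
    """Return absolute, fragment-free, de-duplicated URLs linked from a page."""
    parser = LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        url = urldefrag(urljoin(base_url, href)).url
        if url not in links:
            links.append(url)
    return links


def crawl(start_url, fetch, max_depth):
    """Breadth-first crawl up to max_depth link hops from start_url.

    fetch is a hypothetical callable: URL -> HTML string, or None on failure.
    Returns (set of fetched pages, Counter of backlinks per URL).
    """
    seen = {start_url}
    pages = set()
    backlinks = Counter()
    queue = deque([(start_url, 0)])
    while queue:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages.add(url)
        for link in extract_links(url, html):
            if link != url:
                backlinks[link] += 1  # one backlink per distinct linking page
            if depth < max_depth and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages, backlinks


# Tiny in-memory "site" so the sketch runs without network access.
SITE = {
    "http://example.test/": '<a href="/a">A</a> <a href="/b">B</a>',
    "http://example.test/a": '<a href="/b">B</a> <a href="/c">C</a>',
    "http://example.test/b": '<a href="/">home</a>',
    "http://example.test/c": '<a href="/a">A</a>',
}
pages, backlinks = crawl("http://example.test/", SITE.get, max_depth=1)
```

With `max_depth=1` the crawl stops one hop from the start page, so `/c` is counted as a backlink target but never fetched. Against real sites, `fetch` could wrap something like `requests.get(url, timeout=10).text`, and in this project each fetch would presumably run as a queued Celery task rather than in a local loop.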