Website Insights

The project has four main components:

A Scrapy crawler that parses websites for data such as posts and comments
An NLTK processing pipeline that extracts nouns and adjectives using a part-of-speech (POS) tagger
Aggregation in Python with pandas to compute word frequencies and word co-occurrences
D3 force simulations to visualize the co-occurrences
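The extraction and aggregation steps can be sketched as follows. This is a minimal illustration, not the project's code: it assumes each post or comment has already been tagged into (word, tag) pairs, for example with nltk.pos_tag(nltk.word_tokenize(text)), which returns Penn Treebank tags (NN*, JJ*, ...). The function names extract_terms and aggregate are hypothetical, co-occurrence is assumed to mean "appears in the same post or comment", and collections.Counter stands in for the pandas step so the sketch runs without extra dependencies.

```python
from collections import Counter
from itertools import combinations

# Penn Treebank tag prefixes, as returned by nltk.pos_tag by default
NOUN_PREFIX = "NN"
ADJ_PREFIX = "JJ"


def extract_terms(tagged_tokens):
    """Keep only nouns and adjectives (lowercased) together with their category."""
    terms = []
    for word, tag in tagged_tokens:
        if tag.startswith(NOUN_PREFIX):
            terms.append((word.lower(), "noun"))
        elif tag.startswith(ADJ_PREFIX):
            terms.append((word.lower(), "adjective"))
    return terms


def aggregate(documents):
    """Count word frequencies and per-document co-occurrences (hypothetical helper).

    documents: list of tagged posts/comments, each a list of (word, tag) pairs.
    Returns (freq, cooc, category); cooc keys are alphabetically sorted word pairs.
    """
    freq = Counter()
    cooc = Counter()
    category = {}
    for tagged in documents:
        terms = extract_terms(tagged)
        for word, cat in terms:
            freq[word] += 1
            # A word seen as both noun and adjective keeps its first category
            category.setdefault(word, cat)
        # Each distinct pair is counted once per document
        unique_words = sorted({word for word, _ in terms})
        for a, b in combinations(unique_words, 2):
            cooc[(a, b)] += 1
    return freq, cooc, category


# Hypothetical pre-tagged posts, in the shape nltk.pos_tag produces
post1 = [("The", "DT"), ("battery", "NN"), ("life", "NN"), ("is", "VBZ"), ("great", "JJ")]
post2 = [("Great", "JJ"), ("screen", "NN"), ("and", "CC"), ("battery", "NN")]

freq, cooc, category = aggregate([post1, post2])
print(freq.most_common(2))
print(cooc[("battery", "great")])
```

Counting each pair once per document keeps a long comment that repeats a word from inflating its links; counting raw token pairs, or using a sliding window instead of whole posts, would be equally valid choices.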

The nodes are colored by part-of-speech category and sized by occurrence frequency. The links between nodes are colored with a gradient according to the co-occurrence frequencies.
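One way to hand these counts to a D3 force simulation is a nodes/links JSON file. The sketch below is hedged: the schema is an assumption modeled on common d3-force examples (nodes with id and group, links with source, target and value), the size field and the min_cooc pruning threshold are hypothetical, and the repository's actual file format may differ. The POS category goes into group (for node color), frequency into size (for node radius), and co-occurrence count into value (for the link gradient).

```python
import json


def to_force_graph(freq, cooc, category, min_cooc=1):
    """Convert word counts into a nodes/links dict for a D3 force layout (hypothetical schema)."""
    nodes = [
        # group drives node color, size drives node radius
        {"id": word, "group": category.get(word, "other"), "size": count}
        for word, count in sorted(freq.items())
    ]
    links = [
        # value drives the link color gradient; weak pairs are dropped
        {"source": a, "target": b, "value": n}
        for (a, b), n in sorted(cooc.items())
        if n >= min_cooc
    ]
    return {"nodes": nodes, "links": links}


# Hypothetical counts, in the shape produced by a frequency/co-occurrence step
freq = {"battery": 2, "great": 2, "life": 1}
cooc = {("battery", "great"): 2, ("battery", "life"): 1, ("great", "life"): 1}
category = {"battery": "noun", "great": "adjective", "life": "noun"}

graph = to_force_graph(freq, cooc, category, min_cooc=2)
print(json.dumps(graph, indent=2))
```

On the D3 side, d3.forceLink(links).id(d => d.id) resolves the string source and target values against the node ids, so the file needs no numeric indices.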

With this information, you can identify which topics are discussed on a website, the common contexts in which they are discussed, and the sentiment surrounding them.

Repository contents:

_data
_screenshots
d3
jupyter_notebooks
site_crawler
.gitignore
README.md
requirements.txt
scrapy.cfg
simple-cors-http-server.py

Repository: rahul-pande/website_insights