Gain Insights from a Website (Force Graph and WordCloud)

rahul-pande/website_insights

Website Insights

website_screen_1

The project has four main components:

  • Scrapy crawler to parse websites for data like posts and comments
  • NLTK processing pipeline to extract nouns and adjectives using a POS (part-of-speech) tagger
  • Aggregation in Python using Pandas to generate word frequencies and word co-occurrences
  • D3 force simulations to visualize the co-occurrences
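
The middle two components can be sketched roughly as follows. This is a minimal illustration, not the project's code: it assumes tokens have already been tagged with Penn Treebank tags (the format `nltk.pos_tag` returns), it assumes two words "co-occur" when they appear in the same post, and it uses the standard library's `Counter` in place of Pandas to stay dependency-free. The function names and the sample posts are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Penn Treebank tag prefixes: NN* covers nouns, JJ* covers adjectives.
KEPT_PREFIXES = {"NN": "noun", "JJ": "adjective"}


def keep_nouns_and_adjectives(tagged_tokens):
    """Return (lowercased word, category) pairs for nouns and adjectives only."""
    kept = []
    for word, tag in tagged_tokens:
        category = KEPT_PREFIXES.get(tag[:2])
        if category is not None:
            kept.append((word.lower(), category))
    return kept


def aggregate(documents):
    """Count word frequencies and per-document word co-occurrences."""
    frequencies = Counter()
    cooccurrences = Counter()
    for tagged_tokens in documents:
        words = [word for word, _ in keep_nouns_and_adjectives(tagged_tokens)]
        frequencies.update(words)
        # Assumption: each unordered pair of distinct words counts once per document.
        for pair in combinations(sorted(set(words)), 2):
            cooccurrences[pair] += 1
    return frequencies, cooccurrences


# Hypothetical input: two posts that were already tagged,
# e.g. by nltk.pos_tag(nltk.word_tokenize(text)).
posts = [
    [("The", "DT"), ("new", "JJ"), ("phone", "NN"), ("is", "VBZ"), ("great", "JJ")],
    [("Great", "JJ"), ("phone", "NN"), ("battery", "NN")],
]
frequencies, cooccurrences = aggregate(posts)
```

Sorting each pair before counting keeps `("great", "phone")` and `("phone", "great")` under a single key, so the resulting counts map directly onto undirected links in the graph.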

The nodes are colored by part-of-speech category and sized by occurrence frequency. The links between nodes are gradient-colored by co-occurrence frequency.
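
One way to hand this data to D3 is a JSON document with `nodes` and `links` arrays. The sketch below is an assumption about the shape, not the repository's actual format: the field names `group`, `size`, and `value`, the function name `build_force_graph`, and the sample values are all hypothetical.

```python
import json


def build_force_graph(frequencies, categories, cooccurrences):
    """Build a {"nodes": [...], "links": [...]} dict for a D3 force layout."""
    nodes = [
        # "group" would drive node color, "size" would drive node radius.
        {"id": word, "group": categories[word], "size": count}
        for word, count in sorted(frequencies.items())
    ]
    links = [
        # "value" would drive the link's gradient color.
        {"source": a, "target": b, "value": count}
        for (a, b), count in sorted(cooccurrences.items())
    ]
    return {"nodes": nodes, "links": links}


# Hypothetical sample values.
frequencies = {"phone": 2, "great": 2, "battery": 1}
categories = {"phone": "noun", "great": "adjective", "battery": "noun"}
cooccurrences = {("great", "phone"): 2, ("battery", "phone"): 1}

graph = build_force_graph(frequencies, categories, cooccurrences)
graph_json = json.dumps(graph, indent=2)
```

On the D3 side, `d3.forceLink(links).id(d => d.id)` resolves the string `source` and `target` values to the matching node objects, so the links can reference nodes by word rather than by array index.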

With this information, you can work out which topics are being discussed on a website, the common contexts in which they are discussed, and the sentiment surrounding them.

website_screen_3
