Tweets Real-Time NLP Analysis

Project's Presentation

Purpose

This Big Data system analyzes tweets in real time by applying NLP algorithms. The application provides insights about a specific subject or theme. For example, it can determine the sentiment (negative, positive, neutral) of a set of tweets concerning a specific topic, brand, or personality. The system is reliable and scalable, and it runs in a fully distributed environment.
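To make the idea concrete, here is a minimal Scala sketch of the "filter tweets by topic, then count sentiment labels" step. It is not the app's actual method. The Spark NLP sentiment models are replaced by a toy word-list scorer, and the object name, word lists, and sample tweets are all hypothetical.

```scala
// Toy, lexicon-based stand-in for the Spark NLP sentiment models (hypothetical):
// it only illustrates "filter tweets by topic, then count sentiment labels".
object TopicSentimentSketch {

  sealed trait Sentiment
  case object Positive extends Sentiment
  case object Negative extends Sentiment
  case object Neutral  extends Sentiment

  // Hypothetical word lists; a real NLP model is far richer than this.
  private val positiveWords = Set("love", "great", "good", "excellent")
  private val negativeWords = Set("hate", "bad", "awful", "terrible")

  // Lowercase the text and split it on any run of non-letter characters.
  private def tokens(text: String): Seq[String] =
    text.toLowerCase.split("[^a-z]+").toSeq.filter(_.nonEmpty)

  // Score = (#positive words) - (#negative words); its sign gives the label.
  def classify(text: String): Sentiment = {
    val ts    = tokens(text)
    val score = ts.count(positiveWords) - ts.count(negativeWords)
    if (score > 0) Positive else if (score < 0) Negative else Neutral
  }

  // Keep only tweets mentioning the topic (case-insensitive), then count each label.
  def tally(tweets: Seq[String], topic: String): Map[Sentiment, Int] =
    tweets
      .filter(_.toLowerCase.contains(topic.toLowerCase))
      .groupBy(classify)
      .map { case (label, group) => label -> group.size }

  def main(args: Array[String]): Unit = {
    val tweets = Seq(
      "I love the new Acme phone",
      "Acme support was terrible today",
      "Acme announced an event",
      "Great weather in Paris" // does not mention the topic, so it is filtered out
    )
    println(tally(tweets, "acme"))
  }
}
```

In the real system, the classification step would be performed by a trained model on the Spark cluster. The shape of the computation stays the same: select the tweets for one subject and aggregate their sentiment labels.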

Technical Environment

The app runs on a scalable stack built with the following tools:

Apache Kafka (data ETL and streaming data source)
Apache Spark (data processing, including NLP)
Spark NLP (John Snow Labs)
HDFS (Hadoop) (stores the app JAR file and the other files required to deploy the app: third-party JARs, NLP models, etc.)
MongoDB (stores the tweets and the machine learning results)
Apache Zeppelin (data visualization)
Eclipse (IDE)
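The sketch below illustrates, in plain Scala, how a streaming engine such as Spark Streaming turns a continuous flow of records into small batches and stores one result per batch. Here the records stand in for tweets arriving from Kafka, and a Map stands in for a MongoDB collection. It is a simplification built on several assumptions. The Tweet record, the batch interval, and the counting step are hypothetical. Records are also bucketed by their own timestamp, whereas Spark Streaming assigns records to batches by processing time.

```scala
// Pure-Scala illustration of micro-batching (hypothetical names throughout):
// records from a stream are grouped into fixed time intervals, each batch is
// processed, and the per-batch result is "stored" in a Map (standing in for MongoDB).
object MicroBatchSketch {

  // Hypothetical record shape; the real tweet schema comes from the Kafka connector.
  final case class Tweet(timestampMs: Long, text: String)

  // Start of the interval a timestamp falls into (floorDiv keeps negatives correct).
  def batchStart(ts: Long, intervalMs: Long): Long =
    Math.floorDiv(ts, intervalMs) * intervalMs

  // Group records by interval, ordered by batch start time.
  def microBatches(stream: Seq[Tweet], intervalMs: Long): Seq[(Long, Seq[Tweet])] = {
    require(intervalMs > 0, "batch interval must be positive")
    stream.groupBy(t => batchStart(t.timestampMs, intervalMs)).toSeq.sortBy(_._1)
  }

  // Placeholder processing step: count tweets per batch.
  // In the real app, this is where the NLP pipeline would run.
  def process(batch: Seq[Tweet]): Int = batch.size

  // Run every batch and keep one result per batch start time.
  def run(stream: Seq[Tweet], intervalMs: Long): Map[Long, Int] =
    microBatches(stream, intervalMs).map { case (start, batch) => start -> process(batch) }.toMap

  def main(args: Array[String]): Unit = {
    val stream = Seq(Tweet(100, "a"), Tweet(900, "b"), Tweet(1500, "c"), Tweet(2100, "d"))
    println(run(stream, intervalMs = 1000))
  }
}
```

Small batches keep the latency low while reusing Spark's batch engine. Each batch's output can then be written by the sink (MongoDB in this project) independently of the others.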

In terms of computing resources, the app can be deployed in:

local mode (a Spark standalone cluster; the app depends on the local machine)
cluster mode (a Mesos cluster, using a ZooKeeper quorum)
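Each deployment target corresponds to a different Spark master URL. The following sketch maps a hypothetical Deployment type to the URL formats described in the Spark documentation. It covers local[N] or local[*], spark://HOST:PORT for a standalone master, and mesos://zk://... for a Mesos cluster discovered through ZooKeeper. Host names, ports, and the ZooKeeper path are placeholders.

```scala
// Sketch: choosing the Spark master URL for each deployment target.
// The Deployment type is hypothetical; hosts, ports and znode are placeholders.
object MasterUrlSketch {

  sealed trait Deployment
  // Spark inside a single JVM; None means one worker thread per core (local[*]).
  final case class Local(threads: Option[Int]) extends Deployment
  // A Spark standalone master, e.g. one started on the local machine (default port 7077).
  final case class Standalone(host: String, port: Int = 7077) extends Deployment
  // A Mesos cluster whose masters are found through a ZooKeeper quorum.
  final case class MesosZk(zkQuorum: Seq[String], znode: String = "/mesos") extends Deployment

  def masterUrl(d: Deployment): String = d match {
    case Local(Some(n)) =>
      require(n > 0, "thread count must be positive")
      s"local[$n]"
    case Local(None)            => "local[*]"
    case Standalone(host, port) => s"spark://$host:$port"
    case MesosZk(quorum, znode) =>
      require(quorum.nonEmpty, "at least one ZooKeeper host is required")
      s"mesos://zk://${quorum.mkString(",")}$znode"
  }

  def main(args: Array[String]): Unit = {
    println(masterUrl(Standalone("localhost")))
    println(masterUrl(MesosZk(Seq("zk1:2181", "zk2:2181", "zk3:2181"))))
  }
}
```

The resulting string would be passed to spark-submit through --master or to SparkConf.setMaster. Note that Spark deprecated its Mesos support in version 3.2.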

The app is written in Scala.

Workflow

[Workflow schema]

Points to set up

Kafka Connect: source connector (Twitter API) and sink connector (MongoDB)
MongoDB collections
Eclipse IDE project (built with Maven, pom.xml file)
Apache Spark (Spark SQL, Spark Streaming, Spark ML, Spark NLP)
HDFS (folder structure for the app)
Zeppelin (MongoDB interpreter to read the data stored in the collections)
Mesos resource manager (for a cluster-mode deployment; the cluster is built on AWS EC2 instances)
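The HDFS folders and the Mesos cluster come together at submission time. Here is a minimal sketch, assuming the application jar and its third-party jars (for example, Spark NLP) have been uploaded to HDFS. It assembles a spark-submit argument list using the standard --class, --master, --deploy-mode, and --jars options. The HDFS paths, the entry-point class name, and the dispatcher address are hypothetical.

```scala
// Sketch: building a spark-submit command line whose jars are read from HDFS.
// All paths, the class name and the master address are hypothetical placeholders.
object SubmitArgsSketch {

  final case class Submission(
      mainClass: String,
      master: String,
      deployMode: String,     // "client" or "cluster"
      appJar: String,         // application jar, e.g. stored on HDFS
      extraJars: Seq[String], // third-party jars, e.g. Spark NLP
      appArgs: Seq[String]    // arguments passed to the app's main method
  )

  def toArgs(s: Submission): Seq[String] = {
    require(Set("client", "cluster").contains(s.deployMode), s"unknown deploy mode: ${s.deployMode}")
    // --jars takes a comma-separated list; omit the option when there is nothing to add.
    val jarsOpt = if (s.extraJars.isEmpty) Seq.empty[String] else Seq("--jars", s.extraJars.mkString(","))
    Seq("spark-submit",
        "--class", s.mainClass,
        "--master", s.master,
        "--deploy-mode", s.deployMode) ++
      jarsOpt ++
      (s.appJar +: s.appArgs) // the application jar comes first, then the app's own arguments
  }

  def main(args: Array[String]): Unit = {
    val hdfs = "hdfs://namenode:8020/apps/tweets-nlp" // hypothetical HDFS folder
    val submission = Submission(
      mainClass  = "TweetsNLP",                         // hypothetical entry point
      master     = "mesos://dispatcher-host:7077",      // hypothetical MesosClusterDispatcher address
      deployMode = "cluster",
      appJar     = s"$hdfs/tweets-nlp.jar",
      extraJars  = Seq(s"$hdfs/lib/spark-nlp.jar"),
      appArgs    = Seq("sentiment")
    )
    println(toArgs(submission).mkString(" "))
  }
}
```

Keeping the jars on HDFS matters in Mesos cluster mode. The driver is launched on a cluster node and does not upload local jars, so the URIs must be reachable from the cluster. In that mode, --master points at the MesosClusterDispatcher rather than directly at the Mesos masters.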

Zeppelin Notebook

[Zeppelin dashboard]

Repository: sparktacusdemo/tweets_realtime_nlp_analysis