Apache-spark-structured-streaming

Spark SQL is the Spark component used for processing structured and semi-structured data.

We use Twitter data because Twitter provides a developer API that is easy to access. This project presents an end-to-end architecture for streaming data from Twitter, cleaning it, and applying a simple sentiment analysis model to detect the polarity and subjectivity of each tweet.
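The cleaning step is not spelled out in this README, so the following is only a minimal sketch of what it might look like. The function name `clean_tweet` and the specific rules (dropping the retweet marker, URLs, and @mentions, and keeping hashtag words) are assumptions for illustration, not the repository's actual code.

```python
import re

# Hypothetical cleaning rules; the repository's own rules may differ.
RT_RE = re.compile(r"^RT\s+")          # leading retweet marker
URL_RE = re.compile(r"https?://\S+")   # http/https links
MENTION_RE = re.compile(r"@\w+:?")     # @user mentions, with an optional trailing colon
WS_RE = re.compile(r"\s+")             # runs of whitespace


def clean_tweet(text: str) -> str:
    """Remove the retweet marker, URLs, and mentions; keep hashtag words."""
    text = RT_RE.sub("", text)
    text = URL_RE.sub("", text)
    text = MENTION_RE.sub("", text)
    text = text.replace("#", "")        # keep the word, drop the '#'
    return WS_RE.sub(" ", text).strip()


if __name__ == "__main__":
    sample = "RT @user: Global warming is real! https://t.co/abc #climate"
    print(clean_tweet(sample))
```

Cleaning before scoring keeps links and user handles from being treated as words by a lexicon-based sentiment model.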

Detailed code explanation:

• First, authentication is set up using keys obtained from the Twitter API and the Python module "Tweepy".
• Next, a stream listener named "Twitter data" is created to generate data for the Kafka topic "Global warming". A new method is added to the "Twitter data" object that uses the Afinn module to calculate the sentiment value of each tweet.
• The streaming data is then converted into structured data and placed in a SQL table named "SQldata", which has two columns: "text" and "senti_val".
• pyspark.sql functions are used to calculate the average of the sentiment values in the senti_val column. A function named fun categorizes each tweet as positive, negative, or neutral based on its score.
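The last two steps can be sketched in plain Python, without Spark or Afinn. The column names `text` and `senti_val` and the function name `fun` come from the description above; the thresholds (above zero is positive, below zero is negative, exactly zero is neutral) and the sample rows are assumptions made for illustration. The scores are hand-assigned, not real Afinn output.

```python
from statistics import mean


def fun(score: float) -> str:
    """Categorize a sentiment score (assumed thresholds: >0, <0, ==0)."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


# Made-up rows shaped like the "SQldata" table: one dict per tweet.
rows = [
    {"text": "great progress on clean energy", "senti_val": 3.0},
    {"text": "terrible heat wave again", "senti_val": -2.0},
    {"text": "report published today", "senti_val": 0.0},
    {"text": "nice to see action", "senti_val": 1.0},
]

# Per-tweet category, then the average over the senti_val column.
labels = [fun(row["senti_val"]) for row in rows]
average = mean([row["senti_val"] for row in rows])

if __name__ == "__main__":
    for row, label in zip(rows, labels):
        print(f'{row["senti_val"]:>5} {label:<8} {row["text"]}')
    print("average senti_val:", average, "->", fun(average))
```

In the streaming job itself, the average would be computed by a pyspark.sql aggregate over the senti_val column instead of `statistics.mean`. The categorization logic stays the same either way.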
