- Search for a topic on Twitter (the stream of tweets is processed in real time, and all values are computed from the moment the search starts).
- Map word size to either the word count or the mean follower count of the word's authors.
- Interact with the word cloud by panning and zooming.
- Move up to 50 seconds back in time by moving the timeline slider.
- Find current trends about a topic on Twitter.
- Gauge the credibility of a word's authors by their mean follower count.
- See the total tweet count for a topic and compare how word usage changes over time.
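The two word-size mappings and the 50-second timeline can be illustrated with a small, self-contained sketch. This is not the project's Spark job: the names `word_stats` and `Timeline`, the `(text, author_followers)` tweet layout, and the one-snapshot-per-second assumption are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict, deque


def word_stats(tweets):
    """Aggregate per-word statistics from (text, author_followers) pairs.

    Returns {word: (count, mean_followers)}. The input layout is an
    assumption made for this sketch, not the project's actual schema.
    A word repeated within one tweet is counted once per occurrence.
    """
    counts = defaultdict(int)
    followers = defaultdict(int)
    for text, author_followers in tweets:
        for word in text.lower().split():
            counts[word] += 1
            followers[word] += author_followers
    return {w: (counts[w], followers[w] / counts[w]) for w in counts}


class Timeline:
    """Keep only the most recent snapshots (assumed: one per second)."""

    def __init__(self, max_seconds=50):
        # deque(maxlen=...) silently drops the oldest snapshot when full.
        self.snapshots = deque(maxlen=max_seconds)

    def push(self, stats):
        self.snapshots.append(stats)

    def seconds_back(self, n):
        """Return the snapshot from n seconds ago (0 = latest)."""
        return self.snapshots[-1 - n]


tweets = [("Spark is fast", 100), ("spark streaming", 300)]
stats = word_stats(tweets)
print(stats["spark"])  # (2, 200.0)
```

Either element of the tuple can then drive the word size, and moving the slider corresponds to calling `seconds_back` with a larger offset.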
Note: Kafka and Zookeeper are running in Docker containers.
pip install -r requirements.txt
- docker, docker-compose
- Spark (pyspark)
- Redis
- Create a new Twitter developer app and create a file with your configuration, following the config/config-example.json format:
- consumer_key, consumer_secret, access_token, access_token_secret - from Twitter apps
- docker_kafka_ip - IP address of the Docker container running Kafka, same as
(see the logs in the Docker container for the IP address, e.g. Established session 0x15bec44028b0001 with negotiated timeout 30000 for client /
- Start docker compose containers.
sudo docker-compose up
- Set Spark directory path.
export SPARK_HOME=<path_to_spark_home_directory>
- Run web application.
python webapp/app.py
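Before starting the web application, the configuration file from the first step can be sanity-checked. The sketch below assumes the file is a flat JSON object containing the keys listed above; the real layout of config/config-example.json may differ, and the helper names `missing_keys` and `load_config` are hypothetical.

```python
import json

# Keys named in the configuration step; a flat layout is assumed.
REQUIRED_KEYS = (
    "consumer_key",
    "consumer_secret",
    "access_token",
    "access_token_secret",
    "docker_kafka_ip",
)


def missing_keys(config):
    """Return the required keys that are absent or empty in config."""
    return [key for key in REQUIRED_KEYS if not config.get(key)]


def load_config(path):
    """Load a JSON configuration file and fail early if keys are missing."""
    with open(path) as handle:
        config = json.load(handle)
    missing = missing_keys(config)
    if missing:
        raise ValueError("missing configuration keys: " + ", ".join(missing))
    return config
```

Calling `load_config` with the path of your configuration file either returns the parsed dictionary or raises a `ValueError` that names the missing keys.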
Other useful commands:
# Remove stopped containers
sudo docker-compose rm
# Run docker containers
sudo docker-compose up
# In case you need to stop containers
sudo docker-compose stop
# Test connection with Kafkacat - producing to topic 'tweets' (type messages in the console)
kafkacat -b -t tweets -P
# Test connection with Kafkacat - consuming from topic 'tweets'
kafkacat -b -t tweets -C
# Run Spark job processing (started automatically by the web app)
--jars jar/spark-streaming-kafka-0-8-assembly_2.11-2.0.2.jar
--master local[*]
# Clear redis
redis-cli flushall
# Delete Spark checkpointing
rm -rf checkpoint-tweet
# Run web app for visualization using D3.js
python webapp/app.py
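If Kafkacat is not installed, a rough first check of the Kafka container is whether its port accepts TCP connections at all. The sketch below uses only the Python standard library; the helper name `can_connect` is hypothetical, and the host and port must be supplied by you (9092 is Kafka's conventional default port, but this project's compose file may expose a different one). A successful connection only shows that the port is reachable, not that Kafka is working correctly.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError on refusal, timeout or bad host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect(docker_kafka_ip, 9092)` would use the IP address from your configuration together with the assumed default port.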