Python application that allows one to open a connection to a live stream from Twitter to Apache Kafka for use in Demo / POC situations.
ActianCorp/twitter-streaming
###########################################
### twitterStream Data Ingest version 1.0
###########################################

To begin generating data:

1. First open twitter_kafka_direct.py and add the credentials for your Twitter dev account.
   * http://dev.twitter.com

2. Ensure you have all requirements installed and that Python can access the modules (see requirements.txt).
   - If you need, for example, tweepy and have pip:
     >> pip install tweepy
   - If you don't have pip, download get-pip.py (the latest version from the web) and run:
     >> python get-pip.py
     >> pip install tweepy
   - If you like, you can also track tweepy down and install it manually, but I don't see why when pip is so awesome.

3. Then test by opening a terminal window, cd into the directory with the Python script, and run:
   ### Replace the generic paths with the path in your configuration.
   >> python /path/to/twitter_kafka_direct.py

4. To deliver the stream to CSV:
   - Replace the stubs with values for your tokens in twitterStream.py
     >> python /path/to/twitterStream.py > twitterData.csv

5. To write data to Kafka:
   - Use twitter_kafka_direct.py. Replace the token stubs with your values and set your topic (mytopic; the default is 'topic').
     >> python /path/to/twitter_kafka_direct.py
   That will begin to stream data events into a Kafka producer.

###########################################

*** Note *** This procedure assumes a topic named twitterstream exists in Kafka to produce the data to.
If you need to create a topic, use the following command:
   >> bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic twitterstream

Check what topics you have with:
   >> bin/kafka-topics.sh --list --zookeeper localhost:2181

To check if data is in fact landing in Kafka:
   >> bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic twitterstream --from-beginning

These commands use the --zookeeper option from older Kafka releases; newer releases take --bootstrap-server with the broker address instead.

###########################################
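To make the "stream data events into a Kafka producer" step concrete, here is a minimal sketch of that hand-off. It is not the code in twitter_kafka_direct.py. The names tweet_to_message, publish_tweets and RecordingProducer are hypothetical, and the JSON encoding is an assumption. The producer is passed in as any object with a send(topic, value) method, which matches the call shape of kafka-python's KafkaProducer, so the sketch can be dry-run without a broker.

```python
import json


def tweet_to_message(tweet):
    """Serialize one tweet (a dict) to UTF-8 JSON bytes for a Kafka message."""
    return json.dumps(tweet, ensure_ascii=False).encode("utf-8")


def publish_tweets(producer, tweets, topic="twitterstream"):
    """Send each tweet to `topic` and return how many were sent.

    `producer` is any object exposing send(topic, value). Using a real
    Kafka client here is an assumption, not something the README states.
    """
    count = 0
    for tweet in tweets:
        producer.send(topic, tweet_to_message(tweet))
        count += 1
    return count


class RecordingProducer:
    """Hypothetical in-memory stand-in for a Kafka producer, for dry runs."""

    def __init__(self):
        self.sent = []

    def send(self, topic, value):
        # Record the message instead of sending it over the network.
        self.sent.append((topic, value))


if __name__ == "__main__":
    demo = RecordingProducer()
    sample = [{"id": 1, "text": "hello kafka"}]
    sent = publish_tweets(demo, sample)
    print(sent, demo.sent)
```

To try this against a real broker, one option (assuming the kafka-python package is installed) is to pass KafkaProducer(bootstrap_servers="localhost:9092") in place of RecordingProducer, then watch the topic with the console consumer command above.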