Uses the Twitter Streaming API to capture real-time tweets for recent high-traffic events and persists them to Elasticsearch. The stored tweets can later be filtered through a REST API.
Python 2.7+, pip, Elasticsearch, Twitter developer app
Note: To create a Twitter developer app, visit the Twitter Application Management page.
- Move to <project-dir>, create a virtual environment, and then activate it:
$ cd <project-dir>
$ virtualenv .environment
$ source .environment/bin/activate
- Copy settings_sample.py to create settings.py, then edit the settings related to your Twitter developer app:
$ cp settings_sample.py settings.py
- Add the project to PYTHONPATH:
$ export PYTHONPATH="$PYTHONPATH:." # . corresponds to current directory(project-dir)
If you are using PyCharm, this can be set in the run configuration.
- Under <project-dir>, install the requirements/dependencies:
$ pip install -r requirements.txt
- Then run app.py:
$ python app.py
Now you can access the application by visiting {protocol}://{host}:{port}. For localhost it is http://localhost:5000.
Congratulations! Start streaming; later on, the data can be filtered using the Funneling API.
Fields: In Elasticsearch, every tweet document under tweets_index contains the following fields:
- tweet_text: string
- screen_name: string
- user_name: string
- location: string
- source_device: string
- is_retweeted: boolean
- retweet_count: integer
- country: string
- country_code: string
- reply_count: integer
- favorite_count: integer
- created_at: datetime
- timestamp_ms: long
- lang: string
- hashtags: array
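As a rough illustration of how a raw Streaming API status could be flattened into the fields above, here is a minimal sketch. The function name `to_document` is hypothetical, and the choice of which Twitter field feeds which document field is an assumption; the project's own mapping may differ.

```python
import re
from datetime import datetime


def to_document(status):
    """Flatten a raw Twitter status dict into the flat field set (assumed mapping)."""
    user = status.get("user") or {}
    place = status.get("place") or {}
    entities = status.get("entities") or {}

    # Twitter reports created_at as e.g. "Sun Dec 17 14:19:26 +0000 2017" (UTC).
    created = datetime.strptime(status["created_at"], "%a %b %d %H:%M:%S +0000 %Y")

    return {
        "tweet_text": status.get("text") or "",
        "screen_name": user.get("screen_name") or "",
        "user_name": user.get("name") or "",
        "location": user.get("location") or "",
        # "source" arrives as an HTML anchor; keep only the visible label.
        "source_device": re.sub(r"<[^>]+>", "", status.get("source") or ""),
        "is_retweeted": bool(status.get("retweeted", False)),
        "retweet_count": int(status.get("retweet_count") or 0),
        "country": place.get("country") or "",
        "country_code": place.get("country_code") or "",
        "reply_count": int(status.get("reply_count") or 0),
        "favorite_count": int(status.get("favorite_count") or 0),
        "created_at": created.strftime("%Y-%m-%dT%H:%M:%S"),
        "timestamp_ms": int(status.get("timestamp_ms") or 0),
        "lang": status.get("lang") or "",
        "hashtags": [h["text"] for h in (entities.get("hashtags") or [])],
    }
```

Missing or null values fall back to empty strings or zero, which matches the empty country and country_code values seen in the sample response.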
Operators: The following operators are available for filtering/querying tweets:
- equals: Exact match, or the = operator for numeric/datetime values.
- contains: Full-text search.
- wildcard: Pattern match, for example:
  - startswith: ind* (starts with ind)
  - endswith: *ind (ends with ind)
  - wildcard: *ind* (matches ind anywhere in the string)
- gte: >= operator for numeric/datetime values.
- gt: > operator for numeric/datetime values.
- lte: <= operator for numeric/datetime values.
- lt: < operator for numeric/datetime values.
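One plausible way to map these operators onto Elasticsearch query clauses is sketched below. The helper `to_es_clause` is hypothetical and is not taken from the project; in particular, whether the project uses `term` for equals or `match` for contains is an assumption.

```python
def to_es_clause(field, operator, query):
    """Translate one (field, operator, query) triple into an Elasticsearch
    query DSL clause. The operator-to-query mapping here is an assumption."""
    if operator == "equals":
        # Exact match on the stored value.
        return {"term": {field: query}}
    if operator == "contains":
        # Analyzed full-text search.
        return {"match": {field: query}}
    if operator == "wildcard":
        # The pattern may contain '*', e.g. "*ind*".
        return {"wildcard": {field: query}}
    if operator in ("gte", "gt", "lte", "lt"):
        # Range comparison for numeric/datetime values.
        return {"range": {field: {operator: query}}}
    raise ValueError("Unsupported operator: %r" % (operator,))
```

For example, `to_es_clause("created_at", "gte", "2017-12-17T14:18:13")` yields a `range` clause on `created_at`.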
GET /stream?keywords=cricket,hockey,virat
It starts streaming real-time tweets containing the given keywords. The tweets are persisted in Elasticsearch under the index tweets_index with the document type tweet.
Response
{
"status": "success",
"message": "Started streaming tweets with keywords [u'cricket', u'hockey', u'virat']"
}
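The stream endpoint can also be called from a script. The sketch below assumes Python 3 and a server running at http://localhost:5000; `stream_url` and `start_stream` are hypothetical helper names, not part of the project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:5000"  # assumed local development address


def stream_url(keywords, base_url=BASE_URL):
    """Build the GET /stream URL for a list of keywords."""
    # safe="," keeps the comma-separated form shown in the README.
    query = urlencode({"keywords": ",".join(keywords)}, safe=",")
    return "%s/stream?%s" % (base_url, query)


def start_stream(keywords, base_url=BASE_URL):
    """Call the endpoint and return the decoded JSON response (needs a running server)."""
    with urlopen(stream_url(keywords, base_url)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `start_stream(["cricket", "hockey", "virat"])` against a running instance should return a dictionary with `status` and `message` keys like the response above.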
POST /funnel?from=0&size=20
Note: from and size can be used for limit/pagination but are optional; the default size is 100.
Request body
{
"sort":["created_at"], // Use a '-' prefix for descending order.
"criteria": {
"AND": [{
"fields": ["created_at"],
"operator": "gte", // equals, contains, wildcard, gte, gt, lte, lt
"query": "2017-12-17T14:18:13"
}, {
"fields": ["location"],
"operator": "wildcard",
"query": "*ind*"
}, {
"fields": ["hashtags"], // 'hashtags' is an array field.
"operator": "contains",
"query": "Cricket"
}
],
"OR": [{
"fields": ["hashtags"],
"operator": "contains",
"query": "cricket"
}, {
"fields": ["hashtags"],
"operator": "contains",
"query": "hockey"
}
],
"NOT": [{
"fields": ["source_device"],
"operator": "equals",
"query": "Twitter for Android"
}
]
}
}
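Note that the `//` comments in the request body above are explanatory only and are not valid JSON, so they must be left out of a real request. A minimal client sketch follows, assuming Python 3, a server at http://localhost:5000, and that the endpoint accepts a JSON body with a `Content-Type: application/json` header; `build_funnel_request` is a hypothetical helper name.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_funnel_request(criteria, sort=None, offset=0, size=100,
                         base_url="http://localhost:5000"):
    """Build (but do not send) a POST /funnel request."""
    body = {"criteria": criteria}
    if sort:
        body["sort"] = sort  # e.g. ["-created_at"] for descending order
    query = urlencode([("from", offset), ("size", size)])
    return Request(
        "%s/funnel?%s" % (base_url, query),
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed requirement
        method="POST",
    )


criteria = {
    "AND": [{"fields": ["location"], "operator": "wildcard", "query": "*ind*"}],
    "NOT": [{"fields": ["source_device"], "operator": "equals",
             "query": "Twitter for Android"}],
}
request = build_funnel_request(criteria, sort=["-created_at"], size=20)
# To send it against a running server:
# with urlopen(request) as resp:
#     result = json.loads(resp.read().decode("utf-8"))
```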
Response
{
"count": {
"total": 21,
"fetched": 10
},
"results": [
{
"sort": [
1513520366000
],
"_type": "tweet",
"_source": {
"lang": "in",
"is_retweeted": false,
"retweet_count": 0,
"screen_name": "T10CricketLive",
"country": "",
"created_at": "2017-12-17T14:19:26",
"hashtags": [
"IndvSL",
"Cricket"
],
"tweet_text": "Ind 193/2 (30 ov), need 23. Karthik 15(24), Dhawan 87(79). Bowling figures of Akila Dananjaya so far: 7-0-48-1. #IndvSL #Cricket",
"source_device": "IFTTT",
"reply_count": 0,
"location": "New Delhi, India",
"country_code": "",
"timestamp_ms": "1513520366428",
"user_name": "cricGuru5167",
"favorite_count": 0
},
"_score": null,
"_index": "tweets_index",
"_id": "AWBk2AUVU3yhj98vAeu_"
},
{......},
{......},
{......},
{......},
]
}
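A response of this shape can be unpacked on the client side as sketched below. The helper names are hypothetical, and the sketch assumes that `count.fetched` is the number of results in the current page and `count.total` is the total number of matches.

```python
def extract_tweets(response):
    """Return the stored tweet documents (_source) from a funnel response."""
    return [hit["_source"] for hit in response.get("results", [])]


def next_offset(response, offset):
    """Return the 'from' value for the next page, or None when nothing is left."""
    count = response["count"]
    if count["fetched"] == 0:
        return None
    upcoming = offset + count["fetched"]
    return upcoming if upcoming < count["total"] else None
```

With the counts shown above (total 21, fetched 10) and a starting offset of 0, the next page would begin at `from=10`.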