Stream real-time tweets and persist them to Elasticsearch. Later, tweets can be filtered using a REST API.

suyash248/tweety

Tweety

Uses the Twitter Streaming API to collect real-time tweets about recent high-traffic events and persists them to Elasticsearch. The stored tweets can later be filtered through a REST API.

Requirements

Python 2.7+, pip, Elasticsearch, Twitter developer app

Note: To create a Twitter developer app, visit the Twitter Application Management page.

How to run?

  1. Move to <project-dir>, create a virtual environment, and activate it:
$ cd <project-dir>
$ virtualenv .environment
$ source .environment/bin/activate
  2. Copy settings_sample.py to settings.py, then edit the settings related to your Twitter developer app.
$ cp settings_sample.py settings.py
  3. Add the project to PYTHONPATH:
$ export PYTHONPATH="$PYTHONPATH:." # . corresponds to the current directory (project-dir)

If you are using PyCharm, this can be set in the run configuration instead.

  4. Under <project-dir>, install the requirements/dependencies:
$ pip install -r requirements.txt
  5. Run app.py:
$ python app.py

Now you can access the application by visiting {protocol}://{host}:{port}. For localhost it is http://localhost:5000.

Congratulations! You can now start streaming, and later filter the stored tweets with the Funneling API.

Schema

Fields: In Elasticsearch, every document of type tweet under tweets_index contains the following fields:

  • tweet_text: string,

  • screen_name : string,

  • user_name: string,

  • location: string,

  • source_device: string,

  • is_retweeted: boolean,

  • retweet_count: integer,

  • country: string,

  • country_code: string,

  • reply_count: integer,

  • favorite_count: integer,

  • created_at: datetime,

  • timestamp_ms: long,

  • lang: string,

  • hashtags: array
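
The field list above could be written down as an explicit Elasticsearch mapping. The sketch below is only an illustration: the README does not say whether the project maps the index explicitly or relies on dynamic mapping, so the type choices and the names `string_field`, `STRING_FIELDS`, and `TWEET_MAPPING` are assumptions. String fields are modelled as `text` with a `keyword` sub-field, which is what Elasticsearch 5+ dynamic mapping does for strings.

```python
# Hypothetical explicit mapping for the documented fields.
# Type choices are assumptions, not taken from the project source.

def string_field():
    """Full-text field plus an exact-match `keyword` sub-field."""
    return {"type": "text", "fields": {"keyword": {"type": "keyword"}}}


# 'hashtags' is an array, but Elasticsearch needs no special array type:
# any field may hold multiple values.
STRING_FIELDS = [
    "tweet_text", "screen_name", "user_name", "location",
    "source_device", "country", "country_code", "lang", "hashtags",
]

properties = {name: string_field() for name in STRING_FIELDS}
properties.update({
    "is_retweeted": {"type": "boolean"},
    "retweet_count": {"type": "integer"},
    "reply_count": {"type": "integer"},
    "favorite_count": {"type": "integer"},
    "created_at": {"type": "date"},
    "timestamp_ms": {"type": "long"},
})

TWEET_MAPPING = {"properties": properties}
```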

Operators

Operators: The following operators are available for filtering/querying tweets:

  • equals : Exact match; acts as the = operator for numeric/datetime values.

  • contains : Facilitates full-text search.

  • wildcard :

    • startswith : ind* (starts with ind),

    • endswith : *ind (ends with ind),

    • wildcard : *ind* (searches ind anywhere in string)

  • gte : >= operator for numeric/datetime values.

  • gt : > operator for numeric/datetime values.

  • lte : <= operator for numeric/datetime values.

  • lt : < operator for numeric/datetime values.
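
One plausible way to read the operator list is as a thin layer over the Elasticsearch query DSL. The sketch below assumes that mapping (`term`, `match`, `wildcard`, and `range` are real query types, but whether the project uses exactly these is not stated in the README); `to_clause` and `RANGE_OPERATORS` are hypothetical names.

```python
# Assumed translation from the documented operators to query DSL clauses.
RANGE_OPERATORS = {"gte", "gt", "lte", "lt"}


def to_clause(field, operator, query):
    """Translate one (field, operator, query) triple into a query DSL clause."""
    if operator == "equals":
        # Exact match; on an analyzed text field a term query may not match.
        return {"term": {field: query}}
    if operator == "contains":
        # Full-text search.
        return {"match": {field: query}}
    if operator == "wildcard":
        # Patterns such as ind*, *ind, or *ind*.
        return {"wildcard": {field: query}}
    if operator in RANGE_OPERATORS:
        # The operator names coincide with the range query keys.
        return {"range": {field: {operator: query}}}
    raise ValueError("Unsupported operator: %r" % (operator,))
```

For example, `to_clause("retweet_count", "gte", 10)` yields a `range` clause on `retweet_count`, and an unknown operator raises `ValueError` rather than being silently ignored.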

APIs/Endpoints

Streaming

GET /stream?keywords=cricket,hockey,virat

Starts streaming real-time tweets that contain the given keywords. The tweets are persisted in Elasticsearch under the index tweets_index with the document type tweet.

Response

{
  "status": "success",
  "message": "Started streaming tweets with keywords [u'cricket', u'hockey', u'virat']"
}
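
A client could assemble the streaming URL from a list of keywords as sketched below. This assumes Python 3's standard library and the local address mentioned earlier in this README; `build_stream_url` is a hypothetical helper, not part of the project.

```python
from urllib.parse import urlencode


def build_stream_url(base_url, keywords):
    """Build the /stream URL with keywords joined by commas."""
    # safe="," keeps the commas literal, matching the documented form.
    query = urlencode({"keywords": ",".join(keywords)}, safe=",")
    return "%s/stream?%s" % (base_url.rstrip("/"), query)


print(build_stream_url("http://localhost:5000", ["cricket", "hockey", "virat"]))
# http://localhost:5000/stream?keywords=cricket,hockey,virat
```

Issuing a GET request to the resulting URL (for example with `urllib.request.urlopen`) requires the app to be running.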

Funneling/Searching

POST /funnel?from=0&size=20

Note: from and size are optional and can be used for limiting/pagination. The default size is 100.
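
As a small illustration of pagination, the sketch below computes the query parameters for a given page. It assumes that `from` is a zero-based offset into the result set, as it is in Elasticsearch itself; the README does not state this explicitly, and `page_params` is a hypothetical helper.

```python
def page_params(page, size=100):
    """Return the from/size query parameters for a zero-based page number."""
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size must be > 0")
    # Assumption: 'from' is an offset, so page N starts at N * size.
    return {"from": page * size, "size": size}
```

With `size=20`, page 0 maps to `from=0` and page 2 maps to `from=40`.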

Request body

{
    "sort": ["created_at"],             // Use '-' sign for 'desc' order.
    "criteria": {
        "AND": [{
            "fields": ["created_at"],
            "operator": "gte",          // equals, contains, wildcard, gte, gt, lte, lt
            "query": "2017-12-17T14:18:13"
        }, {
            "fields": ["location"],
            "operator": "wildcard",
            "query": "*ind*"
        }, {
            "fields": ["hashtags"],     // 'hashtags' is an array field.
            "operator": "contains",
            "query": "Cricket"
        }],
        "OR": [{
            "fields": ["hashtags"],
            "operator": "contains",
            "query": "cricket"
        }, {
            "fields": ["hashtags"],
            "operator": "contains",
            "query": "hockey"
        }],
        "NOT": [{
            "fields": ["source_device"],
            "operator": "equals",
            "query": "Twitter for Android"
        }]
    }
}
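
A request like the one above can be assembled programmatically. The sketch below uses only Python 3's standard library and builds the request without sending it. Several details are assumptions rather than documented behaviour: the `Content-Type: application/json` header, the `'-'` sign being a prefix on the field name (as in `-created_at`), and the helper names `criterion` and `build_funnel_request`.

```python
import json
from urllib.request import Request


def criterion(fields, operator, query):
    """One entry of an AND/OR/NOT list."""
    return {"fields": list(fields), "operator": operator, "query": query}


def build_funnel_request(base_url, criteria, sort=None, start=0, size=20):
    """Build (but do not send) a POST /funnel request."""
    body = {"criteria": criteria}
    if sort:
        body["sort"] = list(sort)
    url = "%s/funnel?from=%d&size=%d" % (base_url.rstrip("/"), start, size)
    return Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


criteria = {
    "AND": [criterion(["created_at"], "gte", "2017-12-17T14:18:13")],
    "NOT": [criterion(["source_device"], "equals", "Twitter for Android")],
}
# "-created_at" assumes the '-' sign is prefixed to the field name.
request = build_funnel_request("http://localhost:5000", criteria, sort=["-created_at"])
```

Sending the request with `urllib.request.urlopen(request)` requires the app to be running.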

Response

{
    "count": {
        "total": 21,
        "fetched": 10
    },
    "results": [
        {
            "sort": [
                1513520366000
            ],
            "_type": "tweet",
            "_source": {
                "lang": "in",
                "is_retweeted": false,
                "retweet_count": 0,
                "screen_name": "T10CricketLive",
                "country": "",
                "created_at": "2017-12-17T14:19:26",
                "hashtags": [
                    "IndvSL",
                    "Cricket"
                ],
                "tweet_text": "Ind 193/2 (30 ov), need 23. Karthik 15(24), Dhawan 87(79). Bowling figures of Akila Dananjaya so far: 7-0-48-1. #IndvSL #Cricket",
                "source_device": "IFTTT",
                "reply_count": 0,
                "location": "New Delhi, India",
                "country_code": "",
                "timestamp_ms": "1513520366428",
                "user_name": "cricGuru5167",
                "favorite_count": 0
            },
            "_score": null,
            "_index": "tweets_index",
            "_id": "AWBk2AUVU3yhj98vAeu_"
        },
        {......},
        {......},
        {......},
        {......},
    ]
}
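
On the client side, a response of this shape can be reduced to the tweet documents plus a count of what is still left to fetch. The sketch below assumes the structure shown above (`count.total`, `count.fetched`, and `_source` on each result) and that `from` is an offset; `summarize` is a hypothetical helper, and the sample data is abbreviated from the response above.

```python
def summarize(response, start=0):
    """Return (tweets, remaining) for a funnel response fetched at offset `start`."""
    count = response.get("count", {})
    # Keep only the stored tweet documents, skipping hits without a _source.
    tweets = [hit["_source"] for hit in response.get("results", []) if "_source" in hit]
    remaining = max(count.get("total", 0) - (start + count.get("fetched", 0)), 0)
    return tweets, remaining


sample = {
    "count": {"total": 21, "fetched": 10},
    "results": [
        {
            "_id": "AWBk2AUVU3yhj98vAeu_",
            "_source": {"screen_name": "T10CricketLive", "hashtags": ["IndvSL", "Cricket"]},
        }
    ],
}
tweets, remaining = summarize(sample)
```

Here `remaining` is 11, which suggests that a follow-up request with a larger `from` value would be needed to retrieve the rest.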
