Python script that retrieves Twitter data via the Twitter API for further analysis

erickaakcire/ornithologist

ornithologist

Ornithologist is a Python script that retrieves Twitter data via the Twitter Search API for further analysis. I developed Ornithologist to give students in my courses a way to gather their own Twitter data while learning social media analysis.

Requires Python-Twitter (https://github.com/bear/python-twitter). You must also obtain your own Twitter API credentials and enter them, along with a directory for data output, in a config file (an example config file is included).

For available options and usage, run the program; a help message listing the available commands will appear.

Ornithologist produces tab-delimited UTF-8 files with the resulting data and a directory with each tweet as a separate text file (often needed for natural language processing analysis):

ornithologistLog.csv - Header row and one row of data per search, indicating the variables filled in and the time of the search (UTC)

tweets.csv - General purpose file with tweet-level data. See Twitter API documentation on status objects.

users.csv - General purpose file with user-level data. See Twitter API documentation on user objects.

links.csv - Links found in tweets. Use the option --unshorten to get the destination link.

hashtags.csv - tweet ID and hashtag used in the tweet - one line per hashtag

mentions.csv - tweet ID and user name of the user mentioned in the tweet - one line per user

user2hashtagEdges.csv - Two-mode network data (user = Source, hashtag = Target) ready to import to Gephi or other network analysis programs.

rtEdges.csv - Two-mode network data (user = Source, retweeted user = Target).

userEdges.csv - Two-mode network data (user = Source, user addressed = Target).

tweets/ - Directory with the text of each tweet as a separate text file, named as Tweet ID.csv. Suitable for NLP analysis applications.
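
The files listed above can be loaded with Python's standard csv module by setting the delimiter to a tab. The following is a minimal sketch, not part of Ornithologist itself: the function names read_tab_delimited and count_edges are hypothetical, and it assumes the edge files start with a header row whose columns are named Source and Target, as the descriptions above suggest. Check the header of your own files before relying on those names.

```python
import csv
from collections import Counter


def read_tab_delimited(path):
    """Read a tab-delimited UTF-8 file with a header row into a list of dicts."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def count_edges(rows, source="Source", target="Target"):
    """Count how often each (source, target) pair occurs.

    The default column names are an assumption based on the file
    descriptions; pass different names if your header differs.
    """
    return Counter((row[source], row[target]) for row in rows)
```

For example, calling count_edges(read_tab_delimited("user2hashtagEdges.csv")).most_common(10) would list the ten most frequent user and hashtag pairs, assuming the file is in the current directory.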

Known Issues: Check files for duplicates.

  • Files are tab-separated; the .csv extension is used only to make the files easier to find in some programs.
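
One way to handle the duplicates mentioned above is to copy a file while keeping only the first occurrence of each line. This is a rough sketch rather than a feature of Ornithologist: the function name drop_duplicate_lines is hypothetical, and it treats two rows as duplicates only when the whole line is identical, which may be stricter than what your analysis needs.

```python
def drop_duplicate_lines(in_path, out_path):
    """Copy in_path to out_path, keeping the first copy of each line.

    Returns the number of lines written (the header row counts as a line).
    """
    seen = set()
    kept = 0
    # newline="" preserves the original line endings on read and write.
    with open(in_path, encoding="utf-8", newline="") as src, \
            open(out_path, "w", encoding="utf-8", newline="") as dst:
        for line in src:
            # Compare without the line ending so a final line that lacks
            # a trailing newline still matches an earlier identical row.
            key = line.rstrip("\r\n")
            if key in seen:
                continue
            seen.add(key)
            dst.write(line)
            kept += 1
    return kept
```

Writing to a separate output file leaves the original data untouched, so the result can be compared against the source before replacing it.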
