Twitter Academic API Client
In development. Expect breaking changes and bugs when updating to the latest version.
Tested on Linux (Ubuntu 20.10, Python 3.8) and MacOS 11 (Python 3.9). Please raise an issue if you need to install it with another Python version or encounter issues with other operating systems.
twacapic is, or is meant to become, a convenience wrapper around Twitter API clients that automates common tasks (e.g. getting all tweets by a list of users and polling regularly for new tweets, or getting all tweets about an ongoing event based on keywords). That means it makes use of existing API clients.
Consider installation via pipx if you just want to use twacapic as a command line tool:
- If you like pipx, install pipx
- Run
pipx install twacapic
Or, simply install via pip:
pip install twacapic
or pip3 install twacapic
usage: twacapic [-h] [-u [USERLIST ...]] [-g GROUPNAME [GROUPNAME ...]] [-c GROUP_CONFIG] [-l LOG_LEVEL] [-lf LOG_FILE] [-s SCHEDULE] [-n NOTIFY] [-a] [-d DAYS] [-v]
optional arguments:
-h, --help show this help message and exit
-u [USERLIST ...], --userlist [USERLIST ...]
Path(s) to list(s) of user IDs, (format: one ID per line). Required for first run only. Same number and corresponding order required as in --groupname argument. Can be used to add users to a group.
-g GROUPNAME [GROUPNAME ...], --groupname GROUPNAME [GROUPNAME ...]
Name(s) of the group(s) to collect. Results will be saved in folder `results/GROUPNAME/`. Can be used to poll for new tweets of a group. Default: "users"
-c GROUP_CONFIG, --group_config GROUP_CONFIG
Path to a custom group config file to define tweet data to be retrieved, e.g. retweets, mentioned users, attachments. A template named `group_config.yaml` can be found in any already created group folder.
-l LOG_LEVEL, --log_level LOG_LEVEL
Level of output detail (DEBUG, INFO, WARNING, ERROR). Warnings and Errors are always logged in respective log-files `errors.log` and `warnings.log`. Default: ERROR
-lf LOG_FILE, --log_file LOG_FILE
Path to logfile. Defaults to standard output.
-s SCHEDULE, --schedule SCHEDULE
If given, repeat every SCHEDULE minutes.
-n NOTIFY, --notify NOTIFY
If given, notify email address in case of unexpected errors. Needs further setup. See README.
-a, --get_all_the_tweets
Get all available tweets (max. 3200) for a user on the first run. Constrain with the --d option to last x days.
-d DAYS, --days DAYS Use only together with -a. Only get tweets posted in the last DAYS days.
-v, --version Print version of twacapic.
At the moment, twacapic can collect up to the latest 3200 tweets of each user in a list, optionally starting from an earliest date, and then poll for new tweets if called again with the same group name (without the -a or -d flags!) or if the -s argument is given.
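The effect of combining -a with -d can be pictured as a cut-off timestamp. The following is a minimal sketch, not twacapic's actual implementation: the helper name `earliest_timestamp` is hypothetical, and the output format is the ISO 8601 form that the Twitter API v2 accepts for its `start_time` parameter.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def earliest_timestamp(days: int, now: Optional[datetime] = None) -> str:
    """Return the UTC cut-off for "the last `days` days" as an ISO 8601 string.

    Hypothetical helper for illustration; twacapic may compute this differently.
    """
    if days < 0:
        raise ValueError("days must not be negative")
    if now is None:
        now = datetime.now(timezone.utc)
    # Normalise to UTC before subtracting so the result always ends in "Z".
    cutoff = now.astimezone(timezone.utc) - timedelta(days=days)
    return cutoff.strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    # Cut-off that would correspond to `twacapic -a -d 7`, relative to now.
    print(earliest_timestamp(7))
```

Tweets older than this timestamp would be skipped on the first run. Later polling runs do not need it, which is presumably why -a and -d should be left out when polling.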
Email notifications with the -n argument use yagmail and require a file named gmail_creds.yaml in the working directory in the following format:
gmail_user: a_gmail_user_name
gmail_password: an_app_password_for_this_user_name
As this is inherently insecure, we recommend creating a new Gmail account that is used for this purpose only, until we have the time to implement a more secure solution.
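Since the credentials file holds a password in plain text, it helps to at least restrict its permissions. The following is a minimal sketch for Unix-like systems. The helper name `write_gmail_creds` is hypothetical and not part of twacapic; only the file name and the two keys come from the format above.

```python
import os
import stat
import tempfile
from pathlib import Path


def write_gmail_creds(user: str, app_password: str, directory: str = ".") -> Path:
    """Write gmail_creds.yaml so that only the owning user can read or write it.

    Assumes both values contain no characters that would need YAML quoting.
    """
    path = Path(directory) / "gmail_creds.yaml"
    # Create the file with mode 0600 from the start so the password is never
    # readable by other users, not even briefly.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        handle.write(f"gmail_user: {user}\n")
        handle.write(f"gmail_password: {app_password}\n")
    # Tighten permissions in case the file already existed with a wider mode.
    os.chmod(path, 0o600)
    return path


if __name__ == "__main__":
    # Demonstrate in a throwaway directory instead of the real working directory.
    with tempfile.TemporaryDirectory() as tmp:
        creds = write_gmail_creds("a_gmail_user_name", "an_app_password", tmp)
        print(creds.name, oct(stat.S_IMODE(creds.stat().st_mode)))
```

For real use, the `directory` argument would be twacapic's working directory. This only limits who can read the file. The password is still stored unencrypted.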
At first use, it will prompt you for your API credentials, which you find here. These credentials will be stored in a file in the working directory, so make sure that the directory is readable by you and authorised users only.
For non-interactive use, e.g. when automatically deploying twacapic to a server, this file can be used as a template and should always be placed in the working directory of twacapic.
twacapic -g USER_GROUP_NAME -u PATH_TO_USER_CSV
USER_GROUP_NAME should be the name of the results folder that is meant to be created and will contain raw JSON responses from Twitter.
PATH_TO_USER_CSV should be a path to a list of Twitter user IDs, without header, one line per user ID.
Afterwards, you can poll for new tweets of a user group by simply running:
twacapic -g USER_GROUP_NAME
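If the user IDs come from another Python script, the list file and the two commands above can be prepared programmatically. This is a minimal sketch. The helper names `write_user_list` and `twacapic_command` are hypothetical and not part of twacapic, and the IDs and group name are placeholders.

```python
import tempfile
from pathlib import Path
from typing import Iterable, List, Optional, Union


def write_user_list(user_ids: Iterable[Union[int, str]], path: Path) -> Path:
    """Write Twitter user IDs to `path`, one ID per line and without a header."""
    lines = []
    for user_id in user_ids:
        text = str(user_id).strip()
        if not text.isdigit():
            raise ValueError(f"not a numeric Twitter user ID: {user_id!r}")
        lines.append(text)
    path.write_text("".join(f"{line}\n" for line in lines), encoding="utf-8")
    return path


def twacapic_command(group: str, userlist: Optional[Path] = None) -> List[str]:
    """Build the argument list for the first run (with -u) or for polling (without)."""
    command = ["twacapic", "-g", group]
    if userlist is not None:
        command += ["-u", str(userlist)]
    return command


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        ids_file = write_user_list([12, 783214], Path(tmp) / "users.csv")
        print(twacapic_command("my_group", ids_file))  # first run
        print(twacapic_command("my_group"))  # later polling runs
```

The returned lists can be passed to `subprocess.run` from twacapic's working directory. The sketch only builds them, so it runs without twacapic installed.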
Enjoy!
The group config is a YAML file in the following form:
fields:
  attachments: No
  author_id: Yes
  context_annotations: No
  conversation_id: No
  created_at: No
  entities: No
  geo: No
  in_reply_to_user_id: No
  lang: No
  non_public_metrics: No
  organic_metrics: No
  possibly_sensitive: No
  promoted_metrics: No
  public_metrics: No
  referenced_tweets: No
  reply_settings: No
  source: No
  withheld: No
expansions:
  author_id: Yes
  referenced_tweets.id: No
  in_reply_to_user_id: No
  attachments.media_keys: No
  attachments.poll_ids: No
  geo.place_id: No
  entities.mentions.username: No
  referenced_tweets.id.author_id: No
user.fields:
  created_at: No
  description: No
  entities: No
  id: Yes
  location: No
  name: No
  pinned_tweet_id: No
  profile_image_url: No
  protected: No
  public_metrics: No
  url: No
  username: No
  verified: No
  withheld: No
An explanation of the fields and expansions can be found in Twitter's API docs.
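A custom config for the -c argument can be derived from the template by switching individual options to Yes. The sketch below does this with plain text processing, so it needs no YAML library. It assumes that in the actual file the section keys (fields, expansions, user.fields) start at the beginning of a line and that their options are indented beneath them. The function name `enable_options` is hypothetical and not part of twacapic.

```python
from typing import Iterable


def enable_options(config_text: str, section: str, names: Iterable[str]) -> str:
    """Return `config_text` with the given options of one section set to Yes.

    Assumes section keys start at column 0 and end with a colon, while their
    options are indented lines of the form `name: Yes` or `name: No`.
    """
    wanted = set(names)
    current = None
    result = []
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped and not line[0].isspace() and stripped.endswith(":"):
            # An unindented "key:" line opens a new section.
            current = stripped[:-1]
        elif current == section and ":" in stripped:
            name = stripped.split(":", 1)[0].strip()
            if name in wanted:
                indent = line[: len(line) - len(line.lstrip())]
                line = f"{indent}{name}: Yes"
                wanted.discard(name)
        result.append(line)
    if wanted:
        raise KeyError(f"not found in section {section!r}: {sorted(wanted)}")
    return "\n".join(result) + "\n"


if __name__ == "__main__":
    template = "fields:\n  created_at: No\n  lang: No\nuser.fields:\n  created_at: No\n"
    # Only the created_at entry under "fields" changes; user.fields is untouched.
    print(enable_options(template, "fields", ["created_at"]))
```

In practice, `config_text` would be read from the `group_config.yaml` of an existing group folder, and the result would be written to a new file that is then passed via -c.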
If your system can run cronjobs, stop twacapic, run crontab -e and add the following to your crontab:
*/15 * * * * cd PATH/TO/YOUR/TWACAPIC/WORKING/DIRECTORY && flock -n lock.file twacapic [YOUR ARGUMENTS HERE]
This will check every 15 minutes whether twacapic is running (via the lock file), and if not, start it with your arguments.
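The lock-file check can be illustrated in Python. The sketch below mimics what `flock -n` does, using the standard library's `fcntl.flock` with a non-blocking exclusive lock. It is for illustration only and is not part of twacapic. The function name `try_lock` is hypothetical, and the `fcntl` module is only available on Unix-like systems.

```python
import fcntl
import tempfile
from pathlib import Path
from typing import IO, Optional


def try_lock(lock_path: Path) -> Optional[IO[str]]:
    """Try to take an exclusive lock without waiting, similar to `flock -n`.

    Returns the open file (keep it open to hold the lock) or None if the
    lock is already held elsewhere.
    """
    handle = open(lock_path, "a")
    try:
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # Someone else holds the lock: give up immediately instead of waiting.
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        lock_file = Path(tmp) / "lock.file"
        first = try_lock(lock_file)
        second = try_lock(lock_file)  # fails while `first` is still open
        print("first acquired:", first is not None)
        print("second acquired:", second is not None)
        if first is not None:
            first.close()  # closing the file releases the lock
```

Because the lock is released as soon as the holding process exits, a crashed twacapic does not leave a stale lock behind, and the next cron run can start it again.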
- Install poetry
- Clone repository
- In the directory run
poetry install
- Run
poetry shell
to start development virtualenv
- Run
twacapic
to enter API keys. Ignore the IndexError.
- Run
pytest
to run all tests