Skip to content

Leibniz-HBI/twacapic

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

twacapic

Twitter Academic API Client

In development. Expect breaking changes and bugs when updating to the latest version.

Tested on Linux (Ubuntu 20.10, Python 3.8) and MacOS 11 (Python 3.9). Please raise an issue if you need to install it with another Python version or encounter issues with other operating systems.

Why another Twitter API client?

It is/will be more of a Twitter API client convenience wrapper that automates common tasks (e.g. get all tweets by a list of users and poll for new tweets regularly or get all tweets about an ongoing event based on keywords). That means, it actually makes use of existing API clients.

Installation

Consider installlation via pipx if you just want to use twacapic as a command line tool:

  1. If you like pipx, install pipx
  2. run pipx install twacapic

Or, simply install via pip:

pip install twacapic or pip3 install twacapic

Usage

usage: twacapic [-h] [-u [USERLIST ...]] [-g GROUPNAME [GROUPNAME ...]] [-c GROUP_CONFIG] [-l LOG_LEVEL] [-lf LOG_FILE] [-s SCHEDULE] [-n NOTIFY] [-a] [-d DAYS] [-v]

optional arguments:
  -h, --help            show this help message and exit
  -u [USERLIST ...], --userlist [USERLIST ...]
                        Path(s) to list(s) of user IDs, (format: one ID per line). Required for first run only. Same number and corresponding order required as in --groupname argument. Can be used to add users to a group.
  -g GROUPNAME [GROUPNAME ...], --groupname GROUPNAME [GROUPNAME ...]
                        Name(s) of the group(s) to collect. Results will be saved in folder `results/GROUPNAME/`. Can be used to poll for new tweets of a group. Default: "users"
  -c GROUP_CONFIG, --group_config GROUP_CONFIG
                        Path to a custom group config file to define tweet data to be retrieved, e.g. retweets, mentioned users, attachments. A template named `group_config.yaml` can be found in any already created group folder.
  -l LOG_LEVEL, --log_level LOG_LEVEL
                        Level of output detail (DEBUG, INFO, WARNING, ERROR). Warnings and Errors are always logged in respective log-files `errors.log` and `warnings.log`. Default: ERROR
  -lf LOG_FILE, --log_file LOG_FILE
                        Path to logfile. Defaults to standard output.
  -s SCHEDULE, --schedule SCHEDULE
                        If given, repeat every SCHEDULE minutes.
  -n NOTIFY, --notify NOTIFY
                        If given, notify email address in case of unexpected errors. Needs further setup. See README.
  -a, --get_all_the_tweets
                        Get all available tweets (max. 3200) for a user on the first run. Constrain with the --d option to last x days.
  -d DAYS, --days DAYS  Use only together with -a. Only get tweets posted in the last DAYS days.
  -v, --version         Print version of twacapic.

At the moment twacapic can collect up to the latest 3200 tweets from an earliest date on of a list of users and then poll for new tweets afterwards if called again with the same group name (without the -a or -d tags!) or if the -s argument is given.

Email notifications with the -n argument use yagmail and necessitate a file named gmail_creds.yaml in the working directory in the following format:

gmail_user: a_gmail_user_name
gmail_password: an_app_password_for_this_user_name

As this is inherently insecure, we recommend to create a new Gmail account that is used for this purpose only, until we have the time to implement a more secure solution.

Authorisation with the Twitter API

At first use, it will prompt you for your API credentials, which you find here. These credentials will be stored in a file in the working directory, so make sure that the directory is readable by you and authorised users only.

For non-interactive use, e.g. when automatically deploying twacapic to a server, this file can be used as a template and should always be placed in the working directory of twacapic.

Example

twacapic -g USER_GROUP_NAME -u PATH_TO_USER_CSV

USER_GROUP_NAME should be the name of the results folder that is meant to be created and will contain raw json responses from Twitter.

PATH_TO_USER_CSV should be a path to a list of Twitter user IDs, without header, one line per user ID.

Afterwards you can poll for new tweets of a user group by running simply:

twacapic -g USER_GROUP_NAME

Enjoy!

Config Template

The group config is a yaml file in the following form:

fields:
  attachments: No
  author_id: Yes
  context_annotations: No
  conversation_id: No
  created_at: No
  entities: No
  geo: No
  in_reply_to_user_id: No
  lang: No
  non_public_metrics: No
  organic_metrics: No
  possibly_sensitive: No
  promoted_metrics: No
  public_metrics: No
  referenced_tweets: No
  reply_settings: No
  source: No
  withheld: No
expansions:
  author_id: Yes
  referenced_tweets.id: No
  in_reply_to_user_id: No
  attachments.media_keys: No
  attachments.poll_ids: No
  geo.place_id: No
  entities.mentions.username: No
  referenced_tweets.id.author_id: No
user.fields:
  created_at: No
  description: No
  entities: No
  id: Yes
  location: No
  name: No
  pinned_tweet_id: No
  profile_image_url: No
  protected: No
  public_metrics: No
  url: No
  username: No
  verified: No
  withheld: No

An explanation of the fields and expansions can be found in Twitter's API docs:

Ensure that twacapic is continuously running, even after restart

If your system can run cronjobs, stop twacapic, run crontab -e and add the following to your crontab:

*/15 * * * *    cd PATH/TO/YOUR/TWACAPIC/WORKING/DIRECTORY && flock -n lock.file twacapic [YOUR ARGUMENTS HERE]

This will check every 15 minutes whether twacapic is running (via the lock file), and if not, start it with your arguments.

Dev Install

  1. Install poetry
  2. Clone repository
  3. In the directory run poetry install
  4. Run poetry shell to start development virtualenv
  5. Run twacapic to enter API keys. Ignore the IndexError.
  6. Run pytest to run all tests