This repository contains code for collecting and working with policy actor network data. It currently focuses on how to collect and label the Twitter accounts of policy organizations and their employees/members.
- Quick start
- Protocol
- Summary of files
Steps marked with * can be skipped by teams that are not running the scripts.
- *Clone the repository.
- *Install the required packages (see compontwitter.yml).
- *Rename config.ini.template to config.ini and enter your own settings.
- Collect the main accounts (steps 01 and 02).
- *Run the notebook for step 03.
- Label all side accounts (step 04).
- *Merge data and voilà!
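config.ini holds your Twitter API credentials and data directory. The snippet below is a minimal sketch of how a script might read such a file with Python's standard configparser. The section and key names ([twitter], bearer_token, [paths], data_dir) are hypothetical. Use whatever names config.ini.template actually defines.

```python
import configparser
from pathlib import Path

# Hypothetical contents: the real section and key names come from config.ini.template.
EXAMPLE_CONFIG = """
[twitter]
bearer_token = YOUR-TOKEN-HERE

[paths]
data_dir = ../data
"""


def load_settings(config_text: str) -> dict:
    """Parse config text and pull out the (hypothetical) settings the scripts need."""
    # interpolation=None: API tokens can contain "%", which the default
    # interpolation would try (and fail) to expand.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    return {
        "bearer_token": parser["twitter"]["bearer_token"],
        "data_dir": Path(parser["paths"]["data_dir"]),
    }


if __name__ == "__main__":
    settings = load_settings(EXAMPLE_CONFIG)
    print("Data directory:", settings["data_dir"])
```

To read the real file, call parser.read("config.ini", encoding="utf-8") instead of read_string.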
Details about data collection and labeling for each of the steps outlined below can be found in TwitterCodebook.pdf.
You should have a predefined list of policy actors. This protocol does not address how to bound policy systems and identify policy actors.
Step 01. Find the collective main account of each of your policy actors and provide a list of keywords for filtering the side accounts. Document them in the template file (template\main_accounts.csv) and assign them level 0. When entering the account names, do not include the @ sign. Make sure this file is encoded in UTF-8.
Keywords limit the number of potentially relevant side accounts returned in step 03. Each keyword is a string. An account is kept as a potential side account only if its Twitter bio contains at least one of the keywords. Matching is exact but case-insensitive, and any characters, including spaces, may be used. For example, if one of your keywords is "Hello world", a bio is included only if it contains "hello world" in some combination of capitalization. Enter the keywords in the keywords column of main_accounts.csv, separated by a comma and a space (", ").
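The keyword rule can be sketched as follows: split the cell on ", ", then test each keyword as a case-insensitive substring of the bio. This is an illustration, not the code in 03_identify_sides.ipynb. The function names are hypothetical, and using str.casefold for case-insensitivity is an assumption.

```python
from __future__ import annotations


def parse_keywords(cell: str) -> list[str]:
    """Split a keywords cell on the ", " separator.

    No further trimming is done, so spaces inside a keyword are kept.
    """
    if not cell:
        return []
    return [kw for kw in cell.split(", ") if kw]


def bio_matches(bio: str, keywords: list[str]) -> bool:
    """True if the bio contains at least one keyword as an exact,
    case-insensitive substring."""
    folded_bio = bio.casefold()
    return any(kw.casefold() in folded_bio for kw in keywords)


if __name__ == "__main__":
    keywords = parse_keywords("Hello world, Aalto")
    print(bio_matches("We say HELLO WORLD every day", keywords))  # True
    print(bio_matches("hello, world", keywords))  # False: the comma breaks the match
```

Keywords are not stripped after splitting, because the protocol treats spaces as meaningful characters.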
Step 02. Find the individual main accounts of each of your policy actors. Document them in the same template file (template\main_accounts.csv) and assign them level 1. When entering the account names, do not include the @ sign. Make sure this file is encoded in UTF-8.
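Before moving on, it can help to check main_accounts.csv mechanically. The sketch below checks four things:

- the file decodes as UTF-8;
- no name starts with @;
- every level is 0 or 1;
- every level-0 (collective) account has keywords.

Only the keywords column is named in this README. The account and level column names are hypothetical placeholders.

```python
import csv
import io

# Hypothetical column names; this README only names the "keywords" column.
ACCOUNT_COL, LEVEL_COL, KEYWORDS_COL = "account", "level", "keywords"


def check_main_accounts(raw: bytes) -> list:
    """Return human-readable problems found in the raw bytes of main_accounts.csv."""
    try:
        # "utf-8-sig" accepts UTF-8 with or without the BOM some spreadsheet tools add.
        text = raw.decode("utf-8-sig")
    except UnicodeDecodeError as err:
        return [f"not valid UTF-8: {err}"]
    problems = []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        line = reader.line_num  # physical line where this record ends
        name = (row.get(ACCOUNT_COL) or "").strip()
        level = (row.get(LEVEL_COL) or "").strip()
        keywords = (row.get(KEYWORDS_COL) or "").strip()
        if name.startswith("@"):
            problems.append(f"line {line}: drop the @ from {name!r}")
        if level not in {"0", "1"}:
            problems.append(f"line {line}: level must be 0 or 1, got {level!r}")
        if level == "0" and not keywords:
            problems.append(f"line {line}: collective main account has no keywords")
    return problems
```

Usage: check_main_accounts(Path("main_accounts.csv").read_bytes()), with Path from pathlib. A keywords cell such as climate, energy contains a comma, so it must be quoted in the CSV. Spreadsheet programs normally do this automatically when saving.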
Step 0a. This step must be done before step 05. Running it as soon as possible after steps 01 and 02 gives the most consistent data, because accounts have less time to change their usernames. Run the Jupyter Notebook 0a_standardize_main.ipynb. Its output file is used in a later step but can be ignored for now.
Step 03. Only do this step after main_accounts.csv has been filled in with the collective main accounts and their keywords (step 01). Run the Jupyter Notebook 03_identify_sides.ipynb. It creates a coding sheet for each collective main account that has potential side accounts.
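This sketch shows the shape of the step 03 output. It writes one check_[org].csv per collective main account that has candidates and skips the rest. It does not reproduce how the notebook finds candidates. The input format (a dict of candidate rows) and the first three column names are hypothetical. The only layout fact taken from this README is that the level goes in the 4th column.

```python
import csv
from pathlib import Path

# Hypothetical layout: the README only fixes that the level goes in the 4th column.
SHEET_COLUMNS = ["screen_name", "name", "description", "level"]


def write_coding_sheets(candidates: dict, out_dir: Path) -> list:
    """Write one check_[org].csv per collective main account that has candidates.

    `candidates` maps each org's account name to a list of dicts holding the
    first three SHEET_COLUMNS (hypothetical input; the notebook gathers its own).
    Returns the paths written; orgs without candidates get no sheet.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for org, rows in candidates.items():
        if not rows:
            continue  # no potential side accounts, so no coding sheet
        path = out_dir / f"check_{org}.csv"
        with path.open("w", encoding="utf-8", newline="") as fh:
            writer = csv.DictWriter(fh, fieldnames=SHEET_COLUMNS, extrasaction="ignore")
            writer.writeheader()
            for row in rows:
                # The level column starts blank and is filled in by hand in step 04.
                writer.writerow({**row, "level": ""})
        written.append(path)
    return written
```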
Step 04. Only do this step after step 03 is complete. Go through each check_[org].csv file and classify the accounts following the instructions in the codebook (TwitterCodebook.pdf). Enter each account's level in the 4th column and save the file:

- collective side accounts are labeled 2;
- individual side accounts are labeled 3;
- unrelated accounts are left blank.

If an account should have been included as an individual main account but was missed in step 02, you can label it 1 in this step.
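Hand-labeled sheets are easy to get slightly wrong, for example a stray 0 or a spreadsheet saving 2 as 2.0. A small check can catch this before the merge. The sketch below flags any 4th-column value other than blank, 1, 2 or 3. It assumes each sheet has a header row. The function name is hypothetical.

```python
import csv
import io

# Blank = unrelated, 1 = individual main, 2 = collective side, 3 = individual side.
ALLOWED_SIDE_LABELS = {"", "1", "2", "3"}


def invalid_labels(sheet_text: str) -> list:
    """Return (line, value) pairs whose 4th-column label is not allowed in step 04."""
    reader = csv.reader(io.StringIO(sheet_text))
    next(reader, None)  # skip the header row (assumed present)
    bad = []
    for row in reader:
        label = row[3].strip() if len(row) > 3 else ""
        if label not in ALLOWED_SIDE_LABELS:
            bad.append((reader.line_num, label))
    return bad
```

To check every sheet, loop over Path(data_dir).glob("check_*.csv") and pass each file's read_text(encoding="utf-8-sig") to invalid_labels.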
Step 05. After all preceding steps are complete, run the Jupyter Notebook 05_levels_merge.ipynb. This creates the output file all_accounts.csv, which contains all policy actors' accounts and their levels. We will use this file to collect the Twitter behavior of our policy actors.
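The merge can be pictured as concatenating the main accounts (levels 0 and 1) with the labeled side accounts, dropping unlabeled rows, and removing duplicates. This is a sketch under stated assumptions, not the logic of 05_levels_merge.ipynb. In particular, the duplicate rule (first occurrence wins, main accounts listed first) and the output header are hypothetical. Usernames are compared case-insensitively, as Twitter handles are.

```python
import csv


def merge_levels(main_rows, side_rows):
    """Combine (account, level) pairs, dropping side accounts left unlabeled.

    Main accounts come first, so if an account appears twice the main-list
    level wins (an assumption; the real notebook may resolve this differently).
    """
    merged = []
    seen = set()
    for account, level in list(main_rows) + list(side_rows):
        level = str(level).strip()
        if not level:
            continue  # blank label = unrelated account
        key = account.lower()  # Twitter usernames are case-insensitive
        if key in seen:
            continue
        seen.add(key)
        merged.append((account, int(level)))
    return merged


def write_all_accounts(rows, path):
    """Write the merged pairs to a CSV with a hypothetical account,level header."""
    with open(path, "w", encoding="utf-8", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["account", "level"])
        writer.writerows(rows)
```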
- data\: suggested location for setting up a working/scratch space.
- example\: illustration of the pipeline.
  - main_accounts.csv: the file where the main accounts and keywords were entered manually.
  - check_cxaalto.csv and check_ecanettutkimus.csv: files returned by the notebook in step 03 and hand-labeled according to step 04.
  - main_standardized.csv: file used for the final merge, returned by the notebook in step 0a.
  - all_accounts.csv: final product of this data collection exercise, returned by the notebook in step 05.
- scripts\: main directory where the scripts and notebooks are stored.
  - 03_identify_sides.ipynb: notebook for step 03.
  - 0a_standardize_mains.ipynb: notebook for step 0a.
  - 05_levels_merge.ipynb: notebook for step 05.
  - config.ini.template: remove ".template" from the file name and enter your Twitter API credentials and data directory (suggested: data\ from this repo).
- templates\: contains the template for steps 01 and 02.
- TwitterCodebook.pdf: detailed outline of the data collection and labeling protocol.