---
output: github_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```

# Examining Repressive and Oppressive State Violence using the Ill-Treatment and Torture Data

This repo contains replication materials for:

Beger, Andreas and Daniel W. Hill, Jr., 2019, "Examining Repressive and Oppressive State Violence using the Ill-Treatment and Torture Data", _Conflict Management and Peace Science_.

The journal article link is at https://doi.org/10.1177/0738894219882352, and a [pre-print PDF](https://github.com/andybega/cmps-itt/blob/master/preprint.pdf) is also included in this repo.  

```bibtex
@article{beger2019examining,
  author = {Beger, Andreas and Hill, Jr., Daniel W.},
  title = {Examining Repressive and Oppressive State Violence using the {Ill-Treatment and Torture} Data},
  year = 2019,
  journal = {Conflict Management and Peace Science},
  doi = {10.1177/0738894219882352}
}
```

(Note to my future self: the private [isa-2018](https://github.com/andybega/isa-2018) repo contains the original source material, including the tagged ISA 2018 version.)

## Setup

The `setup.R` script lists and installs the R packages needed for replication. It preferentially uses the **checkpoint** package to install the required packages from the 2019-08-15 CRAN snapshot, so that package versions match those we used when we last updated the results.

It also checks for the output directories in the structure shown below and creates them if needed, although they should already be in place if you got this code from GitHub.

### Working directory

All scripts assume that the working directory is set to the `cmps-itt` folder.
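A quick way to confirm this before running anything is a small check like the one below. `is_repo_root()` is a hypothetical helper, not part of the replication code, and it assumes the repo folder is still named `cmps-itt` and contains the `R/` folder.

```r
# Hypothetical helper (not part of the replication scripts): TRUE when `path`
# looks like the repo root, judged by the folder name and an R/ sub-folder.
is_repo_root <- function(path = getwd(), repo = "cmps-itt") {
  path <- normalizePath(path, mustWork = FALSE)
  basename(path) == repo && dir.exists(file.path(path, "R"))
}

# Usage: fail early with a clear message instead of on a missing file later.
# if (!is_repo_root()) stop("Please setwd() to the cmps-itt folder first")
```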

## Code

The R scripts in the `R/` folder replicate the figures and tables in the main paper and the supplemental appendix. `R/functions.R` contains helper functions used by some of the other scripts; all other files can be run in the order in which they sort alphabetically.

Files whose names start with `si` pertain to the supplemental information (SI); all other files relate to the main article.
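The run order described above can be produced programmatically, as sketched here. `list_replication_scripts()` is a hypothetical helper; it assumes every script uses the `.R` extension and that `sort()` in your locale gives the same alphabetical order the authors had in mind.

```r
# Hypothetical helper: list the replication scripts in R/ in alphabetical
# order, leaving out the helper file functions.R. With si = TRUE it returns
# the SI scripts (names starting with "si"), otherwise the main-article ones.
list_replication_scripts <- function(dir = "R", si = FALSE) {
  files <- list.files(dir, pattern = "\\.R$")
  files <- setdiff(files, "functions.R")
  is_si <- startsWith(files, "si")
  files <- files[if (si) is_si else !is_si]
  # sort() collates according to the current locale
  file.path(dir, sort(files))
}

# Usage: run each main-article script in its own R process.
# for (f in list_replication_scripts()) system2("Rscript", f)
```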

All output generated by the scripts will be saved in the `output/` folder and sub-folders. It should have the following structure:

```
- output/
  - figures/
  - figures-robustness/
  - models/
  - tables/
```
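The directory check done by `setup.R` can be approximated in base R as below. This is a sketch under the assumption that only the four sub-folders listed above are needed; `ensure_output_dirs()` is a hypothetical name, not a function from the repo.

```r
# Hypothetical sketch of the output-directory check; not copied from setup.R.
ensure_output_dirs <- function(root = "output",
                               subdirs = c("figures", "figures-robustness",
                                           "models", "tables")) {
  paths <- file.path(root, subdirs)
  for (p in paths) {
    # dir.create() takes a single path, hence the loop; recursive = TRUE also
    # creates `root` itself if it is missing.
    if (!dir.exists(p)) dir.create(p, recursive = TRUE)
  }
  invisible(paths)
}
```

Calling it again once the folders exist does nothing, so it is safe to run at the top of every script.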

Training the XGBoost model (`2-xgboost.R`) and estimating the expanded SI model set (`si1-estimate-all-models.R`) both take a while. We have included the trained XGBoost model in `output/mdl-xgboost-orig.rds`, along with `output/xgboost-fit-orig.csv`, so the main results can be replicated without re-training it. (Note that if you do re-run `2-xgboost.R`, you need to change the references to the "-orig" files in some of the scripts to the versions without that suffix.)
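If you re-train the model, one way to avoid editing each file reference by hand is to build the paths from a single switch, as sketched here. `xgboost_paths()` and `use_orig` are hypothetical, and the sketch assumes the re-trained files are written to `output/` under the same names minus the "-orig" suffix.

```r
# Hypothetical switch: TRUE uses the shipped "-orig" XGBoost files, FALSE the
# files from a fresh run of 2-xgboost.R (assumed: same names, no suffix).
xgboost_paths <- function(use_orig = TRUE, dir = "output") {
  suffix <- if (use_orig) "-orig" else ""
  c(model = file.path(dir, paste0("mdl-xgboost", suffix, ".rds")),
    fit   = file.path(dir, paste0("xgboost-fit", suffix, ".csv")))
}

# Usage (from the cmps-itt folder):
# p   <- xgboost_paths(use_orig = TRUE)
# mdl <- readRDS(p[["model"]])
# fit <- read.csv(p[["fit"]])
```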

There are 1,008 models in the SI. These are not included, but estimating them does not take quite as long as training the XGBoost model. 

### Script runtimes 

Some of the scripts take longer to run. Approximate runtimes on a 2016 MacBook Pro:

- `1-estimate-core-models.R`: about 10 minutes, mostly for the cross-validation
- `2-xgboost.R`:
- `si1-estimate-all-models.R`: about 40 minutes
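To check runtimes on your own machine, you could wrap a script in `system.time()`. `time_script()` is a hypothetical helper; it sources the script into a fresh environment, so top-level objects the script creates are kept out of your global workspace.

```r
# Hypothetical helper: source one script and return the elapsed wall-clock
# time in minutes.
time_script <- function(path) {
  elapsed <- system.time(source(path, local = new.env()))[["elapsed"]]
  elapsed / 60
}

# Usage: time_script("R/1-estimate-core-models.R")
```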

### Supplemental Information / Appendix R Markdown report

The SI was created using an R Markdown report that contains some embedded code; see the `si-text/` folder. The report depends on output generated by the `R/si...` scripts.

If you are not familiar with R Markdown, the report can be converted to a PDF using this code:

```r
# if needed: install.packages("rmarkdown") 
library("rmarkdown")
rmarkdown::render("si-text/beg_hil_SI.Rmd")
```

## Data

The data is included in the `data` directory both in R's native RDS format and in CSV form. 

Most variables have prefixes indicating the data source:

- `itt_`: Ill-Treatment and Torture data; also binary indicators starting with `yy_` and used in an earlier version
- `NY.GDP.MKTP.KD` and subsequent variables, including `pop`: World Bank World Development Indicators
- `v2x_`: V-Dem
- `regime` and `dd_democracy`: Cheibub, Gandhi, and Vreeland's Democracy and Dictatorship data
- `epr_`: Ethnic Power Relations
- variables from `internal_confl` to `ext_conf_minor`: UCDP/PRIO Armed Conflict Dataset (ACD)
- `gtd_`: Global Terrorism Database
- `ccp_`: Comparative Constitutions Project
- `gmfd_functionallyfree`: media freedom; see SI
- `igo_`: COW IGO membership dataset; see SI
- `NE.TRD.GNFS.ZS`: trade as % of GDP; see SI
- `hro_`: human rights organization-related variables; see SI
- `year_`: time trends; see SI

```{r}
cy <- readRDS("data/cy.rds")
str(cy)
```
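To pull out all variables from one of the prefixed sources listed above, you can match on the prefix. `vars_by_prefix()` is a hypothetical helper; it only covers the sources identified by a prefix, not those identified by a column range such as the UCDP variables.

```r
# Hypothetical helper: names of the columns in `df` that start with `prefix`,
# e.g. "itt_", "yy_", or "v2x_".
vars_by_prefix <- function(df, prefix) {
  names(df)[startsWith(names(df), prefix)]
}

# Usage with the data loaded above:
# itt_vars <- vars_by_prefix(cy, "itt_")
# summary(cy[, itt_vars])
```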


## R session info

```{r}
sessionInfo()
```