Bayesian analysis of best-worst word association data

Preprocessing and analysis scripts for Bayesian analysis of best-worst scaling data, such as that generated by our word_norms_survey. The model in this repository allows for regression on the latent log-odds scale for the item values, which can answer questions such as:

Which properties of my words influence their association with evilness as measured by repeated best-worst rankings?

How well does my language model predict word associations on femininity as measured by repeated best-worst rankings?


This repository is set up as an RStudio project: open bestworst_analysis.Rproj in RStudio. To run the code, first install the dependencies from R:

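# install packages
# cmdstanr is not on CRAN: the empty first repos entry below needs the URL
# of the Stan R package repository
pks <- c("cmdstanr", "tidyverse", "patchwork", "arrow")
install.packages(pks, repos = c("", getOption("repos")))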

# install CmdStan to compile & run models
cmdstanr::install_cmdstan()

The model is a Bayesian rank-ordered logit (ROL) model that estimates latent item values from (partial) rankings of those items on a specific task. It is implemented in Stan; the Stan code, data preparation functions, and posterior summarization functions are in the stan/ subfolder.

In these models, the likelihood of observing a rank ordering $y$ of $N$ items, given each item's latent "worth" parameter $\theta_n$ (with items indexed in ranked order, so that $n = 1$ is the top-ranked item), is:

$$ P(y \mid \theta) = \prod_{n=1}^{N} \frac{\exp \theta_n}{\sum_{m=n}^{N} \exp \theta_m} $$
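As a minimal sketch of how this likelihood is evaluated (the function name `rol_loglik` and the example worths are illustrative assumptions, not part of this repository): at each rank position, the chosen item's exponentiated worth is divided by the summed exponentiated worths of all items not yet ranked, and these terms are multiplied together, which becomes a sum on the log scale.

```r
# Log-likelihood of one complete ranking under a rank-ordered logit model.
# `theta_ranked` holds the log-worths ordered from best (first) to worst (last).
# Hypothetical helper for illustration only.
rol_loglik <- function(theta_ranked) {
  n <- length(theta_ranked)
  # For position k, the denominator sums over the items ranked k, ..., n.
  # A log-sum-exp with the maximum subtracted keeps this numerically stable.
  log_denoms <- vapply(seq_len(n), function(k) {
    remaining <- theta_ranked[k:n]
    m <- max(remaining)
    m + log(sum(exp(remaining - m)))
  }, numeric(1))
  sum(theta_ranked - log_denoms)
}

# Made-up log-worths for four items, listed in their observed rank order
theta <- c(a = 1.0, b = 0.5, c = 0.0, d = -1.5)
loglik_observed <- rol_loglik(theta[c("a", "b", "c", "d")])

# With equal worths, each of the 4! = 24 orderings is equally likely
prob_equal <- exp(rol_loglik(rep(0, 4)))  # 1/24
```

Working on the log scale avoids underflow when many rankings are multiplied together, which is why a sampler typically accumulates log-likelihood terms like these.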

To learn more about these types of rank-ordered logit models, read:

  • For an intuitive understanding, the introduction from the Plackett-Luce package

    Turner, H.L., van Etten, J., Firth, D. and Kosmidis, I. (2020). Modelling Rankings in R: The PlackettLuce Package. Computational Statistics, 35, 1027-1057.

  • For how this maps to best-worst experiments, Case 1 & the section on Models of Ranking by Repeated Best and/or Worst Choice from Marley & Flynn (2015)

    Marley, A. A., & Flynn, T. N. (2015). Best worst scaling: theory and practice. International encyclopedia of the social & behavioral sciences, 2(2), 548-552.

  • For the stochastic (Bayesian) implementation: Glickman & Hennessy (2015)

    Glickman, M. E., & Hennessy, J. (2015). A stochastic rank ordered logit model for rating multi-competitor games and sports. Journal of Quantitative Analysis in Sports, 11(3), 131-144.


Experiment data processing

The experiment data processing script (01_experiment_process.R) takes in data from a best-worst scaling experiment (data_raw/experiment_data/) and creates a long-format version of this data which contains the following information:

  • subj_id: the (anonymous) identifier of the participant in the study
  • trial: the trial number of the participant
  • association: the association that was tested (e.g., evilness, femininity)
  • wordtype: the type of the words in the trial (first names, company names, non-words)
  • option: the option number of the word (1 to 4)
  • word: the word belonging to this option in the trial
  • ranking: how the word was ranked. 1 is best, 4 is worst, and the remaining (unranked) words are given an equal middle rank (2.5).
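To make the long format concrete, here is a sketch of what a single trial could look like as four rows, one per option. All values (participant identifier, words, and which option was chosen as best or worst) are made up for illustration and do not come from the experiment data.

```r
# One hypothetical best-worst trial in long format: one row per option.
# Scalar columns are recycled across the four rows.
trial_long <- data.frame(
  subj_id     = "s001",          # made-up participant identifier
  trial       = 1L,
  association = "evilness",
  wordtype    = "first names",
  option      = 1:4,
  word        = c("word_a", "word_b", "word_c", "word_d"),  # placeholder words
  ranking     = c(2.5, 1, 4, 2.5)  # option 2 chosen as best, option 3 as worst
)
```

The two unranked options share the rank 2.5, so the ranks within a trial still sum to 10, just as a full ranking 1 to 4 would.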

In addition, the following inclusion criteria are applied:

  • include only participants who fully passed the attention check (i.e., both best and worst answers correct)
  • remove trials with response time <= 3 seconds
  • remove trials whose log-response time is 4 or more standard deviations above the mean (i.e., approx. 27 seconds)

This reduces the total number of trials from 12341 to 10266.
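The inclusion criteria above could be sketched in base R roughly as follows. This is not the code from 01_experiment_process.R: the function name, the column names `rt` (response time in seconds) and `attention_passed` (logical), the one-row-per-trial layout, the order of the steps, and computing the cutoff after the fast trials are removed are all assumptions made for illustration.

```r
# Hypothetical sketch of the three inclusion criteria, applied to a data frame
# with one row per trial and (assumed) columns `rt` and `attention_passed`.
apply_inclusion_criteria <- function(dat, min_rt = 3, max_sd = 4) {
  # 1. keep only participants who fully passed the attention check
  dat <- dat[dat$attention_passed, , drop = FALSE]
  # 2. remove trials with response time <= min_rt seconds
  dat <- dat[dat$rt > min_rt, , drop = FALSE]
  # 3. remove trials whose log-response time is max_sd or more standard
  #    deviations above the mean log-response time
  log_rt <- log(dat$rt)
  cutoff <- mean(log_rt) + max_sd * sd(log_rt)
  dat[log_rt < cutoff, , drop = FALSE]
}

# Toy input: 100 ordinary trials, one extremely slow trial, one too-fast trial,
# and one trial from a participant who failed the attention check
toy <- data.frame(
  rt               = c(rep(5, 100), 5000, 2, 6),
  attention_passed = c(rep(TRUE, 102), FALSE)
)
kept <- apply_inclusion_criteria(toy)
```

Applying the log transform before the standard-deviation cutoff makes the rule less sensitive to the long right tail that response times usually have.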

This long-format data is then stored as an rds file in the processed data folder.

Word data preprocessing

The word data processing script 02_word_preprocess.R reads the word data from data_raw/word_data/ and stores it as processed data (an rds file) in the processed data folder.

NB: for testing, the word data preprocessing script also adds a random item-level predictor to this data: languagemodel_prediction_evilness
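A random item-level predictor of this kind can be generated in one line. The sketch below assumes a standard normal distribution, an arbitrary seed, and placeholder words; the distribution actually used in 02_word_preprocess.R is not stated here.

```r
# Hypothetical word data with placeholder words
word_data <- data.frame(word = c("word_a", "word_b", "word_c"))

# Add a purely random predictor (assumed standard normal) for testing
set.seed(45)  # arbitrary seed, for reproducibility only
word_data$languagemodel_prediction_evilness <- rnorm(nrow(word_data))
```

Because such a predictor is unrelated to the words, a regression coefficient estimated for it should be centered near zero.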

Estimating item-level associations

The first analysis script 03_estimate_log_worth.R estimates log-worths for each word in a single word-type category on a single association. It also produces a plot of each word's latent worth on the log-odds scale.

Predicting item-level associations using item-level predictors

The second analysis script 04_regress_log_worth.R performs regression for the log-worths using item-level predictors from the word data. Using this approach, it is possible to perform inference for the regression parameters:

# A tibble: 1 × 10
  variable                           mean median    sd   mad     q5   q95  rhat ess_bulk ess_tail
  <chr>                             <dbl>  <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl>    <dbl>    <dbl>
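1 languagemodel_prediction_evilness 0.142  0.144 0.272 0.257 -0.301 0.587  1.00    3201.    4100.

The interval from q5 to q95 (a 90% central interval) spans zero, as expected for the randomly generated test predictor.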
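One common way to write a regression on the latent log-worths is shown below. This is a sketch under assumptions, not the specification in this repository: whether the model includes a residual term, an intercept, or particular priors is determined by the Stan code in the stan/ subfolder. Here $\mathbf{x}_n$ denotes the vector of item-level predictors for word $n$, $\boldsymbol{\beta}$ the regression coefficients (such as the one summarized for languagemodel_prediction_evilness), and $\varepsilon_n$ an assumed item-level residual:

$$ \theta_n = \mathbf{x}_n^\top \boldsymbol{\beta} + \varepsilon_n, \qquad \varepsilon_n \sim \mathcal{N}(0, \sigma^2) $$

Under this form, a coefficient of 0.142 would mean that a one-unit increase in the predictor shifts a word's latent worth by 0.142 on the log-odds scale.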


