# Large Language Models and the Argument from the Poverty of the Stimulus

Code and data for the paper *Large Language Models and the Argument from the Poverty of the Stimulus*.

## Getting started

- Install Python 3.9 and the required packages:

  ```bash
  pip install -r requirements.txt
  ```

Usage examples:

- Plotting surprisal curves for one or more models:

  ```python
  import viz

  models = ("gpt2", "gpt3")
  sentences = (
      "I know who you talked with yesterday",  # Grammatical.
      "I know who you talked with _ Mary _ yesterday",  # Ungrammatical.
  )

  labels = ("Grammatical", "Ungrammatical")
  viz.plot_surprisals(sentences, models=models, labels=labels)
  ```

Result: a plot of surprisal values for the grammatical and ungrammatical sentence under each model.

## Running filler/gap experiments

The grammars for the tasks described in the paper are defined in `grammars.toml`.
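
For reference, the grammar definitions can be inspected with any TOML parser; a minimal sketch, assuming the `tomli` package is installed (the schema of `grammars.toml` is defined by the project):

```python
# Peek at the grammar definitions for the filler/gap tasks.
import tomli

with open("grammars.toml", "rb") as f:
    grammars = tomli.load(f)

print(list(grammars.keys()))  # Top-level grammar/task names defined in the file.
```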

Running `run_phenomenon.py` evaluates all sentences generated from the grammar, for all combinations of [±filler, ±gap].

To get surprisal values for all four conditions from specific models and calculate the interaction values:

```bash
python run_phenomenon.py --phenomenon {ATB,PG} --models <comma-separated list of models> --names {tiny,small,large}
```

Supported models: `bert-base-uncased`, `bert-large-uncased`, `gpt2`, `gpt3`, `grnn`, `childes_transformer_<i>`, `childes_lstm_<i>`.

To run the experiments from the paper, for example for GPT-2 and GPT-3, run:

```bash
python run_phenomenon.py --phenomenon ATB --models gpt2,gpt3 --names tiny
python run_phenomenon.py --phenomenon PG --models gpt2,gpt3 --names small
```

Results are saved to `data/`, for example:

- `ATB__model__gpt3__cfg_tiny_v2023-10-30a.{csv,pickle}` – surprisal values for each sentence generated by the grammar.
- `ATB__model__gpt3__cfg_tiny_v2023-10-30a_full_interaction.{csv,pickle}` – surprisal interaction values for each tuple of four [±filler, ±gap] sentences (see the sketch below).
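
The interaction value for each tuple combines the surprisals of the four conditions in a 2×2 difference-in-differences. A minimal sketch of that computation (the exact critical regions and sign conventions used by `run_phenomenon.py` may differ):

```python
# Difference-in-differences over the four [±filler, ±gap] conditions.
# Each argument is a summed surprisal over the critical region of one sentence;
# the choice of region and the sign convention here are illustrative only.
def interaction(s_filler_gap, s_filler_nogap, s_nofiller_gap, s_nofiller_nogap):
    gap_effect_with_filler = s_filler_gap - s_filler_nogap
    gap_effect_without_filler = s_nofiller_gap - s_nofiller_nogap
    return gap_effect_with_filler - gap_effect_without_filler
```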

## Paper data

- To reproduce the plots and stats reported in the paper, see the notebooks under `notebooks/`.
- Logs for the Transformer retraining task are in `data/logs.zip`.
- For the full retrained Transformer checkpoints, write to nur.lan@ens.psl.eu.

## Raw surprisal values

All the sentences and raw surprisal values for the experiments reported in the paper are found under `data/model_surprisals.zip`.

To prevent test-set contamination by large language models that train on crawled data and are later evaluated on it, all dataset files are zipped and password-protected. The password for all zip files is `llm_pos`.

See Jacovi et al. (2023), *Practical Strategies for Mitigating Data Contamination by Evaluation Benchmarks*.
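
The archives can also be extracted programmatically; a minimal sketch using Python's `zipfile` (assuming the archives use standard zip encryption supported by `zipfile`; otherwise fall back to an external tool such as `unzip -P llm_pos`):

```python
# Extract the password-protected surprisal data into data/model_surprisals/.
import zipfile

with zipfile.ZipFile("data/model_surprisals.zip") as zf:
    zf.extractall(path="data/model_surprisals", pwd=b"llm_pos")
```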

## Model setup

- To use GPT-3, add your OpenAI API key as an environment variable:

  ```bash
  export OPENAI_KEY="..."
  ```
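
The key is then read from the environment at run time; a minimal sketch of the pattern, assuming the pre-1.0 `openai` client interface (the project's actual client setup may differ):

```python
# Read the API key from the environment rather than hard-coding it.
import os

import openai  # Assumption: the legacy (pre-1.0) openai client interface.

openai.api_key = os.environ["OPENAI_KEY"]  # Raises KeyError if the variable is not set.
```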

### lm-zoo models

The GPT-2, GRNN, and JRNN models are run using Docker, based on the lm-zoo project.

Pull the following images to be able to run inference with GPT-2, JRNN, and GRNN:

```bash
docker pull cpllab/language-models:gpt2
docker pull cpllab/language-models:jrnn
docker pull cpllab/language-models:grnn
```

- **Note:** JRNN is extremely slow and takes 1–2 minutes per sentence.

### CHILDES models

Run `./get_childes_models.sh` to download the CHILDES LSTM/Transformer models from Yedetore et al. (2023).

To run inference on the CHILDES models, clone the original repo:

```bash
git clone https://github.com/adityayedetore/lm-povstim-with-childes/
```

and add its location to `PYTHONPATH`:

```bash
export PYTHONPATH="$PYTHONPATH:/PATH/TO/lm-povstim-with-childes/src"
```
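
Equivalently, the path can be added from within Python before importing the CHILDES model code (a sketch; the path is the same placeholder as above):

```python
# Make the Yedetore et al. (2023) code importable without exporting PYTHONPATH.
import sys

sys.path.append("/PATH/TO/lm-povstim-with-childes/src")
```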

## Retraining task

Getting the original English Wikipedia training data:

```bash
WIKI_DATA_DIR="./data/retraining/wikipedia"
mkdir -p $WIKI_DATA_DIR

wget -x https://dl.fbaipublicfiles.com/colorless-green-rnns/training-data/English/train.txt -O $WIKI_DATA_DIR/train.txt.original
wget -x https://dl.fbaipublicfiles.com/colorless-green-rnns/training-data/English/test.txt -O $WIKI_DATA_DIR/test.txt.original
wget -x https://dl.fbaipublicfiles.com/colorless-green-rnns/training-data/English/valid.txt -O $WIKI_DATA_DIR/valid.txt.original
wget -x https://dl.fbaipublicfiles.com/colorless-green-rnns/training-data/English/vocab.txt -O $WIKI_DATA_DIR/vocab.txt
```

Getting the Yedetore et al. (2023) CHILDES training data:

- **Note:** the `vocab.txt` file provided in `pretraining.zip` is incorrect. The correct file is at http://www.adityayedetore.com/childes-project/vocab.txt.

```bash
CHILDES_DATA_DIR="$(pwd)/data/retraining/childes"  # Absolute path, so it still resolves after cd'ing to /tmp.
mkdir -p $CHILDES_DATA_DIR
cd /tmp/

wget http://www.adityayedetore.com/childes-project/pretraining.zip
unzip pretraining.zip
cp pretraining/train.txt $CHILDES_DATA_DIR/train.txt.original
cp pretraining/valid.txt $CHILDES_DATA_DIR/valid.txt.original
cp pretraining/test.txt $CHILDES_DATA_DIR/test.txt.original

cd $CHILDES_DATA_DIR
wget http://www.adityayedetore.com/childes-project/vocab.txt  # Correct vocab file.
```

Running retraining for ATB/PG:

```bash
PYTHONPATH="." python retraining.py --phenomenon ATB --names tiny --samples_per_condition 100 --seed 1000 --context_size 30 --dataset_name "wikipedia"
PYTHONPATH="." python retraining.py --phenomenon PG --names small --samples_per_condition 100 --seed 1000 --context_size 30 --dataset_name "wikipedia"
```

## Caching surprisal values

Inference is costly, so we cache surprisal values in Redis. If you're running a large-scale experiment like the ones in the paper, start a Redis server first: `redis-server &`.
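
The caching pattern looks roughly like the following (a minimal sketch using the `redis` Python client; the project's actual key scheme and serialization may differ):

```python
# Cache surprisal values keyed by (model, sentence); sketch only.
import redis

cache = redis.Redis()  # Assumes a local redis-server on the default port.

def cached_surprisal(model: str, sentence: str, compute) -> float:
    key = f"surprisal:{model}:{sentence}"
    hit = cache.get(key)
    if hit is not None:
        return float(hit)             # Cache hit: reuse the stored value.
    value = compute(model, sentence)  # Cache miss: run the expensive inference.
    cache.set(key, value)
    return value
```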

## Modified external sources

The project uses modified versions of the following projects, found under `modified_external_sources/`:

### lm-povstim-with-childes

Source code for Yedetore et al. (2023).

Changes made:

- Changed `main.py` to support training on the original dataset plus extra sentences.

### colorlessgreenRNNs

Source code for Gulordava et al. (2018).

Changes made:

- Changed `dictionary_corpus.py` to support multiple text file sources.