We’ve been studying how to scalably answer this question using LLMs and large-scale brain-imaging datasets. Together, these let us automatically generate and test scientific hypotheses about language processing in the brain, potentially enabling a new paradigm for scientific research. This repo contains code for running these analyses.
Specifically, this repo contains code underlying 2 neuroscience studies:
Generative causal testing to bridge data-driven models and scientific theories in language neuroscience (Antonello*, Singh*, et al., 2024, arXiv)
Generative causal testing (GCT) is a framework for generating concise explanations of language selectivity in the brain from predictive models and then testing those explanations in follow-up experiments using LLM-generated stimuli.
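A self-contained toy sketch of the testing step (the fitted encoding model, the explanation, and the LLM-generated stimuli are all replaced here by fixed stand-ins; nothing below is repo API):

```python
import numpy as np

# Toy stand-in for a fitted encoding model: maps a sentence to a predicted
# response for one voxel (here, secretly driven by number words).
NUMBER_WORDS = {"one", "two", "three", "ten", "hundred"}
predict_voxel = lambda text: float(sum(w in NUMBER_WORDS for w in text.lower().split()))

# Candidate explanation for this voxel (in GCT, derived from the encoding model via an LLM).
explanation = "references to numbers or counting"

# Follow-up test: stimuli that match vs. don't match the explanation
# (in GCT these are LLM-generated; fixed lists stand in here).
matching = ["she counted one two three before jumping", "he owed ten dollars, maybe a hundred"]
baseline = ["the garden smelled of rain", "they walked quietly along the shore"]

# The explanation passes the causal test if responses are reliably higher for matching stimuli.
diff = np.mean([predict_voxel(t) for t in matching]) - np.mean([predict_voxel(t) for t in baseline])
print(explanation, "->", diff)  # a positive difference supports the explanation
```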
Evaluating scientific theories as predictive models in language neuroscience (Singh*, Antonello*, et al. 2025, bioRxiv)
QA encoding models build features by annotating a language stimulus with the answers to yes/no questions using an LLM.
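To make the feature construction concrete, here is a toy sketch with a keyword stand-in for the LLM annotator (in the actual pipeline an LLM answers each question for each stimulus chunk):

```python
import numpy as np

# Toy stand-in for an LLM yes/no annotator (in practice, prompt an LLM).
def answer_yes_no(question: str, text: str) -> bool:
    keywords = {"Does the text mention a person?": ("i", "she", "he", "grandma"),
                "Does the text describe a place?": ("house", "city", "park")}[question]
    return any(w in text.lower().split() for w in keywords)

questions = ["Does the text mention a person?", "Does the text describe a place?"]
# Each chunk of the stimulus is aligned to one fMRI timepoint (TR).
stimulus_chunks = ["grandma opened the door", "the city was quiet that night"]

# Binary feature matrix of shape (n_TRs, n_questions); these features are then
# regressed onto voxel responses, as in standard encoding models.
features = np.array([[answer_yes_no(q, chunk) for q in questions]
                     for chunk in stimulus_chunks], dtype=float)
print(features)  # [[1., 0.], [0., 1.]]
```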
This repo also contains code for experiments in 3 ML studies (for a simple scikit-learn interface to use these, see imodelsX):
Aug-imodels: Augmenting interpretable models with large language models during training (Singh et al. 2023, Nature Communications)
Aug-imodels is a framework that uses LLMs to build extremely efficient and interpretable prediction models, e.g. linear ngram models or decision trees. Aug-imodels use LLMs during fitting but not during inference, allowing complete transparency and dramatic speed improvements.
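For the scikit-learn-style interface mentioned above, a hedged sketch of fitting an Aug-imodels classifier via imodelsX (the class name and keyword arguments are recalled from the imodelsX docs and may differ; treat them as assumptions):

```python
# Hedged sketch of the imodelsX sklearn-style interface (names may differ; see the imodelsX docs).
from imodelsx import AugLinearClassifier

texts = ["what a wonderful, heartfelt film", "a dull and lifeless script",
         "an absolute joy to watch", "painfully boring from start to finish"]
labels = [1, 0, 1, 0]

# The LLM named by `checkpoint` is used only during fitting (to embed ngrams);
# inference reduces to a transparent linear model over ngram coefficients.
model = AugLinearClassifier(checkpoint="bert-base-uncased", ngrams=2)
model.fit(texts, labels)
print(model.predict(["a wonderful script", "boring and lifeless"]))
```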
QA-Emb: Crafting interpretable embeddings by asking LLMs questions (Benara*, Singh*, et al. 2024, NeurIPS)
QA-Emb is a more general version of QA encoding models: it builds text embeddings by asking an LLM a series of yes/no questions.
SASC: Explaining black box text modules in natural language with language models (Singh*, Hsu*, et al. 2023, NeurIPS workshop)
SASC is a pipeline for generating natural language explanations of black-box text modules using LLMs and synthetic causal testing.
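A toy sketch of the explanation-generation half of SASC (the LLM summarization step is replaced by a fixed string; nothing below is repo API):

```python
# Toy black-box text module: maps text -> scalar (here, secretly responds to food words).
FOOD = {"pizza", "soup", "bread", "apple"}
module = lambda text: float(sum(w in FOOD for w in text.lower().split()))

# Step 1: find the ngrams (unigrams here) that most activate the module.
candidate_ngrams = "she ate pizza and soup while reading a book about the ocean".split()
top_ngrams = sorted(set(candidate_ngrams), key=module, reverse=True)[:3]

# Step 2: summarize the top ngrams into a short natural-language explanation.
# In SASC this summary comes from an LLM prompt; a fixed string stands in here.
explanation = "words related to food"

# Step 3 (not shown): score the explanation by generating synthetic texts that do /
# don't match it and comparing the module's average activation on the two sets.
print(top_ngrams, "->", explanation)
```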
Finally, here are 3 related studies that share this codebase:
Interpretable embeddings of speech enhance and explain brain encoding performance of audio models (Shimizu et al. 2025, arXiv)
Uses QA encoding models to analyze and improve black-box speech encoding models.
Interpretable next-token prediction via the generalized induction head (Kim*, Mantena*, et al. 2025, NeurIPS)
Hand-engineering an induction head to retrieve features from the context can help improve interpretable fMRI encoding models.
Vector-ICL: In-context learning with continuous vector representations (Zhuang et al. 2025, ICLR)
Converts fMRI responses into continuous vector representations that can be used with LLMs for few-shot decoding of QA features.
Dataset
- For a quickstart, just download the responses / word sequences for 3 subjects from the encoding scaling laws paper
- This is all the data you need if you only want to analyze 3 subjects and don't want to make flatmaps
- To run full experiments, go through the paths in `neuro/config.py` and download data to the appropriate locations from this box folder
- To download the main dataset here (the HuthLab fMRI passive listening dataset), run `python experiments/00_load_dataset.py` (this will download the data using datalad)
- To make flatmaps, you need to set the pycortex filestore to `{root_dir}/ds003020/derivative/pycortex-db/` (see the sketch after this list)
- The `data/decoding` folder contains a quickstart example for TR-level decoding
- It has everything needed, but if you want to visualize the results on a flatmap, you need to download the relevant PCs from here
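A hedged sketch of pointing pycortex at this filestore (it assumes pycortex exposes its ConfigParser as `cortex.options.config` and the user config path as `cortex.options.usercfg`; verify against your pycortex version):

```python
# Hedged sketch -- pycortex reads its filestore from options.cfg at import time,
# so the reliable route is to edit that file; this just shows where and what to set.
from cortex import options

root_dir = "/path/to/data"  # should match neuro.config.root_dir
print("current filestore:", options.config.get("basic", "filestore"))
print("edit", options.usercfg, "so that, under [basic]:")
print(f"filestore = {root_dir}/ds003020/derivative/pycortex-db/")
```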
Code
- Install using uv. Clone the repo, `cd` into the repo, run `uv add git+https://github.com/csinva/imodelsX`, then run `uv sync`. This will locally install the `neuro` package
- Useful functions (see the usage sketch after this list)
  - Loading responses: `neuro.data.response_utils` function `load_response`
    - Loads responses from `{neuro.config.root_dir}/ds003020/derivative/preprocessed_data/{subject}`, where they are stored in an h5 file for each story, e.g. `wheretheressmoke.h5`
  - Loading stimulus: `ridge_utils.features.stim_utils` function `load_story_wordseqs`
    - Loads TextGrids from `{root_dir}/ds003020/derivative/TextGrids`, where each story has a TextGrid file, e.g. `wheretheressmoke.TextGrid`
    - Uses `{root_dir}/ds003020/derivative/respdict.json` to get the length of each story
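A hedged usage sketch of these two loaders (argument order, the subject id format, and return types are assumptions; check the docstrings in the repo):

```python
# Hedged usage sketch -- argument order, subject naming, and return shapes are assumptions.
from neuro.data.response_utils import load_response
from ridge_utils.features.stim_utils import load_story_wordseqs

stories = ["wheretheressmoke"]           # names matching the .h5 / .TextGrid files
resp = load_response(stories, "UTS03")   # assumed: (stories, subject) -> array of shape (n_TRs, n_voxels)
wordseqs = load_story_wordseqs(stories)  # assumed: dict mapping story name -> word sequence with timings
print(resp.shape, len(wordseqs["wheretheressmoke"].data))
```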
Demo
python experiments/02_fit_encoding.py
- Running this script with no arguments launches a small default run. The script takes many relevant arguments through argparse for running different experiments; pass `--help` to list them
- Big thanks to folks that released open-source brain-imaging datasets, especially the HuthLab fMRI passive listening dataset and the Podcast ECoG dataset
- See related fMRI experiments
- Built from this template