This repository contains the data and implementation for a human-centered evaluation and auditing framework designed to assess the capability of Large Language Models (LLMs) in automating local journalism tasks. Specifically, we investigate LLMs' ability to identify newsworthy topics from municipal proceedings and to generate professional-grade headlines.
Traditional local journalism is facing a resource crisis. This project evaluates whether LLMs can augment newsrooms by:
- Extracting salient information from city council agendas and meeting transcripts.
- Generating news headlines that meet professional journalistic standards.
- Prioritizing topics based on local impact and newsworthiness.
Our pipeline includes a comparative study where LLM outputs are audited against professional standards via crowd-sourced human evaluation.
.
├── ___input/ # Primary raw data sources
│ ├── agenda_url/ # Source URLs for city council agendas
│ ├── audio/ # Meeting recordings (.wav) for transcription
│ └── evaluations/ # Raw human evaluation responses from Prolific
│
├── __src/ # Modular Research Pipeline (Jupyter Notebooks)
│ ├── mod1_agenda_proc/ # Scraping and cleaning of PDF/text agendas
│ ├── mod2_trans_proc/ # Speech-to-text transcription and segmentation of meeting audio
│ ├── mod3_llm_gen/ # Prompt engineering for headline generation
│ ├── mod4_ranking/ # Processing model and human ranking data
│ └── mod5_llm_auditing/ # Statistical analysis and metric calculation
│
├── _interim/ # Processed data at each pipeline stage
│ └── ... # (See notebooks for specific step outputs)
│
└── results/ # Final artifacts for the paper
├── average_rank.pdf # Plot of average true rank of LLM and expert selected topics
├── headline_rank_diff.pdf # Plot of average rank difference between LLM-generated and expert-written headlines for the same topic
└── recall_rate.pdf # Plot of top-3/5 topic recall rate from LLM and expert selected topics
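As a sketch of the headline-generation module (`mod3_llm_gen`): the notebooks prompt an LLM with a topic and a meeting excerpt and ask for a single headline. The prompt wording, helper names, and model name below are illustrative assumptions, not the exact prompts used in the paper.

```python
import os


def build_headline_prompt(topic: str, excerpt: str) -> list[dict]:
    """Assemble a chat prompt asking for one professional headline.

    The instruction text here is illustrative; the actual prompts
    live in the mod3_llm_gen notebooks.
    """
    return [
        {
            "role": "system",
            "content": (
                "You are a local-news editor. Write one concise, factual "
                "headline that meets professional journalistic standards."
            ),
        },
        {
            "role": "user",
            "content": f"Topic: {topic}\n\nMeeting excerpt:\n{excerpt}\n\nHeadline:",
        },
    ]


def generate_headline(topic: str, excerpt: str, model: str = "gpt-4o") -> str:
    """Call the OpenAI API (requires OPENAI_KEY; `pip install openai`).

    The model name is an assumption; swap in whichever model the
    notebooks configure.
    """
    from openai import OpenAI

    client = OpenAI(api_key=os.environ["OPENAI_KEY"])
    resp = client.chat.completions.create(
        model=model, messages=build_headline_prompt(topic, excerpt)
    )
    return resp.choices[0].message.content
```

The same prompt structure can be sent to the Anthropic and Gemini APIs with their respective clients.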
To reproduce the results presented in the paper, follow the modular pipeline in numerical order. Each notebook contains the specific directions, data requirements, and configuration steps necessary for that phase of the audit.

- Navigate to the `__src/` directory.
- Execute the modules in order (`mod1` through `mod5`).
- Follow the directions provided at each step within the notebooks to ensure data persistence and correct file pathing.
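The run order above can be recovered programmatically from the directory names. A small sketch (the helper name is ours; it assumes the `__src/` layout shown in the directory structure):

```python
import re
from pathlib import Path


def ordered_modules(src_dir: str = "__src") -> list[Path]:
    """Return the mod1..mod5 module directories in numerical run order.

    Sorting is by the numeric prefix, not lexicographically, so the
    order stays correct even past mod9.
    """
    mods = [
        p
        for p in Path(src_dir).iterdir()
        if p.is_dir() and re.match(r"mod\d+_", p.name)
    ]
    return sorted(mods, key=lambda p: int(re.match(r"mod(\d+)_", p.name).group(1)))
```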
This project uses Python 3.11.5+. Dependencies can be installed via:

`pip install -r requirements.txt`

The Whisper model used in `mod2` requires `ffmpeg` to be installed on your operating system (it cannot be installed via pip):
- On macOS: `brew install ffmpeg`
- On Ubuntu/Linux: `sudo apt update && sudo apt install ffmpeg`
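Because Whisper shells out to `ffmpeg` at transcription time, it can be worth verifying the binary is on `PATH` before starting the `mod2` notebooks. A minimal stdlib-only check (the helper name is ours):

```python
import shutil


def binary_available(name: str = "ffmpeg") -> bool:
    """Return True if the given executable is found on PATH.

    Useful as a preflight check before running the mod2 transcription
    notebooks, which fail mid-run if ffmpeg is missing.
    """
    return shutil.which(name) is not None
```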
- API Keys: Access to the OpenAI, Anthropic, and Google Gemini APIs is required. The keys should be stored in a local `.env` file under the names `CLAUDE_KEY`, `GEMINI_KEY`, and `OPENAI_KEY`, respectively.
- Data: Ensure the `___input/` directory is populated with the necessary raw files as described in the directory structure.
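The notebooks read these keys from the environment. A minimal stdlib-only sketch of loading a `.env` file (many projects use `python-dotenv` for this instead; the helper name is ours):

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Parse simple KEY=VALUE lines from a .env file into os.environ.

    Blank lines and comments are skipped; variables already set in the
    environment are not overwritten.
    """
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip().strip("'\""))
```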
This project is licensed under the MIT License - see the LICENSE file for details.