This repository contains code for detecting climate misinformation using multiple machine learning and natural language processing (NLP) models.
The project evaluates traditional and transformer-based models on their ability to classify climate-related claims as supported or unsupported, while tracking energy use and carbon emissions for sustainable AI research.
Climate misinformation poses a major obstacle to informed action on environmental issues.
This project explores how well different machine learning approaches can identify misleading or false climate-related claims.
Models are trained and compared across performance, interpretability, and emissions efficiency.
- Logistic Regression (custom + scikit-learn)
- Feedforward Neural Network (MLP) (scikit-learn)
- Transformer (custom implementation)
- Fine-tuned BERT (`bert-base-uncased`)
Each model is trained on a dataset of labeled climate claims and evaluated on standard classification metrics.
| Model | Accuracy | Recall | Fine-Tuning CO₂ Emissions (g) | Training CO₂ Emissions (g) |
|---|---|---|---|---|
| Logistic Regression (scikit) | 0.66 | 0.67 | 0.01 | 3 |
| Neural Network (MLP) | 0.68 | 0.75 | 0.01 | 3 |
| Transformer (custom) | 0.66 | 0.74 | 0.0 | 0.25 |
| BERT-base Fine-tuned | 0.66 | 0.67 | 0.01 | 650,000 |
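The accuracy and recall figures above come from standard classification metrics. As a minimal sketch (the label encoding and arrays here are illustrative, not the project's actual predictions), they can be computed with scikit-learn:

```python
# Illustrative sketch: computing accuracy and recall as reported in the table.
# 1 = SUPPORTS, 0 = NOT SUPPORTED (assumed encoding for this example only).
from sklearn.metrics import accuracy_score, recall_score

y_true = [1, 0, 1, 1, 0, 1]  # gold labels
y_pred = [1, 0, 0, 1, 1, 1]  # model predictions

acc = accuracy_score(y_true, y_pred)          # fraction of correct predictions
rec = recall_score(y_true, y_pred)            # recall on the positive class
print(f"accuracy={acc:.2f} recall={rec:.2f}")  # → accuracy=0.67 recall=0.75
```

Note that `recall_score` defaults to binary recall on the positive class; for multi-class label schemes an `average` argument would be needed.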
```
ninaiervin-climate_misinfo_detector/
│
├── BERT.py                        # Fine-tune a pretrained BERT model
├── BERT_eval.py                   # Evaluate fine-tuned BERT
│
├── transformer.py                 # Train small Transformer from scratch
│
├── logistic_regression.py         # (Prototype) Custom logistic regression
├── scikit_logistic_regression.py  # Logistic regression with sentence embeddings
├── logistic_regression_eval.py    # Evaluate logistic regression model
│
├── scikit_NN.py                   # Feedforward NN using scikit-learn
├── NN_eval.py                     # Evaluate trained NN model
│
├── exploring_data_layout.py       # Data preprocessing and splitting
│
├── log_reg_params_0.6818.joblib   # Saved logistic regression model (best accuracy)
│
└── data/                          # Expected folder for dataset (JSONL files)
```
All scripts assume a dataset of climate claims in `.jsonl` format, one JSON object per line:

```json
{
  "claim": "The Earth's climate has warmed over the past century.",
  "claim_label": "SUPPORTS"
}
```

Example expected files:

```
data/train_data.jsonl
data/dev_data.jsonl
data/test_data.jsonl
```
The helper script `exploring_data_layout.py` manages dataset loading and splitting.
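As a rough sketch of what such loading involves (the exact logic lives in `exploring_data_layout.py`; the field names below come from the JSONL example above, and `load_claims` is a hypothetical helper):

```python
# Hedged sketch: read one {"claim": ..., "claim_label": ...} object per line
# from a JSONL file. Blank lines are skipped.
import json

def load_claims(path):
    """Return parallel lists of claim texts and their labels."""
    texts, labels = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            texts.append(record["claim"])
            labels.append(record["claim_label"])
    return texts, labels
```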
- Python 3.8+
- `pip` (Python package manager)

Install the core dependencies:

```
pip install torch transformers scikit-learn pandas numpy matplotlib joblib codecarbon
```

Optional: to export results or plots, you may also install:

```
pip install seaborn tqdm sentence-transformers
```
Preprocess and split the data:

```
python exploring_data_layout.py
```

Train the logistic regression model:

```
python scikit_logistic_regression.py
```

Evaluate it:

```
python logistic_regression_eval.py
```

Train the neural network (MLP):

```
python scikit_NN.py -hi 128 -act relu -b 32 -lr 0.001
```

Evaluate it:

```
python NN_eval.py
```

Train the custom Transformer:

```
python transformer.py --epochs 3 --batch_size 16
```

Fine-tune BERT:

```
python BERT.py --epochs 3 --batch_size 16 --lr 2e-5 --output_dir ./bert_output
```

Evaluate it:

```
python BERT_eval.py --output_dir ./bert_output
```

Each model's energy usage and carbon emissions are tracked using `codecarbon`:
```python
from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="ClimateMisinfo")
tracker.start()
# train model here
tracker.stop()
```

Outputs include:
- Energy (kWh)
- CO₂ emissions (grams)
- Runtime duration (seconds)
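By default, `codecarbon` appends one row per tracked run to an `emissions.csv` file, whose columns include `duration` (seconds), `emissions` (kg CO₂eq), and `energy_consumed` (kWh). A small sketch for aggregating those rows into the units used in the results table (the `summarize_emissions` helper is hypothetical, and the column names are assumed from codecarbon's defaults):

```python
# Hedged sketch: sum codecarbon's emissions.csv into totals, converting
# the "emissions" column from kg to grams to match the results table.
import csv

def summarize_emissions(path="emissions.csv"):
    """Return (grams CO2eq, kWh, seconds) totalled across all runs."""
    grams = kwh = seconds = 0.0
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            grams += float(row["emissions"]) * 1000.0  # kg -> g
            kwh += float(row["energy_consumed"])
            seconds += float(row["duration"])
    return grams, kwh, seconds
```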
- End-to-end model comparison (from logistic regression to BERT)
- Consistent preprocessing and evaluation pipeline
- Integrated carbon tracking with `codecarbon`
- Modular, reproducible design for new datasets or models
Nina Ervin
M.S. Computer Science, University of California San Diego
LinkedIn
Anuk Centellas
M.S. Computational Linguistics, University of Washington
LinkedIn
This project is licensed under the MIT License.
See the LICENSE file for details.