Climate Misinformation Detector

This repository contains code for detecting climate misinformation using multiple machine learning and natural language processing (NLP) models.
The project evaluates traditional and transformer-based models on their ability to classify climate-related claims as supported or unsupported, while tracking energy use and carbon emissions for sustainable AI research.


Overview

Climate misinformation poses a major obstacle to informed action on environmental issues.
This project explores how well different machine learning approaches can identify misleading or false climate-related claims.
Models are trained and compared across performance, interpretability, and emissions efficiency.

Implemented Models

  • Logistic Regression (custom + scikit-learn)
  • Feedforward Neural Network (MLP) (scikit-learn)
  • Transformer (custom implementation)
  • Fine-tuned BERT (bert-base-uncased)

Each model is trained on a dataset of labeled climate claims and evaluated on standard classification metrics.
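The metrics in the table below (accuracy and recall) can be computed with scikit-learn. This is a minimal illustrative sketch, not the repository's evaluation code; the label encoding (1 = SUPPORTS, 0 = unsupported) and the toy predictions are assumptions for the example.

```python
from sklearn.metrics import accuracy_score, recall_score

# Illustrative binary labels: 1 = SUPPORTS, 0 = unsupported.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]

print(accuracy_score(y_true, y_pred))  # fraction of correct predictions -> 0.6
print(recall_score(y_true, y_pred))    # fraction of true SUPPORTS recovered -> 2/3
```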


Results

Model                          Accuracy  Recall  Fine-Tuning CO₂ (g)  Training CO₂ (g)
Logistic Regression (scikit)   0.66      0.67    0.01                 3
Neural Network (MLP)           0.68      0.75    0.01                 3
Transformer (custom)           0.66      0.74    0.0                  0.25
BERT-base Fine-tuned           0.66      0.67    0.01                 650,000

Repository Structure

ninaiervin-climate_misinfo_detector/
│
├── BERT.py                       # Fine-tune a pretrained BERT model
├── BERT_eval.py                  # Evaluate fine-tuned BERT
│
├── transformer.py                # Train small Transformer from scratch
│
├── logistic_regression.py        # (Prototype) Custom logistic regression
├── scikit_logistic_regression.py # Logistic regression with sentence embeddings
├── logistic_regression_eval.py   # Evaluate logistic regression model
│
├── scikit_NN.py                  # Feedforward NN using scikit-learn
├── NN_eval.py                    # Evaluate trained NN model
│
├── exploring_data_layout.py      # Data preprocessing and splitting
│
├── log_reg_params_0.6818.joblib  # Saved logistic regression model (best accuracy)
│
└── data/                         # Expected folder for dataset (JSONL files)

Data Format

All scripts assume a dataset of climate claims in .jsonl format, with one JSON object per line. Each record looks like:

{
  "claim": "The Earth's climate has warmed over the past century.",
  "claim_label": "SUPPORTS"
}

Example expected files:

data/train_data.jsonl
data/dev_data.jsonl
data/test_data.jsonl

The helper script exploring_data_layout.py manages dataset loading and splitting.
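A JSONL file in the format above can be read with the standard library alone. This is a minimal sketch assuming only the two fields shown in the example record; it is not necessarily how exploring_data_layout.py loads the data, and the function name load_claims is hypothetical.

```python
import json

def load_claims(path):
    """Read a JSONL file of claims: one JSON object per line."""
    claims, labels = [], []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            record = json.loads(line)
            claims.append(record["claim"])
            labels.append(record["claim_label"])
    return claims, labels
```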


Installation

Requirements

  • Python 3.8+
  • pip (Python package manager)

Install Dependencies

pip install torch transformers scikit-learn pandas numpy matplotlib joblib codecarbon

Optional: to export results or plots, you may also install:

pip install seaborn tqdm sentence-transformers


Data Exploration

python exploring_data_layout.py

Logistic Regression

Train:

python scikit_logistic_regression.py

Evaluate:

python logistic_regression_eval.py
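The repository ships a saved model (log_reg_params_0.6818.joblib), which can presumably be restored with joblib. The sketch below shows the save/load round-trip on a toy model with illustrative 2-D features; the repository's actual model is trained on sentence embeddings, and the filename here is hypothetical.

```python
import joblib
from sklearn.linear_model import LogisticRegression

# Toy training data: class 1 when the first feature dominates (illustrative).
X = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.1, 0.9]]
y = [0, 1, 1, 0]

model = LogisticRegression().fit(X, y)
joblib.dump(model, "log_reg_params.joblib")      # hypothetical filename

restored = joblib.load("log_reg_params.joblib")  # same API as the shipped file
print(restored.predict([[0.8, 0.2]]))
```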

Neural Network (MLP)

Train:

python scikit_NN.py -hi 128 -act relu -b 32 -lr 0.001

Evaluate:

python NN_eval.py

Transformer (from scratch)

Train:

python transformer.py --epochs 3 --batch_size 16

BERT Fine-Tuning

Train:

python BERT.py --epochs 3 --batch_size 16 --lr 2e-5 --output_dir ./bert_output

Evaluate:

python BERT_eval.py --output_dir ./bert_output

Sustainability Tracking

Each model’s energy usage and carbon emissions are tracked using codecarbon:

from codecarbon import EmissionsTracker

tracker = EmissionsTracker(project_name="ClimateMisinfo")
tracker.start()
# ... train the model ...
emissions_kg = tracker.stop()  # total emissions in kg CO₂eq

Outputs include:

  • Energy (kWh)
  • CO₂ emissions (grams)
  • Runtime duration (seconds)
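codecarbon also appends one row per run to an emissions.csv log. The snippet below is a sketch of reading such a log with the standard library; the columns used (duration, emissions, energy_consumed) are among those codecarbon writes, but the numbers here are illustrative samples, not real measurements. Note that codecarbon logs emissions in kg CO₂eq, while the table above reports grams, hence the conversion.

```python
import csv
import io

# Illustrative stand-in for a codecarbon emissions.csv (values are not real).
sample = io.StringIO(
    "duration,emissions,energy_consumed\n"
    "120.0,0.003,0.01\n"
)

for row in csv.DictReader(sample):
    grams = float(row["emissions"]) * 1000  # kg CO2eq -> g CO2eq
    print(f"{row['duration']} s, {row['energy_consumed']} kWh, {grams:.1f} g CO2eq")
```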

Key Features

  • End-to-end model comparison (from logistic regression to BERT)
  • Consistent preprocessing and evaluation pipeline
  • Integrated carbon tracking with codecarbon
  • Modular, reproducible design for new datasets or models

🧑‍💻 Authors

Nina Ervin
M.S. Computer Science, University of California San Diego
LinkedIn

Anuk Centellas
M.S. Computational Linguistics, University of Washington
LinkedIn


📜 License

This project is licensed under the MIT License.
See the LICENSE file for details.
