SDG Report Summarization AI

📌 Project Overview

This project automatically summarizes corporate sustainability reports, framed around the UN Sustainable Development Goals (SDGs), using Transformer-based NLP models such as T5 and BART.
The goal is to condense lengthy corporate sustainability documents into concise, informative summaries that highlight key environmental, social, and governance (ESG) insights.

✨ Features

  • Preprocessing pipeline for large and unstructured text data (PDF → cleaned text).
  • Summarization models: Transformer-based architectures (e.g., T5, BART).
  • Evaluation metrics: ROUGE and BLEU, which score a generated summary against a reference summary.
  • Configurable parameters: summary length, model choice, evaluation scope.

📂 Repository Structure

SDG-Report-Summarization-AI/
│
├── data_preprocessing.py   # Text cleaning and preparation
├── summarization.py        # Summarization pipeline (T5/BART)
├── evaluation.py           # Evaluation with ROUGE & BLEU
├── requirements.txt        # Dependencies
└── README.md               # Project documentation

🚀 Getting Started

1. Clone the repository

git clone https://github.com/erenbg1/SDG-Report-Summarization-AI.git
cd SDG-Report-Summarization-AI

2. Install dependencies

pip install -r requirements.txt

3. Run preprocessing

python data_preprocessing.py --input data/raw_report.pdf --output data/cleaned_report.txt
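
Text extracted from a PDF usually carries layout debris: words hyphenated across line breaks, stray page numbers, and uneven spacing. The sketch below shows one way such cleanup could look. It is not the code in data_preprocessing.py; the function name `clean_text` and the specific rules are assumptions made for illustration, and it starts from already-extracted text rather than from the PDF itself.

```python
import re


def clean_text(raw: str) -> str:
    """Normalize text extracted from a PDF (hypothetical helper, not the repository's code)."""
    # Rejoin words hyphenated across a line break: "sustain-\nability" -> "sustainability".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Drop lines that contain only a page number.
    lines = [ln for ln in text.split("\n") if not re.fullmatch(r"\s*\d+\s*", ln)]
    # Treat blank lines as paragraph breaks and join the lines inside each paragraph.
    paragraphs, current = [], []
    for ln in lines:
        if ln.strip():
            current.append(ln.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    # Collapse runs of spaces and tabs left over from the PDF layout.
    return "\n\n".join(re.sub(r"[ \t]+", " ", p) for p in paragraphs)


if __name__ == "__main__":
    sample = "Our sustain-\nability goals\nare   ambitious.\n12\n\nWe cut emissions."
    print(clean_text(sample))
```

Note that the hyphen rule also merges genuine compounds that happen to break at a line end (for example "supply-" followed by "chain"), so a real pipeline would likely need a dictionary check or a more careful rule.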

4. Run summarization

python summarization.py --input data/cleaned_report.txt --model t5-small --max_length 300
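
Transformer models such as T5 and BART accept a bounded input length (commonly 512 or 1024 tokens), so a full report generally has to be split before summarization. The sketch below shows one common approach: overlapping fixed-size windows. Whether summarization.py works this way is not stated here; `chunk_words`, its parameters, and the use of whitespace-separated words as a rough stand-in for model tokens are all assumptions.

```python
from typing import List


def chunk_words(text: str, max_words: int = 400, overlap: int = 50) -> List[str]:
    """Split text into overlapping word windows (hypothetical helper for illustration)."""
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once a window has reached the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


if __name__ == "__main__":
    demo = " ".join(str(i) for i in range(10))
    for chunk in chunk_words(demo, max_words=4, overlap=1):
        print(chunk)
```

Each chunk can then be summarized separately and the partial summaries concatenated or summarized again. The overlap keeps sentences that straddle a boundary visible to both neighbouring windows.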

5. Run evaluation

python evaluation.py --reference data/reference_summary.txt --candidate data/generated_summary.txt
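
To make the evaluation step concrete, here is a minimal sketch of ROUGE-1, the unigram-overlap variant of ROUGE. It is a simplified illustration, not the code in evaluation.py: the function name `rouge1` is hypothetical, and it assumes lowercased whitespace tokenization with no stemming, so its scores may differ from those of established ROUGE packages.

```python
from collections import Counter
from typing import Dict


def rouge1(reference: str, candidate: str) -> Dict[str, float]:
    """Unigram-overlap precision, recall, and F1 (simplified, hypothetical helper)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per word (clipped overlap).
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    scores = rouge1(
        reference="the company cut carbon emissions",
        candidate="the company reduced emissions",
    )
    print(scores)
```

In this example three of the four candidate words appear in the five-word reference, giving a precision of 0.75 and a recall of 0.6.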

📊 Example Output

Input length: ~20,000 tokens
Generated summary length: ~500 tokens

"The company’s SDG strategy focuses primarily on reducing carbon emissions, improving supply chain transparency, and investing in community education projects…"

📈 Future Work

  • Expand the dataset with SDG reports from multiple companies.
  • Fine-tune summarization models on domain-specific data.
  • Add a hybrid extractive + abstractive approach.
  • Deploy as a web-based summarization tool (Flask/Streamlit).

📝 License

This project is released under the MIT License.
