
Holistic AI x UCL AI Society Hackathon 2024


About the Hackathon

Hackathon Website: Holistic AI UCL Hackathon

Welcome to the Holistic AI x UCL AI Society Hackathon 2024! This repository contains all the code, datasets, and resources needed to participate. Whether you're a seasoned data scientist or a beginner, we've got you covered.

Winners & Project Showcase

This section highlights the outstanding projects from the top 5 teams in the hackathon. Each project addresses key challenges in AI trustworthiness and innovation.


🏆 1st Place: HERMES AI

GitHub Repository: HERMES AI Implementation

Presentation Slides: HERMES AI Slides

Track: Track 2

Project Description

Large language models (LLMs), trained on human-produced data, often inherit and propagate harmful stereotypes. HERMES AI applies mechanistic interpretability to mitigate this: the team trained a Sparse Autoencoder (SAE) on the Expanded Multi-Grain Stereotype Dataset (EMGSD) and used the learned features to steer GPT-2's text generation away from harmful stereotypes.
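Conceptually, the steering step looks something like the sketch below: encode hidden activations with the SAE, dampen a stereotype-correlated feature, and decode back. This is a minimal illustration, not the team's code; the dimensions, tensors, and feature index are all placeholders.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x: torch.Tensor):
        # ReLU (plus an L1 penalty in the training loss) encourages sparse codes
        z = torch.relu(self.encoder(x))
        return self.decoder(z), z

def steer(sae: SparseAutoencoder, acts: torch.Tensor,
          feature_idx: int, scale: float = 0.0) -> torch.Tensor:
    """Rescale one learned feature in the SAE basis, then map back."""
    z = torch.relu(sae.encoder(acts))
    z[..., feature_idx] *= scale          # scale=0.0 ablates the feature
    return sae.decoder(z)

sae = SparseAutoencoder(d_model=768, d_hidden=4096)  # GPT-2 small width
acts = torch.randn(1, 10, 768)                       # placeholder activations
edited = steer(sae, acts, feature_idx=123)           # hypothetical feature index
```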


Team Members:


🥈 2nd Place: 2 Chill Guys

GitHub Repository: 2 Chill Guys Implementation

Presentation Slides: 2 Chill Guys Slides

Tracks: Track 1 and Track 2

Project Description

2 Chill Guys tackled challenges in both tracks with innovative solutions:

Track 1:

  • Utilized Optuna to design and optimize custom pipelines for datasets in the HolisticAI library (see the sketch after this list).
  • Focused on balancing performance and fairness to align with trustworthy AI principles.

Track 2:

  • Implemented DistilBERT combined with sparse autoencoders to investigate stereotype encoding in language models.
  • Prioritized interpretability and bias analysis, providing insights into stereotype representation and effective mitigation strategies.
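For Track 1, a minimal sketch of the Optuna recipe might look as follows. Everything here is illustrative — synthetic data, a hand-rolled disparate-impact metric, and a hypothetical protected attribute — not the team's pipeline, which targeted datasets from the HolisticAI library.

```python
import numpy as np
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))
group = rng.integers(0, 2, size=1000)          # hypothetical protected attribute
y = (X[:, 0] + 0.3 * group + rng.normal(size=1000) > 0).astype(int)
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, group, random_state=0)

def disparate_impact(y_pred, g):
    # Ratio of positive prediction rates between groups; 1.0 means parity.
    return y_pred[g == 0].mean() / max(y_pred[g == 1].mean(), 1e-9)

def objective(trial):
    clf = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 2, 12),
        random_state=0,
    ).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    return (pred == y_te).mean(), abs(1.0 - disparate_impact(pred, g_te))

# Multi-objective study: maximize accuracy, minimize deviation from parity.
study = optuna.create_study(directions=["maximize", "minimize"])
study.optimize(objective, n_trials=30)
```

Optuna then exposes the Pareto front of trials (`study.best_trials`), from which a pipeline balancing the two objectives can be selected.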

Team Members:


🥉 3rd Place: ARM

GitHub Repository: ARM Implementation

Presentation Slides: ARM Slides

Tracks: Track 1 and Track 2

Project Description

ARM developed innovative methods for both tracks by focusing on fairness, accuracy, and sustainability in machine learning models:

Track 1:

  1. Custom Training and Pruning for Tree-Based Models:

    • Designed a custom objective function using HolisticAI's metrics library.
    • Incorporated fairness metrics such as disparate impact and equal opportunity into the training process.
  2. Bayesian Optimization:

    • Leveraged Optuna for hyperparameter search to optimize multiple objectives, including accuracy and fairness.
  3. Pre-Pruning with Fairness Penalty:

    • Introduced two custom tree-pruning methods incorporating a fairness penalty.
    • Reduced systemic gender bias while maintaining accuracy and promoting explainability.
    • Enhanced carbon efficiency by training a single model with fairness included from the start, avoiding multiple training runs.

This approach demonstrates how FairML techniques can effectively ensure equity, sustainability, and performance in modern AI systems.
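ARM's pruning methods are custom, but the core "one training sweep, fairness in the objective" idea can be sketched with scikit-learn's built-in cost-complexity pruning as a simplified stand-in: score each candidate pruned tree by accuracy minus a fairness penalty and keep the best. The data, protected attribute, and penalty weight below are all illustrative.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fairness_penalized_score(y_true, y_pred, g, lam=0.5):
    acc = (y_pred == y_true).mean()
    di = y_pred[g == 0].mean() / max(y_pred[g == 1].mean(), 1e-9)
    return acc - lam * abs(1.0 - di)   # penalize deviation from parity

rng = np.random.default_rng(1)
X = rng.normal(size=(800, 6))
g = rng.integers(0, 2, size=800)       # hypothetical protected attribute
y = (X[:, 0] + 0.4 * g > 0).astype(int)

# Sweep pruning strengths once and keep the fairest sufficiently-accurate tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X, y)
best = max(
    (DecisionTreeClassifier(ccp_alpha=a, random_state=0).fit(X, y)
     for a in path.ccp_alphas),
    key=lambda t: fairness_penalized_score(y, t.predict(X), g),
)
```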


Team Members:


🎖 4th Place: Stery Class

GitHub Repository: Stery Class Implementation

Presentation Slides: Stery Class Slides

Track: Track 2

Project Description

Stery Class developed a sustainable, high-performance two-stage pipeline for stereotype classification:

Track 2:

  1. Two-Stage Classification Pipeline:

    • Stage 1: Utilized a LightGBM classifier with TF-IDF vectorization and innovative features like cosine similarity between input text and mean sentence embeddings for stereotype classes.
    • Stage 2: Predicted category (stereotype-related or unrelated) by augmenting Stage 1 features with class probabilities from the first model’s forward pass.
  2. Focus on Sustainability:

    • Prioritized lightweight and efficient inference, ensuring reduced environmental impact without compromising accuracy.
  3. Inspiration from Generative Models:

    • Proposed a generative model framework using Llama 3 8B with LoRA for coarse-to-fine stereotype classification and reasoning. While computational constraints prevented real-time implementation, the pipeline design drew inspiration from this generative model structure.

This efficient yet innovative approach highlights the potential for combining lightweight ML models with ideas from generative AI to achieve interpretability, efficiency, and sustainability.
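A bare-bones sketch of the two-stage structure (illustrative only — a toy corpus, no sentence-embedding similarity features, and made-up labels in place of EMGSD):

```python
import numpy as np
from lightgbm import LGBMClassifier
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus standing in for EMGSD text and labels.
texts = [f"sample sentence number {i} about topic {i % 3}" for i in range(60)]
y_fine = np.array([i % 3 for i in range(60)])   # hypothetical fine-grained labels
y_coarse = (y_fine > 0).astype(int)             # stereotype-related vs. unrelated

X = TfidfVectorizer().fit_transform(texts).toarray()

# Stage 1: fine-grained classifier over TF-IDF features.
stage1 = LGBMClassifier(n_estimators=30, min_child_samples=1).fit(X, y_fine)
probs = stage1.predict_proba(X)                 # stage-1 class probabilities

# Stage 2: coarse classifier on TF-IDF features augmented with stage-1 probs.
X2 = np.hstack([X, probs])
stage2 = LGBMClassifier(n_estimators=30, min_child_samples=1).fit(X2, y_coarse)
print(stage2.predict(X2[:5]))
```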


Team Members:


🎖 5th Place: Holistic Indians

GitHub Repository: Holistic Indians Implementation

Presentation Slides: Holistic Indians Slides

Tracks: Track 1 and Track 2

Project Description

Holistic Indians approached both tracks with a focus on mitigating biases, ensuring privacy, and prioritizing sustainability in stereotype classification tasks:

Track 2:

  • Bias Mitigation:

    • Addressed bias in racial stereotypes within text data by generating synthetic data to counter sampling bias.
    • Introduced regularization techniques to enhance model generalization and fairness.
  • Privacy and Sustainability:

    • Incorporated considerations for privacy to protect sensitive data during model training and deployment.
    • Focused on environmental sustainability by adopting resource-efficient methods in their implementation.

This solution underscores the importance of fairness, privacy, and environmental responsibility in modern AI systems.
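The write-up doesn't name the generator used, but the balancing idea can be sketched crudely as jittered oversampling of an under-represented group (purely illustrative — a real text pipeline would generate synthetic sentences, not feature vectors):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
group = (rng.random(500) < 0.9).astype(int)     # group 0 is under-sampled

# Jittered resampling of the minority group as a crude synthetic-data stand-in.
minority = np.where(group == 0)[0]
need = int((group == 1).sum() - (group == 0).sum())
picks = rng.choice(minority, size=need, replace=True)
X_synth = X[picks] + 0.01 * rng.normal(size=(need, X.shape[1]))
X_bal = np.vstack([X, X_synth])
group_bal = np.concatenate([group, np.zeros(need, dtype=int)])
```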


Team Members:


Get Started

  • Access the hackathon tutorials:
    Explore the tutorials to learn about the hackathon tasks and get started: Hackathon Tutorials

  • Join the hackathon’s communication channels:
    Get live support and collaborate with other participants on Slack:
    Holistic AI Community Slack


Hackathon Tracks

Track 1: Multi-Objective Optimization for AI Trustworthiness in Tabular Data

Participants aim to develop AI systems that optimize multiple objectives simultaneously, including performance, fairness, robustness, privacy, security, explainability, and sustainability, using tabular datasets. The goal is to create trustworthy AI models that are reliable and equitable across diverse applications.

Track 2: Building Trustworthy Models for Stereotype Classification in Text Data

This track focuses on designing ethical AI systems to detect and mitigate stereotypes in text data. Using the Expanded Multi-Grain Stereotype Dataset (EMGSD), participants build models that classify text into stereotype-related, neutral, or unrelated categories while addressing fairness, privacy, and interpretability challenges.


Sponsor Highlight: Holistic AI

Holistic AI is the industry leader in trustworthy AI solutions, empowering organizations to build, deploy, and scale AI systems with confidence. Learn more at Holistic AI Website.


Disclaimer

This repository is for educational and research purposes only. Submissions remain the intellectual property of their respective teams. Holistic AI and UCL AI Society are not liable for any misuse of the resources. For inquiries, contact hackathon@holisticai.com.


Acknowledgments

This event is brought to you by:

  • Holistic AI
  • UCL AI Society
  • Our Amazing Participants

Together, we’re pushing the boundaries of responsible AI.


Connect with Us

Follow us for updates and future events:
