Hackathon Website: Holistic AI UCL Hackathon
Welcome to the Holistic AI x UCL AI Society Hackathon 2024! This repository contains all the code, datasets, and resources needed to participate. Whether you're a seasoned data scientist or a beginner, we've got you covered.
This section highlights the outstanding projects from the top 5 teams in the hackathon. Each project addresses key challenges in AI trustworthiness and innovation.
GitHub Repository: HERMES AI Implementation
Presentation Slides: HERMES AI Slides
Track: Track 2
Large language models (LLMs), trained on human-produced data, often inherit and propagate harmful stereotypes. HERMES AI uses mechanistic interpretability to mitigate such issues by training a Sparse Autoencoder (SAE) on the Expanded Multi-Grain Stereotype Dataset (EMGSD). The project demonstrates effective steering of GPT-2’s text generation, reducing harmful stereotypes in its outputs.
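For readers who want a concrete starting point, here is a minimal sketch of the core idea: train a sparse autoencoder on GPT-2 activations with an L1 sparsity penalty. The layer choice, dictionary size, and penalty weight below are illustrative assumptions, not the team's actual configuration (see their repository for the real implementation).

```python
# Minimal SAE sketch (assumptions: last-layer activations, an 8x overcomplete
# dictionary, and an L1 coefficient of 1e-3 -- all illustrative).
import torch
from transformers import GPT2Model, GPT2Tokenizer

tok = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2Model.from_pretrained("gpt2").eval()

class SparseAutoencoder(torch.nn.Module):
    def __init__(self, d_model=768, d_hidden=768 * 8):
        super().__init__()
        self.enc = torch.nn.Linear(d_model, d_hidden)
        self.dec = torch.nn.Linear(d_hidden, d_model)

    def forward(self, x):
        z = torch.relu(self.enc(x))  # sparse feature activations
        return self.dec(z), z

sae = SparseAutoencoder()
opt = torch.optim.Adam(sae.parameters(), lr=1e-4)

texts = ["Example sentence standing in for an EMGSD row."]
for text in texts:
    ids = tok(text, return_tensors="pt")
    with torch.no_grad():
        acts = gpt2(**ids).last_hidden_state.squeeze(0)  # (seq_len, 768)
    recon, z = sae(acts)
    # Reconstruction loss plus an L1 penalty that encourages sparse features.
    loss = torch.nn.functional.mse_loss(recon, acts) + 1e-3 * z.abs().mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```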
Team Members:
- Seonglae Cho: sungle3737@gmail.com
- Gustavo Costa: gustavo@arcos.org.br
- Linnea Loserius: loserius127@gmail.com
- Yung Hsaun Wu: wu8815@gmail.com
- Andrea Lo: lpsandrea02@gmail.com
GitHub Repository: 2 Chill Guys Implementation
Presentation Slides: 2 Chill Guys Slides
Tracks: Track 1 and Track 2
2 Chill Guys tackled challenges in both tracks with innovative solutions:
Track 1:
- Utilized Optuna to design and optimize custom pipelines for datasets in the HolisticAI library.
- Focused on balancing performance and fairness to align with trustworthy AI principles.
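As a hedged illustration of this Track 1 approach, the sketch below sets up a multi-objective Optuna study that trades classification accuracy against disparate impact. The dataset, the protected-group split, and the search space are stand-in assumptions, not the team's pipeline.

```python
# Multi-objective Optuna sketch: maximize accuracy, minimize the deviation of
# disparate impact from its ideal value of 1.0.
import numpy as np
import optuna
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from holisticai.bias.metrics import disparate_impact

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 6))
y = (X[:, 1] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def objective(trial):
    clf = RandomForestClassifier(
        n_estimators=trial.suggest_int("n_estimators", 50, 300),
        max_depth=trial.suggest_int("max_depth", 2, 12),
        random_state=0,
    ).fit(X_tr, y_tr)
    pred = clf.predict(X_te)
    acc = (pred == y_te).mean()
    # Column 0 plays the role of a binary protected attribute here.
    group_a, group_b = X_te[:, 0] > 0, X_te[:, 0] <= 0
    di = disparate_impact(group_a, group_b, pred)
    return acc, abs(1.0 - di)

study = optuna.create_study(directions=["maximize", "minimize"])
study.optimize(objective, n_trials=30)
print(study.best_trials)  # Pareto front of accuracy/fairness trade-offs
```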
Track 2:
- Implemented DistilBERT combined with sparse autoencoders to investigate stereotype encoding in language models.
- Prioritized interpretability and bias analysis, providing insights into stereotype representation and effective mitigation strategies.
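A small sketch of the first step of this Track 2 approach: extracting per-token DistilBERT hidden states that could serve as sparse-autoencoder inputs. The layer index is an illustrative assumption.

```python
# Pull per-token DistilBERT activations (layer index 4 is an assumption).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased").eval()

ids = tok("All engineers are men.", return_tensors="pt")
with torch.no_grad():
    out = model(**ids, output_hidden_states=True)
acts = out.hidden_states[4].squeeze(0)  # (seq_len, 768)
# Rows of `acts` could then train a sparse autoencoder like the HERMES sketch above.
print(acts.shape)
```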
Team Members:
- Nichita Mitrea: andsad52@gmail.com
- Anna Aghajanyan: aghajanyananna03@gmail.com
GitHub Repository: ARM Implementation
Presentation Slides: ARM Slides
Tracks: Track 1 and Track 2
ARM developed innovative methods for both tracks by focusing on fairness, accuracy, and sustainability in machine learning models:
Track 1:
- Custom Training and Pruning for Tree-Based Models:
  - Designed a custom objective function using HolisticAI's metrics library.
  - Incorporated fairness metrics such as disparate impact and equal opportunity into the training process.
- Bayesian Optimization:
  - Leveraged Optuna for hyperparameter search to optimize multiple objectives, including accuracy and fairness.
- Pre-Pruning with Fairness Penalty:
  - Introduced two custom tree-pruning methods incorporating a fairness penalty (a simplified sketch appears below).
  - Reduced systemic gender bias while maintaining accuracy and promoting explainability.
  - Enhanced carbon efficiency by training a single model with fairness included from the start, avoiding multiple training runs.
This approach demonstrates how FairML techniques can effectively ensure equity, sustainability, and performance in modern AI systems.
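The team's custom pre-pruning methods live in their repository; as a simplified stand-in, the sketch below sweeps sklearn's cost-complexity pruning path and keeps the tree that maximizes accuracy minus a fairness penalty built from HolisticAI's disparate-impact metric. The 0.5 penalty weight and the synthetic data are illustrative assumptions.

```python
# Simplified fairness-penalized pruning: sweep the cost-complexity path and
# keep the tree with the best accuracy minus a disparate-impact penalty.
import numpy as np
from holisticai.bias.metrics import disparate_impact
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (X[:, 1] + 0.5 * X[:, 0] > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)
group_a, group_b = X_te[:, 0] > 0, X_te[:, 0] <= 0  # stand-in protected groups

path = DecisionTreeClassifier(random_state=1).cost_complexity_pruning_path(X_tr, y_tr)
best_score, best_tree = -np.inf, None
for alpha in path.ccp_alphas:
    tree = DecisionTreeClassifier(ccp_alpha=alpha, random_state=1).fit(X_tr, y_tr)
    pred = tree.predict(X_te)
    acc = (pred == y_te).mean()
    score = acc - 0.5 * abs(1.0 - disparate_impact(group_a, group_b, pred))
    if score > best_score:
        best_score, best_tree = score, tree
print(best_score, best_tree.get_depth())
```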
Team Members:
- Rishi Kalra: rishi.kalra.20@ucl.ac.uk
- Avanija Menon: avanijamenon@gmail.com
- Murtaza Dhanerawala: murtazadhan02@gmail.com
GitHub Repository: Stery Class Implementation
Presentation Slides: Stery Class Slides
Track: Track 2
Stery Class focused on developing a sustainable, high-performance classification pipeline with a two-stage approach for stereotype classification:
Track 2:
- Two-Stage Classification Pipeline (see the sketch below):
  - Stage 1: Utilized a LightGBM classifier with TF-IDF vectorization and innovative features such as the cosine similarity between the input text and the mean sentence embeddings of the stereotype classes.
  - Stage 2: Predicted the category (stereotype-related or unrelated) by augmenting the Stage 1 features with the class probabilities from the first model’s forward pass.
- Focus on Sustainability:
  - Prioritized lightweight and efficient inference, reducing environmental impact without compromising accuracy.
- Inspiration from Generative Models:
  - Proposed a generative-model framework using Llama 3 8B with LoRA for coarse-to-fine stereotype classification and reasoning. While computational constraints prevented a real-time implementation, the pipeline design drew inspiration from this generative structure.
This efficient yet innovative approach highlights the potential for combining lightweight ML models with ideas from generative AI to achieve interpretability, efficiency, and sustainability.
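To make the two-stage idea concrete, here is a hedged sketch: Stage 1 predicts a stereotype label from TF-IDF features with LightGBM, and Stage 2 appends Stage 1's class probabilities to the features before predicting the coarser category. The toy rows and labels are invented, and the cosine-similarity features from the write-up are omitted for brevity.

```python
# Two-stage sketch on invented toy rows. Stage 1: TF-IDF -> LightGBM stereotype
# label. Stage 2: TF-IDF features + Stage 1 probabilities -> coarse category.
from lightgbm import LGBMClassifier
from scipy.sparse import csr_matrix, hstack
from sklearn.feature_extraction.text import TfidfVectorizer

texts = [
    "nurses are always women", "the sky is blue", "engineers are men",
    "cats sleep a lot", "old people cannot use technology", "the train was late",
]
stereo_labels = [1, 0, 1, 0, 1, 0]    # Stage 1 target (stand-in labels)
category_labels = [1, 0, 1, 0, 1, 0]  # Stage 2 target: related vs. unrelated

vec = TfidfVectorizer()
X = vec.fit_transform(texts)

stage1 = LGBMClassifier(min_child_samples=1).fit(X, stereo_labels)
proba = stage1.predict_proba(X)            # Stage 1 class probabilities

X_stage2 = hstack([X, csr_matrix(proba)])  # augment features with Stage 1 output
stage2 = LGBMClassifier(min_child_samples=1).fit(X_stage2, category_labels)
print(stage2.predict(X_stage2))
```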
Team Members:
- Alexander Stern: alexander.stern.20@ucl.ac.uk
- Jose Caceres: jose.valenzuela.24@ucl.ac.uk
GitHub Repository: Holistic Indians Implementation
Presentation Slides: Holistic Indians Slides
Tracks: Track 1 and Track 2
Holistic Indians approached both tracks with a focus on mitigating biases, ensuring privacy, and prioritizing sustainability in stereotype classification tasks:
Track 2:
- Bias Mitigation:
  - Addressed bias in race stereotypes within text data by generating synthetic data to counter sampling bias (one illustrative approach is sketched below).
  - Introduced regularization techniques to enhance model generalization and fairness.
- Privacy and Sustainability:
  - Incorporated privacy considerations to protect sensitive data during model training and deployment.
  - Focused on environmental sustainability by adopting resource-efficient methods in their implementation.
This solution underscores the importance of fairness, privacy, and environmental responsibility in modern AI systems.
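The team's actual synthetic-data generation is in their repository; as one illustrative possibility (an assumption, not their method), the sketch below uses counterfactual augmentation, swapping demographic terms so each group appears in the same contexts.

```python
# Counterfactual augmentation sketch (illustrative assumption): swap
# demographic terms so each group appears in the same contexts.
swap_terms = {"black": "white", "white": "black", "asian": "hispanic", "hispanic": "asian"}

def counterfactual(text: str) -> str:
    """Replace each demographic term with its counterpart, word by word."""
    return " ".join(swap_terms.get(word.lower(), word) for word in text.split())

samples = [("black people are athletic", "stereotype")]
augmented = samples + [(counterfactual(text), label) for text, label in samples]
print(augmented)  # original row plus its demographically swapped counterpart
```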
Team Members:
- Vignesh Balaji: guruvignesh2001@gmail.com
- Vishal Sharma: Vishal2510sharma@gmail.com
- Sameeh Razak: sameehrazak366@gmail.com
- Sayak Mukherjee: sayak.mukherjee20@imperial.ac.uk
- Arhaan Mohammed Shaikh: arhaan18june@gmail.com
- Access the hackathon tutorials:
  Explore the tutorials to learn about the hackathon tasks and get started: Hackathon Tutorials
- Join the hackathon’s communication channels:
  Get live support and collaborate with other participants on Slack: Holistic AI Community Slack
Track 1: Participants aim to develop AI systems that optimize multiple objectives simultaneously, including performance, fairness, robustness, privacy, security, explainability, and sustainability, using tabular datasets. The goal is to create trustworthy AI models that are reliable and equitable across diverse applications.
Track 2: This track focuses on designing ethical AI systems to detect and mitigate stereotypes in text data. Using the Expanded Multi-Grain Stereotype Dataset (EMGSD), participants build models that classify text into stereotype-related, neutral, or unrelated categories while addressing fairness, privacy, and interpretability challenges.
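For participants getting started on Track 2, a minimal baseline might look like the sketch below, assuming EMGSD-style rows of (text, label); the sample rows are invented placeholders.

```python
# Minimal TF-IDF + logistic-regression baseline; the rows are invented
# placeholders for EMGSD-style (text, label) pairs.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

rows = [
    ("women are bad drivers", "stereotype"),
    ("the meeting starts at noon", "unrelated"),
    ("some people enjoy driving", "neutral"),
]
texts, labels = zip(*rows)

baseline = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
baseline.fit(texts, labels)
print(baseline.predict(["nurses are always women"]))
```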
Holistic AI is the industry leader in trustworthy AI solutions, empowering organizations to build, deploy, and scale AI systems with confidence. Learn more at Holistic AI Website.
This repository is for educational and research purposes only. Submissions remain the intellectual property of their respective teams. Holistic AI and UCL AI Society are not liable for any misuse of the resources. For inquiries, contact hackathon@holisticai.com.
This event is brought to you by:
- Holistic AI
- UCL AI Society
- Our Amazing Participants
Together, we’re pushing the boundaries of responsible AI.
Follow us for updates and future events: