GitHub - ChristinaSchlager/flashcard_generator: Explore "AI Meets Classroom: Optimizing Transformer-Based Language Models for Education"—a proof-of-concept project using RAG models, emphasizing FATE principles. Discover robust evaluation methods and help advance the future of learning!

Overview

This repository contains the source code for the master's thesis "AI Meets Classroom: Optimizing Transformer-Based Language Models for Education". The project evaluates Retrieval Augmented Generation (RAG) pipelines, with an emphasis on integrating the FATE (Fairness, Accountability, Transparency, and Ethics) principles. It uses a suite of evaluation metrics from Ragas, DeepEval, and Haystack to analyze the performance of RAG models in educational settings.
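To give a feel for what comparing a generated answer against a human-written ground truth involves, here is a minimal sketch of a token-overlap F1 score. This is a simple stand-in chosen for illustration; it is not one of the Ragas, DeepEval, or Haystack metrics used in the thesis, and the function name `token_f1` is hypothetical.

```python
import re
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a generated answer and a reference answer.

    Illustrative stand-in only; not a metric from Ragas, DeepEval, or Haystack.
    """
    pred = re.findall(r"[a-z0-9]+", prediction.lower())
    ref = re.findall(r"[a-z0-9]+", reference.lower())
    if not pred or not ref:
        # Two empty answers count as a match, one empty answer as a miss.
        return float(pred == ref)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)
```

A score of 1.0 means both answers contain the same tokens with the same counts, and 0.0 means they share none. Lexical overlap ignores paraphrases, which is one reason LLM-based metrics are commonly used alongside it.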

RAG Pipeline

Pipeline	Description
Flashcard Generator	An open-source pipeline designed around the principles of the FATE framework.
Baseline	Uses OpenAI's models as a comparative baseline in the evaluation process.
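Both pipelines follow the general retrieve-then-generate pattern of RAG. The sketch below illustrates that pattern with a toy bag-of-words retriever and a prompt builder. It is not the code from the notebooks: the function names, the prompt wording, and the sample chunks are assumptions made for illustration, and a real pipeline would use learned embeddings instead of word counts.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k chunks most similar to the query (toy retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble a generation prompt from the retrieved context (hypothetical wording)."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Use only the context below to write one flashcard.\n"
        f"Context:\n{joined}\n"
        f"Topic: {query}\n"
        "Answer in the form 'Q: ... A: ...'."
    )


# Made-up sample data, not taken from the repository.
chunks = [
    "Photosynthesis converts light energy into chemical energy in plants.",
    "The French Revolution began in 1789.",
    "Chlorophyll absorbs light, mainly in the blue and red wavelengths.",
]
query = "How do plants use light energy?"
context = retrieve(query, chunks, top_k=2)
prompt = build_prompt(query, context)
```

The resulting `prompt` would then be sent to a language model, which is the step where the two pipelines differ: open-source models for the Flashcard Generator, OpenAI's models for the Baseline.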

Prerequisites

Before running the code, please ensure the following requirements are met:

Python 3.11.9 installed
API keys for Together.ai, OpenAI and Nomic Atlas set as environment variables
All dependencies installed from requirements.txt
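The requirements above can be checked before opening the notebooks. The following is a minimal sketch; the environment variable names `TOGETHER_API_KEY`, `OPENAI_API_KEY`, and `NOMIC_API_KEY` are assumptions based on common conventions, so adjust them if the notebooks read different names.

```python
import os
import sys

# Assumed variable names; the repository may expect different ones.
REQUIRED_ENV_VARS = ("TOGETHER_API_KEY", "OPENAI_API_KEY", "NOMIC_API_KEY")
REQUIRED_PYTHON = (3, 11)  # major.minor of the Python 3.11.9 listed above


def missing_env_vars(environ=os.environ, names=REQUIRED_ENV_VARS) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    return [name for name in names if not environ.get(name, "").strip()]


def python_ok(version_info=sys.version_info, required=REQUIRED_PYTHON) -> bool:
    """Check that the interpreter matches the required major.minor version."""
    return tuple(version_info[:2]) == required


if __name__ == "__main__":
    if not python_ok():
        print(f"Expected Python 3.11.x, found {sys.version.split()[0]}")
    for name in missing_env_vars():
        print(f"Environment variable not set: {name}")
```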

API Integration Details

This project integrates several APIs to enhance functionality and achieve comprehensive evaluation metrics. Below is a detailed description of each API used.

Together AI

Purpose: Facilitates collaborative AI-driven applications and integrations, providing tools for real-time model training and inference across various frameworks while supporting only open-source models.
Setup: Register for an account on Together AI's platform and configure your project with an API key.
Documentation: Together AI API documentation

OpenAI

Purpose: Supports advanced text generation and document retrieval functionalities essential for the RAG baseline pipeline for comparison.
Setup: Obtain an API key from OpenAI and set it as an environment variable, as listed in the Prerequisites section.
Documentation: OpenAI API documentation

Nomic Atlas

Purpose: Provides open-source language model embeddings and clustering algorithms to enhance semantic search and data organization capabilities.
Setup: Install the Nomic Atlas Python package, configure your API key, and follow the initialization guide to start embedding your data.
Documentation: Nomic Atlas API documentation

Installation

Clone the repository and install the required packages:

git clone https://github.com/ChristinaSchlager/flashcard_generator.git
cd flashcard_generator
pip install -r requirements.txt

Documentation

The documentation for this project is available in PDF format. Please find the details below:

PDF Documentation

Abstract

In the modern educational landscape, digitization has led to a significant increase in textual data from multiple sources. While this increase in data offers a wealth of knowledge, it also presents substantial challenges for educators and students. Moreover, schools have transformed into dynamic, interactive learning environments where the use of Large Language Models (LLMs) has become indispensable, particularly since the release of ChatGPT in 2022. However, the introduction of regulatory frameworks such as the EU AI Act highlights the importance of both functional and non-functional requirements for developing fair, transparent, responsible and ethical AI. Therefore, the implementation of the FATE (Fairness, Accountability, Transparency, and Ethics) framework in the development of a Retrieval Augmented Generation (RAG) model is demonstrated through its application to a Flashcard Generator designed to create educational flashcards. It is shown that the integration of ethical principles not only aligns with, but also enhances the performance of the model. The Flashcard Generator, developed using publicly available data and open-source models, leverages the RAG model to generate question-answer pairs for flashcards. It is evaluated against human-generated ground truth answers and a benchmark RAG model built with GPT-4 Turbo employing evaluation metrics from Ragas, DeepEval, and Haystack. This proof-of-concept serves as a template for future advancements, emphasizing fairness, transparency, and ethics, while maintaining high performance.

Citation

If you use this project in your research, please cite it as follows:

Schlager, C., "AI Meets Classroom: Optimizing Transformer-Based Language Models for Education." Master's Thesis, Data Science & Intelligent Analytics, Kufstein University of Applied Sciences, Austria, 2024.

