Explore "AI Meets Classroom: Optimizing Transformer-Based Language Models for Education"—a proof-of-concept project using RAG models, emphasizing FATE principles. Discover robust evaluation methods and help advance the future of learning!

ChristinaSchlager/flashcard_generator

GitHub Repo | Made with Python 3.11.9 | Evaluated by Ragas | Evaluated by DeepEval | Evaluated by Haystack | Nomic Embeddings | Llama 3.1 Turbo | Documentation with Sphinx

Overview

This repository contains the source code for the master's thesis "AI Meets Classroom: Optimizing Transformer-Based Language Models for Education". The project evaluates Retrieval Augmented Generation (RAG) pipelines, emphasizing the integration of FATE (Fairness, Accountability, Transparency, and Ethics) principles. To ensure a robust analysis of RAG model performance in educational settings, it uses a comprehensive suite of evaluation metrics, including those from Ragas, DeepEval, and Haystack.

RAG Pipeline

Pipeline | Description
Flashcard Generator | An open-source pipeline designed around the principles of the FATE framework.
Baseline | Uses OpenAI's models as a comparative baseline in the evaluation process.
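
To make the retrieve-then-generate idea behind both pipelines concrete, here is a minimal, dependency-free sketch. It is not the implementation used in this repository: the bag-of-words similarity stands in for the embedding model, and the function names (`retrieve`, `build_flashcard_prompt`) and the prompt wording are hypothetical. It only shows how retrieved passages could be placed into a prompt that asks a language model for a question-answer pair.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query.

    Assumption: word counts replace the learned embeddings a real
    RAG pipeline would use.
    """
    query_vec = Counter(tokenize(query))
    scored = [
        (cosine_similarity(query_vec, Counter(tokenize(doc))), doc)
        for doc in documents
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


def build_flashcard_prompt(topic: str, context: list[str]) -> str:
    """Assemble a prompt asking for one question-answer pair (hypothetical wording)."""
    context_block = "\n".join(f"- {passage}" for passage in context)
    return (
        "Using only the context below, write one flashcard "
        f"(question and answer) about: {topic}\n\n"
        f"Context:\n{context_block}\n"
    )


if __name__ == "__main__":
    docs = [
        "Photosynthesis converts light energy into chemical energy in plants.",
        "The French Revolution began in 1789.",
        "Chlorophyll absorbs light during photosynthesis.",
    ]
    passages = retrieve("How does photosynthesis use light?", docs, top_k=2)
    print(build_flashcard_prompt("photosynthesis", passages))
```

The resulting prompt string would then be sent to whichever language model a pipeline uses; that call is left out here because it depends on the provider.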

Prerequisites

Before running the code, please ensure the following requirements are met:

  • Python 3.11.9 installed
  • API keys for Together.ai, OpenAI and Nomic Atlas set as environment variables
  • All dependencies installed from requirements.txt
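
The prerequisites above can be checked before running anything with a small script like the following. This is a sketch under assumptions: the variable names `TOGETHER_API_KEY`, `OPENAI_API_KEY`, and `NOMIC_API_KEY` follow the providers' usual conventions but are not confirmed by this repository, so adjust them to whatever names the code actually reads. The helper functions are hypothetical.

```python
import os
import sys

# Major and minor version expected by the README (Python 3.11.9).
REQUIRED_PYTHON = (3, 11)

# Assumed environment variable names; the repository may use different ones.
REQUIRED_KEYS = ("TOGETHER_API_KEY", "OPENAI_API_KEY", "NOMIC_API_KEY")


def missing_keys(environ=None, required=REQUIRED_KEYS) -> list[str]:
    """Return the names of required variables that are unset or blank."""
    env = os.environ if environ is None else environ
    return [name for name in required if not env.get(name, "").strip()]


def python_version_ok(version_info=None, required=REQUIRED_PYTHON) -> bool:
    """Check that the interpreter's major.minor version matches the requirement."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:2]) == required


if __name__ == "__main__":
    if not python_version_ok():
        print(f"Warning: expected Python 3.11.x, found {sys.version.split()[0]}")
    absent = missing_keys()
    if absent:
        print("Missing environment variables: " + ", ".join(absent))
    else:
        print("All API keys are set.")
```

The script only reports problems and never prints the key values themselves.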

API Integration Details

This project integrates several APIs to enhance functionality and achieve comprehensive evaluation metrics. Below is a detailed description of each API used.

Together AI

  • Purpose: Provides hosted inference for open-source models only; used here to run the open-source language model behind the Flashcard Generator pipeline.
  • Setup: Register for an account on Together AI's platform and configure your project with an API key.
  • Documentation: Together AI API documentation

OpenAI

  • Purpose: Supports advanced text generation and document retrieval functionalities essential for the RAG baseline pipeline for comparison.
  • Setup: Obtain an API key from OpenAI and set it as an environment variable, as listed in the Prerequisites section.
  • Documentation: OpenAI API documentation

Nomic Atlas

  • Purpose: Provides open-source language model embeddings and clustering algorithms to enhance semantic search and data organization capabilities.
  • Setup: Install the Nomic Atlas Python package, configure your API key, and follow the initialization guide to start embedding your data.
  • Documentation: Nomic Atlas API documentation

Installation

Clone the repository and install the required packages:

git clone https://github.com/ChristinaSchlager/flashcard_generator.git
cd flashcard_generator
pip install -r requirements.txt

Documentation

The documentation for this project is available in PDF format.

Abstract

In the modern educational landscape, digitization has led to a significant increase in textual data from multiple sources. While this increase in data offers a wealth of knowledge, it also presents substantial challenges for educators and students. Moreover, schools have transformed into dynamic, interactive learning environments where the use of Large Language Models (LLMs) has become indispensable, particularly since the release of ChatGPT in 2022. However, the introduction of regulatory frameworks such as the EU AI Act highlights the importance of both functional and non-functional requirements for developing fair, transparent, responsible and ethical AI. Therefore, the implementation of the FATE (Fairness, Accountability, Transparency, and Ethics) framework in the development of a Retrieval Augmented Generation (RAG) model is demonstrated through its application to a Flashcard Generator designed to create educational flashcards. It is shown that the integration of ethical principles not only aligns with, but also enhances the performance of the model. The Flashcard Generator, developed using publicly available data and open-source models, leverages the RAG model to generate question-answer pairs for flashcards. It is evaluated against human-generated ground truth answers and a benchmark RAG model built with GPT-4 Turbo employing evaluation metrics from Ragas, DeepEval, and Haystack. This proof-of-concept serves as a template for future advancements, emphasizing fairness, transparency, and ethics, while maintaining high performance.

Citation

If you use this project in your research, please cite it as follows:

Schlager, C., "AI Meets Classroom: Optimizing Transformer-Based Language Models for Education." Master's Thesis, Data Science & Intelligent Analytics, Kufstein University of Applied Sciences, Austria, 2024.
