
This repository is dedicated to implementing and exploring the inner workings of LLMs by building them from the ground up using basic programming and mathematical concepts. The goal is to delve deeper into the mechanics of LLMs.


trilokpadhi/LLM-from-Scratch


This repository implements LLMs from scratch using PyTorch. It is inspired by Hugging Face Transformers, The Annotated Transformer, and The Illustrated Transformer. The goal is to understand the architecture and implementation details of LLMs. The repository is a work in progress and will be updated regularly.

Table of Contents

  • Introduction
  • Architecture
  • Implementation
  • Usage
  • References

Introduction

LLMs are neural networks trained to predict the next word in a sequence given the previous words. They are used in NLP tasks such as text generation, machine translation, and sentiment analysis. LLMs are based on the Transformer architecture, which processes sequences with a self-attention mechanism. The original Transformer consists of an encoder, which processes the input sequence, and a decoder, which generates the output sequence. LLMs are trained on large corpora of text, from which they learn the statistical patterns needed to predict the next token.
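As a concrete illustration of the next-word (next-token) objective, the sketch below computes the shifted cross-entropy loss: position t predicts token t+1. The random logits and token ids are placeholders for a real model's output and a real batch, not code from this repository.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes (assumptions, not repository code):
# logits: (batch, seq_len, vocab_size) produced by some language model
# tokens: (batch, seq_len) integer token ids
batch, seq_len, vocab_size = 2, 8, 100
logits = torch.randn(batch, seq_len, vocab_size)
tokens = torch.randint(0, vocab_size, (batch, seq_len))

# Next-token prediction: drop the last prediction and the first target,
# so each position is scored against the token that follows it.
loss = F.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size),  # predictions for positions 0..T-2
    tokens[:, 1:].reshape(-1),               # targets shifted by one position
)
print(loss.item())
```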

Architecture

The original Transformer is composed of an encoder and a decoder. The encoder processes the input sequence and produces a contextualized representation of it; the decoder uses that representation to generate the output sequence one token at a time. Both are stacks of layers built around attention: self-attention lets each position attend to the other positions in the same sequence, while the decoder additionally uses cross-attention to attend to the encoder's representation. The attention weights determine how much each position contributes to the representation of every other position. Many modern LLMs are decoder-only (GPT-style) models that keep just the decoder stack with causal self-attention.
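The following is a minimal sketch of scaled dot-product self-attention with a single head and no learned query/key/value projections (omitted here purely for clarity): each output position is a weighted average of all positions, with weights given by scaled dot products.

```python
import math
import torch

def self_attention(x, mask=None):
    """Single-head scaled dot-product self-attention over x: (batch, seq_len, d_model).
    Learned Q/K/V projections are omitted for clarity."""
    d_model = x.size(-1)
    # Similarity of every position with every other position.
    scores = x @ x.transpose(-2, -1) / math.sqrt(d_model)   # (batch, seq, seq)
    if mask is not None:
        # Positions where mask == 0 are not allowed to be attended to.
        scores = scores.masked_fill(mask == 0, float("-inf"))
    weights = torch.softmax(scores, dim=-1)
    # Each output position is a weighted average of all input positions.
    return weights @ x

x = torch.randn(1, 5, 16)
# Causal mask so a decoder position cannot attend to future positions.
causal = torch.tril(torch.ones(5, 5))
out = self_attention(x, mask=causal)
print(out.shape)  # torch.Size([1, 5, 16])
```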

Implementation

The implementation uses PyTorch, a popular deep learning library for Python. It follows the Transformer architecture and is organized in a modular way, which allows the architecture and the training process to be customized easily. The implementation is inspired by The Annotated Transformer and The Illustrated Transformer, which explain the Transformer architecture and its implementation details.
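To illustrate what one such modular building block looks like, here is a sketch of a pre-norm, decoder-style Transformer block with masked self-attention and a feed-forward network. The class name and hyperparameters are illustrative assumptions and do not necessarily match transformer.py.

```python
import torch
import torch.nn as nn

class TransformerBlock(nn.Module):
    """Minimal pre-norm decoder-style block: masked self-attention + feed-forward,
    each wrapped in a residual connection. Hyperparameters are illustrative."""
    def __init__(self, d_model=256, n_heads=4, d_ff=1024, dropout=0.1):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, dropout=dropout, batch_first=True)
        self.ff = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        seq_len = x.size(1)
        # Causal mask: True marks positions that may NOT be attended to.
        mask = torch.triu(
            torch.ones(seq_len, seq_len, dtype=torch.bool, device=x.device), diagonal=1
        )
        h = self.norm1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                      # residual around attention
        x = x + self.ff(self.norm2(x))        # residual around feed-forward
        return x

block = TransformerBlock()
print(block(torch.randn(2, 10, 256)).shape)  # torch.Size([2, 10, 256])
```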

Usage

The repository contains a PyTorch implementation of the LLMs, organized into the following files:

  • transformer.py: Contains the implementation of the Transformer architecture.
  • train.py: Contains the training script for the LLMs.
  • generate.py: Contains the generation script for the LLMs.

To train the LLMs, run the following command:

python train.py

To generate text using the trained LLMs, run the following command:

python generate.py
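For intuition, autoregressive text generation typically follows a loop like the one sketched below: feed the current sequence to the model, sample the next token from the logits of the last position, append it, and repeat. The generate helper and the toy model here are assumptions for illustration and do not reflect the actual contents of generate.py.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def generate(model, prompt_ids, max_new_tokens=20, temperature=1.0):
    """Sampling loop: take the logits of the last position, sample one token,
    append it to the sequence, and repeat."""
    ids = prompt_ids.clone()
    for _ in range(max_new_tokens):
        logits = model(ids)                        # (batch, seq, vocab)
        next_logits = logits[:, -1] / temperature  # logits for the next token
        probs = torch.softmax(next_logits, dim=-1)
        next_id = torch.multinomial(probs, num_samples=1)
        ids = torch.cat([ids, next_id], dim=1)
    return ids

# Toy model stand-in (assumption): embedding + linear head over a 100-token vocabulary.
model = nn.Sequential(nn.Embedding(100, 32), nn.Linear(32, 100))
print(generate(model, torch.randint(0, 100, (1, 5))))
```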

References

Blogs

Papers

Repositories

Courses
