Morphologically biased byte-pair encoding
Updated Nov 11, 2024 - Rust
Transformer implementation in PyTorch, trained on an NVIDIA A100 in fp16
Modern Eager TensorFlow implementation of Attention Is All You Need
A visualizer for checking how the BPE tokenizer in an LLM works
A byte-level Byte Pair Encoding (BPE) algorithm for tokenization in Large Language Models (LLMs), similar to those used in GPT, Llama, and Mistral.
Fast BPE algorithm to generate byte pair encodings from a text corpus. It is written in Rust and is approximately 20x faster than its Python implementation.
This is a project for a sequence-to-sequence NLP task. We developed a custom model in PyTorch to understand the task, and we also fine-tuned pre-trained transformer models to improve translation performance.
An implementation of the GPT (generative pretrained transformer) model, from scratch, which produces Shakespearean text by training on the dialogues written by Shakespeare along with the GPT Encoder.
Code repo for the paper "AutoGO: Automated Computation Graph Optimization for Neural Network Evolution", accepted to NeurIPS 2023.
Order-agnostic lossless compressor using BPE and Huffman Coding.
Byte-Pair Encoding tokenizer for training large language models on huge datasets
Byte-level byte pair encoding (BPE) in Haskell
This repository houses my assignments completed during the Deep Learning course as part of my Master's in Data Analytics program. Explore diverse projects showcasing hands-on applications of advanced neural networks and machine learning techniques.
Byte pair encoding tokenizer as used in some large language models.
Byte-pair encoding implementation in Python.
High-performance unsupervised text tokenization for Ruby
An efficient ranked retrieval system for English corpora, optimised with VBE and BPE.
An Introduction to Natural Language Processing (NLP)
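For readers new to the topic, the byte-level BPE that several of the repositories above describe can be sketched in a few lines of Python. This is a minimal illustration and is not taken from any of the listed projects. The function names (`train_bpe`, `merge`, `encode`, `decode`) are hypothetical, and the sketch assumes the simplest variant: no pre-tokenization, ties broken by first occurrence, and merges applied in the order they were learned.

```python
from collections import Counter


def merge(ids, pair, new_id):
    """Replace every non-overlapping occurrence of `pair` in `ids` with `new_id`."""
    out = []
    i = 0
    while i < len(ids):
        if i + 1 < len(ids) and (ids[i], ids[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(ids[i])
            i += 1
    return out


def train_bpe(text, num_merges):
    """Learn up to `num_merges` merge rules over the UTF-8 bytes of `text`."""
    ids = list(text.encode("utf-8"))  # start from raw bytes: token ids 0..255
    merges = {}                       # (left_id, right_id) -> new token id
    next_id = 256
    for _ in range(num_merges):
        pairs = Counter(zip(ids, ids[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:                 # nothing repeats, so merging gains nothing
            break
        merges[pair] = next_id
        ids = merge(ids, pair, next_id)
        next_id += 1
    return merges


def encode(text, merges):
    """Tokenize `text` by applying the learned merges in training order."""
    ids = list(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        ids = merge(ids, pair, new_id)
    return ids


def decode(ids, merges):
    """Map token ids back to text by expanding each merge into its bytes."""
    vocab = {i: bytes([i]) for i in range(256)}
    for (left, right), new_id in merges.items():
        vocab[new_id] = vocab[left] + vocab[right]
    return b"".join(vocab[i] for i in ids).decode("utf-8", errors="replace")
```

Because the base vocabulary is the 256 possible byte values, any input string can be encoded, including text never seen during training. Production tokenizers typically add pre-tokenization and faster pair-counting data structures on top of this idea.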