A character-level language model built from scratch, inspired by Andrej Karpathy’s notes on neural networks and the paper "Attention Is All You Need". The goal is to deepen my understanding of language models and deep-learning internals.
- Parameters: 6,298,972
- Dataset: 6M characters (90% train / 10% validation)
- Best metrics: validation loss 0.6895, validation accuracy 78.52%
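The 90/10 train/validation split can be sketched as follows; this is a minimal illustration, not the repo's actual code, and the inline string stands in for the real ~6M-character dataset:

```python
# Minimal sketch of the 90% / 10% character-level split described above.
# The `text` string is a stand-in for the ~6M-character dataset, which is
# not committed to this repository.
text = "hello world, this is a tiny corpus"

n = int(0.9 * len(text))   # index at 90% of the characters
train_data = text[:n]      # first 90% of characters for training
val_data = text[n:]        # remaining 10% for validation

print(len(train_data), len(val_data))
```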
Loss curves for training and validation are included in the repo root (see the PNG files).
I tested a custom tokenizer, but results didn’t improve, so the model remains character-level.
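Character-level tokenization, the approach the model kept, can be sketched as below; the variable names are illustrative and not taken from the repo:

```python
# Minimal sketch of character-level tokenization: the vocabulary is simply
# the set of unique characters, and each character maps to an integer id.
text = "tiny stories"
chars = sorted(set(text))                      # vocabulary = unique characters
stoi = {ch: i for i, ch in enumerate(chars)}   # char -> integer id
itos = {i: ch for ch, i in stoi.items()}       # integer id -> char

def encode(s: str) -> list[int]:
    return [stoi[c] for c in s]

def decode(ids: list[int]) -> str:
    return "".join(itos[i] for i in ids)

ids = encode("tiny")
print(ids, decode(ids))
```

Round-tripping `decode(encode(s))` returns the original string, which is one reason character-level models are easy to debug compared to learned subword tokenizers.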
- Source: roneneldan/TinyStories (Hugging Face)
- License: CDLA-Sharing 1.0 (dataset only)
- Note: Raw dataset files are not committed to this repository
Training ran on Google Colab GPUs. The final code was copied into the notebook in this repo, so minor path issues or typos may remain.