EAzari/tlm

Tiny Language Model (TLM)

  1. Is it possible to use an Abstract Syntax Tree (AST), or an AST-like structure, in addition to regex in the Byte Pair Encoding (BPE) algorithm?

  2. Can a character-based tiny language model (TLM) be developed?

Could it be based on the method below, which has some similarities to n-grams?

The quick brown fox jumps over the lazy dog.

⬇️ ⬆️

the, quick, brown, fox, jumps, over, the, lazy, dog --> word 9-gram or 9-wgram

⬇️ ⬆️

t, h, e = t + h + e = the --> char 3-gram or word 1-gram (3-cgram or 1-wgram)

q, u, i, c, k = q + u + i + c + k = quick --> char 5-gram or word 1-gram (5-cgram or 1-wgram)

...

⬇️ ⬆️

a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z --> char 1-gram or 1-cgram
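
The decomposition above can be sketched in a few lines of Python. This is only an illustrative sketch, not the repository's implementation: the helper names `decompose` and `label` are hypothetical, and lowercasing the sentence and dropping the final period are assumptions taken from how the example is written.

```python
def decompose(sentence):
    """Split a sentence into words, and each word into its characters."""
    # Assumption: lowercase and drop the final period, as in the example.
    words = sentence.lower().rstrip(".").split()
    chars = [list(word) for word in words]
    return words, chars


def label(units, kind):
    """Name a sequence by its length: 'c' for char grams, 'w' for word grams."""
    return f"{len(units)}-{kind}gram"


sentence = "The quick brown fox jumps over the lazy dog."
words, chars = decompose(sentence)

# Top level: the whole sentence as one word n-gram.
print(", ".join(words), "-->", label(words, "w"))

# Middle level: each word is a char n-gram and, at the same time, a word 1-gram.
for word, cs in zip(words, chars):
    rebuilt = "".join(cs)  # going back up: t + h + e = the
    assert rebuilt == word
    print(", ".join(cs), "=", rebuilt, "-->", label(cs, "c"), "or 1-wgram")

# Bottom level: the distinct characters, each one a char 1-gram.
alphabet = sorted({c for cs in chars for c in cs})
print(", ".join(alphabet), "--> 1-cgram")
```

Joining the characters back into a word is the upward direction (⬆️), and splitting is the downward direction (⬇️). Because the example sentence is a pangram, the bottom level contains all 26 letters.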
