🧠 Sudo Code Program NLP

This repository is a collection of step-by-step programs and notebooks for learning and experimenting with Natural Language Processing (NLP), with a special focus on Vietnamese text.

It is being developed during my participation in the Sudo Code Program.

📚 Table of Contents

1. Text Preprocessing

Unicode & diacritic normalization
Remove HTML tags, URLs, emails, numbers, emojis
Vietnamese tokenization
Smart lowercase handling with POS tagging
More steps will be added later
👉 Notebook
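
The cleaning steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the notebook's actual code: the function name `clean_text` and all regular expressions are assumptions chosen for the example, and the emoji pattern covers only the most common Unicode blocks. Vietnamese tokenization and POS-aware lowercasing need a dedicated NLP library and are left out here.

```python
import re
import unicodedata

# Hypothetical patterns for illustration; real-world data may need stricter ones.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+|www\.\S+")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b")
NUMBER = re.compile(r"\d+(?:[.,]\d+)*")  # also strips digits embedded in words
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF\uFE0F]")


def clean_text(text: str) -> str:
    """Normalize Unicode and strip HTML tags, URLs, emails, numbers, and emojis."""
    # NFC composes base letters and combining diacritics into single code
    # points, so visually identical Vietnamese strings compare equal.
    text = unicodedata.normalize("NFC", text)
    # Tags go first so that URLs or emails wrapped in markup are exposed.
    for pattern in (HTML_TAG, URL, EMAIL, NUMBER, EMOJI):
        text = pattern.sub(" ", text)
    # Collapse the whitespace left behind by the substitutions.
    return " ".join(text.split())


if __name__ == "__main__":
    raw = "<p>Liên hệ: test@example.com hoặc https://example.com 123 😀</p>"
    print(clean_text(raw))
```

Normalizing before pattern matching matters for Vietnamese because the same word can arrive in composed or decomposed form depending on the input method.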

2. Text Representation

N-gram representation (bi-gram, tri-gram)
Bag-of-Words (BoW) vectorization
TF-IDF weighting for word importance
👉 Notebook
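
To make the three representations concrete, here is a small pure-Python sketch. It is not taken from the notebook: the helper names (`ngrams`, `bag_of_words`, `tf_idf`) are hypothetical, the input is assumed to be already tokenized, and the TF-IDF variant is the plain textbook one (term frequency times `log(N / df)`), which differs from the smoothed formula some libraries use.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def bag_of_words(docs):
    """Return a sorted vocabulary and one count vector per document."""
    vocab = sorted({tok for doc in docs for tok in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[tok] for tok in vocab])
    return vocab, vectors


def tf_idf(docs):
    """Return one {token: weight} dict per document (unsmoothed TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            tok: (count / len(doc)) * math.log(n_docs / df[tok])
            for tok, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    docs = [["tôi", "thích", "học"], ["tôi", "thích", "đọc"]]
    print(ngrams(docs[0], 2))
    print(bag_of_words(docs))
    print(tf_idf(docs))
```

With this variant, a token that occurs in every document gets weight 0, which is the sense in which TF-IDF downweights uninformative words.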

3. Word2Vec Model

Train Skip-gram & CBOW models using Gensim
Generate 300-dimensional Vietnamese word embeddings
Evaluate with similarity & analogy tests
Visualize embeddings via PCA 2D plot
👉 Notebook

01_text_preprocessing
02_text_representation
03_word2vec
04_lstm_generation
05_attention_text_summarization
06_machine_translation
07_llms_and_variants
08_rag_fastapi
09_web_scraping
10_prompting_guide
.gitignore
README.md

hanguyenai/sudo-code-nlp

Folders and files

Latest commit

History

Repository files navigation

🧠 Sudo Code Program NLP

📚 Table of Contents

1. Text Preprocessing

2. Text Representation

3. Word2Vec Model

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages