
🔄 Code2Pseudo – Transformer-based C++ to Pseudocode Converter


A fully custom Transformer-based Sequence-to-Sequence model built from scratch in PyTorch to convert executable C++ code into high-level pseudocode. Trained on the SPoC dataset from Stanford.


πŸ–ΌοΈ Demo

Try it live on Hugging Face Spaces:
👉 https://huggingface.co/spaces/asadsandhu/Code2Pseudo

App Demo


🧠 Model Architecture

  • Built from scratch using the Transformer encoder-decoder architecture (PyTorch)
  • No pre-trained models or external weights – 100% custom code
  • Token-level sequence generation with greedy decoding (see the sketch below)
  • Custom tokenization and vocabulary building for both C++ and pseudocode

Input:   C++ lines (line-by-line)
Model:   Transformer (Encoder-Decoder)
Output:  Corresponding pseudocode line
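A minimal sketch of the greedy decoding step described above, assuming a trained seq2seq model and a target vocabulary containing <sos>/<eos> entries. All names here are illustrative assumptions, not the repository's exact API:

import torch

def greedy_decode(model, src_ids, tgt_vocab, max_len=128, device="cpu"):
    """Generate one pseudocode line token by token, always taking the
    highest-probability next token (greedy search)."""
    model.eval()
    src = torch.tensor([src_ids], device=device)                # (1, src_len)
    ys = torch.tensor([[tgt_vocab["<sos>"]]], device=device)    # start-of-sequence
    with torch.no_grad():
        for _ in range(max_len - 1):
            logits = model(src, ys)                             # (1, tgt_len, vocab)
            next_id = logits[0, -1].argmax().item()             # greedy choice
            ys = torch.cat([ys, torch.tensor([[next_id]], device=device)], dim=1)
            if next_id == tgt_vocab["<eos>"]:                   # stop at end token
                break
    inv = {i: t for t, i in tgt_vocab.items()}
    return " ".join(inv[i] for i in ys[0].tolist()[1:] if inv[i] != "<eos>")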


📊 Dataset

We trained on the SPoC dataset:

  • ✅ Cleanly aligned C++ ↔ pseudocode line pairs
  • ✅ High-quality syntactic coverage
  • ✅ Multiple test splits available
  • ✅ Custom preprocessing and token handling (see the loading sketch below)

📎 Licensed under CC BY 4.0
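A rough sketch of loading the aligned line pairs and building whitespace-token vocabularies. The text (pseudocode) and code (C++) column names follow the published SPoC TSV layout; the exact preprocessing in train.py may differ:

import csv
from collections import Counter

def load_pairs(tsv_path):
    """Read aligned (C++ tokens, pseudocode tokens) pairs from a SPoC TSV file."""
    pairs = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["code"] and row["text"]:                 # skip unannotated lines
                pairs.append((row["code"].split(), row["text"].split()))
    return pairs

def build_vocab(token_lists, min_freq=1):
    """Assign an integer id to every token, reserving special symbols."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
    for tok, freq in counts.items():
        if freq >= min_freq:
            vocab.setdefault(tok, len(vocab))
    return vocab

pairs = load_pairs("spoc/train/spoc-train.tsv")
src_vocab = build_vocab(code for code, _ in pairs)
tgt_vocab = build_vocab(text for _, text in pairs)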


πŸ“ Directory Structure


.
├── app.py                # Gradio web app (C++ → Pseudocode)
├── train.py              # Training script for code-to-pseudocode model
├── model.pth             # Trained model and vocab checkpoint
├── spoc/
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png          # Screenshot for README
└── README.md             # This file


πŸ› οΈ How to Run Locally

βš™οΈ 1. Clone the Repo

git clone https://github.com/asadsandhu/Code2Pseudo.git
cd Code2Pseudo
pip install torch gradio tqdm

🚀 2. Launch the Web App

Make sure model.pth exists (or train it first):

python app.py

The interface will open in your browser.
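For reference, the Gradio wiring in app.py reduces to something like the sketch below. The translate helper, the loaded model, and the vocabularies are assumed to come from the model.pth checkpoint and the sketches above; names are illustrative:

import gradio as gr

def translate(cpp_code: str) -> str:
    """Translate each non-empty C++ line into one pseudocode line."""
    lines = [line for line in cpp_code.splitlines() if line.strip()]
    # model, src_vocab, tgt_vocab and greedy_decode are loaded/defined elsewhere
    return "\n".join(
        greedy_decode(model,
                      [src_vocab.get(t, src_vocab["<unk>"]) for t in line.split()],
                      tgt_vocab)
        for line in lines
    )

demo = gr.Interface(
    fn=translate,
    inputs=gr.Textbox(lines=12, label="C++ code"),
    outputs=gr.Textbox(lines=12, label="Pseudocode"),
    title="Code2Pseudo",
)
demo.launch()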


🧪 Training the Model

To retrain the transformer model:

python train.py

By default:

  • Downloads SPoC dataset from GitHub
  • Trains for 10 epochs
  • Produces model.pth with weights and vocabulary (a training-loop sketch follows below)
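A condensed sketch of that default setup (10 epochs, Adam at 1e-4, cross-entropy over target tokens with padding ignored). It assumes a model (see the architecture sketch under Key Hyperparameters below) and a train_loader yielding padded (src, tgt) batches, plus the vocabularies built during preprocessing:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=0)              # 0 = <pad>
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):
    model.train()
    total = 0.0
    for src, tgt in train_loader:                            # (batch, seq) LongTensors
        optimizer.zero_grad()
        logits = model(src, tgt[:, :-1])                     # teacher forcing: shifted target
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
        loss.backward()
        optimizer.step()
        total += loss.item()
    print(f"epoch {epoch + 1}: avg loss {total / len(train_loader):.4f}")

torch.save({"model": model.state_dict(),
            "src_vocab": src_vocab,
            "tgt_vocab": tgt_vocab}, "model.pth")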

🔧 Key Hyperparameters

Parameter        Value
---------------  -----------
Model Type       Transformer
Max Length       128
Embedding Dim    256
FFN Dim          512
Heads            4
Encoder Layers   2
Decoder Layers   2
Batch Size       64
Epochs           10
Optimizer        Adam
Learning Rate    1e-4
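The table above maps directly onto a PyTorch encoder-decoder. This sketch sizes torch.nn.Transformer with those values for brevity; the repository's from-scratch implementation will differ in its internals:

import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Encoder-decoder Transformer sized with the hyperparameters listed above."""
    def __init__(self, src_vocab_size, tgt_vocab_size,
                 d_model=256, nhead=4, num_layers=2, dim_ff=512, max_len=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab_size, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)        # learned positional encoding
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src, tgt):
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        src_x = self.src_emb(src) + self.pos_emb(pos_s)
        tgt_x = self.tgt_emb(tgt) + self.pos_emb(pos_t)
        causal = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        hidden = self.transformer(src_x, tgt_x, tgt_mask=causal)   # mask future tokens
        return self.out(hidden)                                    # (batch, tgt_len, vocab)

# Hypothetical instantiation, reusing the vocabularies from the dataset sketch.
model = Seq2SeqTransformer(len(src_vocab), len(tgt_vocab))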

🧩 Example Input

The sample below uses the SPoC dataset's space-separated token format, which is the form the model expects as input:

int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}

⏩ Output Pseudocode

create integers n , nn , ans with ans = 0
read n
for i = 2 to n - 1 inclusive
set nn to n
while nn is 0 , set ans to nn % 12 , set ans to nn % nn , set nn to nn / i
set value of gcd to ans and n - 2
print ans / 2 / ( n - 2 ) / o

📦 Deployment

Live demo hosted on Hugging Face Spaces: https://huggingface.co/spaces/asadsandhu/Code2Pseudo


🙌 Acknowledgements

  • SPoC dataset from Stanford (licensed under CC BY 4.0)
  • PyTorch, Gradio, and Hugging Face Spaces

πŸ§‘β€πŸ’» Author

Asad Ali
  • GitHub: asadsandhu
  • Hugging Face: asadsandhu
  • LinkedIn: asadxali


📄 License

This project is licensed under the MIT License. Use, remix, and distribute freely with attribution.
