
🔄 Code2Pseudo – Transformer-based C++ to Pseudocode Converter


A fully custom Transformer-based Sequence-to-Sequence model built from scratch in PyTorch to convert executable C++ code into high-level pseudocode. Trained on the SPoC dataset from Stanford.


πŸ–ΌοΈ Demo

Try it live on Hugging Face Spaces:
👉 https://huggingface.co/spaces/asadsandhu/Code2Pseudo

App Demo


🧠 Model Architecture

  • Built from scratch using the Transformer encoder-decoder architecture (PyTorch)
  • No pre-trained models or external weights – 100% custom code
  • Token-level sequence generation with greedy decoding (see the sketch below)
  • Custom tokenization and vocabulary building for both C++ and pseudocode

Input:   C++ lines (line-by-line)
Model:   Transformer (Encoder-Decoder)
Output:  Corresponding pseudocode line
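A minimal sketch of the greedy decoding step described above, assuming a trained seq2seq model and a target vocabulary containing <sos>/<eos> entries. All names here are illustrative assumptions, not the repository's exact API:

import torch

def greedy_decode(model, src_ids, tgt_vocab, max_len=128, device="cpu"):
    """Generate one pseudocode line token by token, always taking the
    highest-probability next token (greedy search)."""
    model.eval()
    src = torch.tensor([src_ids], device=device)                # (1, src_len)
    ys = torch.tensor([[tgt_vocab["<sos>"]]], device=device)    # start-of-sequence
    with torch.no_grad():
        for _ in range(max_len - 1):
            logits = model(src, ys)                             # (1, tgt_len, vocab)
            next_id = logits[0, -1].argmax().item()             # greedy choice
            ys = torch.cat([ys, torch.tensor([[next_id]], device=device)], dim=1)
            if next_id == tgt_vocab["<eos>"]:                   # stop at end token
                break
    inv = {i: t for t, i in tgt_vocab.items()}
    return " ".join(inv[i] for i in ys[0].tolist()[1:] if inv[i] != "<eos>")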


📊 Dataset

We trained on the SPoC dataset:

  • ✅ Cleanly aligned C++ ↔ pseudocode line pairs
  • ✅ High-quality syntactic coverage
  • ✅ Multiple test splits available
  • ✅ Custom preprocessing and token handling (see the loading sketch below)

📎 Licensed under CC BY 4.0
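A rough sketch of loading the aligned line pairs and building whitespace-token vocabularies. The text (pseudocode) and code (C++) column names follow the published SPoC TSV layout; the exact preprocessing in train.py may differ:

import csv
from collections import Counter

def load_pairs(tsv_path):
    """Read aligned (C++ tokens, pseudocode tokens) pairs from a SPoC TSV file."""
    pairs = []
    with open(tsv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter="\t"):
            if row["code"] and row["text"]:                 # skip unannotated lines
                pairs.append((row["code"].split(), row["text"].split()))
    return pairs

def build_vocab(token_lists, min_freq=1):
    """Assign an integer id to every token, reserving special symbols."""
    counts = Counter(tok for toks in token_lists for tok in toks)
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2, "<unk>": 3}
    for tok, freq in counts.items():
        if freq >= min_freq:
            vocab.setdefault(tok, len(vocab))
    return vocab

pairs = load_pairs("spoc/train/spoc-train.tsv")
src_vocab = build_vocab(code for code, _ in pairs)
tgt_vocab = build_vocab(text for _, text in pairs)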


πŸ“ Directory Structure


.
├── app.py                # Gradio web app (C++ → Pseudocode)
├── train.py              # Training script for code-to-pseudocode model
├── model.pth             # Trained model and vocab checkpoint
├── spoc/
│   └── train/
│       ├── spoc-train.tsv
│       └── split/spoc-train-eval.tsv
├── assets/
│   └── demo.png          # Screenshot for README
└── README.md             # This file


πŸ› οΈ How to Run Locally

βš™οΈ 1. Clone the Repo

git clone https://github.com/asadsandhu/Code2Pseudo.git
cd Code2Pseudo
pip install torch gradio tqdm

🚀 2. Launch the Web App

Make sure model.pth exists (or train it first):

python app.py

The interface will open in your browser.
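For reference, the Gradio wiring in app.py reduces to something like the sketch below. The translate helper, the loaded model, and the vocabularies are assumed to come from the model.pth checkpoint and the sketches above; names are illustrative:

import gradio as gr

def translate(cpp_code: str) -> str:
    """Translate each non-empty C++ line into one pseudocode line."""
    lines = [line for line in cpp_code.splitlines() if line.strip()]
    # model, src_vocab, tgt_vocab and greedy_decode are loaded/defined elsewhere
    return "\n".join(
        greedy_decode(model,
                      [src_vocab.get(t, src_vocab["<unk>"]) for t in line.split()],
                      tgt_vocab)
        for line in lines
    )

demo = gr.Interface(
    fn=translate,
    inputs=gr.Textbox(lines=12, label="C++ code"),
    outputs=gr.Textbox(lines=12, label="Pseudocode"),
    title="Code2Pseudo",
)
demo.launch()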


🧪 Training the Model

To retrain the transformer model:

python train.py

By default:

  • Downloads SPoC dataset from GitHub
  • Trains for 10 epochs
  • Produces model.pth with weights and vocabulary (a training-loop sketch follows below)
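A condensed sketch of that default setup (10 epochs, Adam at 1e-4, cross-entropy over target tokens with padding ignored). It assumes a model (see the architecture sketch under Key Hyperparameters below) and a train_loader yielding padded (src, tgt) batches, plus the vocabularies built during preprocessing:

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss(ignore_index=0)              # 0 = <pad>
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

for epoch in range(10):
    model.train()
    total = 0.0
    for src, tgt in train_loader:                            # (batch, seq) LongTensors
        optimizer.zero_grad()
        logits = model(src, tgt[:, :-1])                     # teacher forcing: shifted target
        loss = criterion(logits.reshape(-1, logits.size(-1)), tgt[:, 1:].reshape(-1))
        loss.backward()
        optimizer.step()
        total += loss.item()
    print(f"epoch {epoch + 1}: avg loss {total / len(train_loader):.4f}")

torch.save({"model": model.state_dict(),
            "src_vocab": src_vocab,
            "tgt_vocab": tgt_vocab}, "model.pth")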

🔧 Key Hyperparameters

Parameter        Value
---------------  -----------
Model Type       Transformer
Max Length       128
Embedding Dim    256
FFN Dim          512
Heads            4
Encoder Layers   2
Decoder Layers   2
Batch Size       64
Epochs           10
Optimizer        Adam
Learning Rate    1e-4
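The table above maps directly onto a PyTorch encoder-decoder. This sketch sizes torch.nn.Transformer with those values for brevity; the repository's from-scratch implementation will differ in its internals:

import torch
import torch.nn as nn

class Seq2SeqTransformer(nn.Module):
    """Encoder-decoder Transformer sized with the hyperparameters listed above."""
    def __init__(self, src_vocab_size, tgt_vocab_size,
                 d_model=256, nhead=4, num_layers=2, dim_ff=512, max_len=128):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab_size, d_model)
        self.tgt_emb = nn.Embedding(tgt_vocab_size, d_model)
        self.pos_emb = nn.Embedding(max_len, d_model)        # learned positional encoding
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=nhead,
            num_encoder_layers=num_layers, num_decoder_layers=num_layers,
            dim_feedforward=dim_ff, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab_size)

    def forward(self, src, tgt):
        pos_s = torch.arange(src.size(1), device=src.device)
        pos_t = torch.arange(tgt.size(1), device=tgt.device)
        src_x = self.src_emb(src) + self.pos_emb(pos_s)
        tgt_x = self.tgt_emb(tgt) + self.pos_emb(pos_t)
        causal = self.transformer.generate_square_subsequent_mask(tgt.size(1)).to(tgt.device)
        hidden = self.transformer(src_x, tgt_x, tgt_mask=causal)   # mask future tokens
        return self.out(hidden)                                    # (batch, tgt_len, vocab)

# Hypothetical instantiation, reusing the vocabularies from the dataset sketch.
model = Seq2SeqTransformer(len(src_vocab), len(tgt_vocab))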

🧩 Example Input

The sample below uses the SPoC dataset's space-separated token format, which is the form the model expects as input:

int main() {
int n , nn , ans = 0 ;
cin > > n ;
for ( int i = 2 ; i < = n - 1 ; i + + ) {
nn = n ;
while ( nn = = 0 ) ans + = nn % i , nn / = i ;
}
o = gcd ( ans , n - 2 ) ;
cout < < ans / 2 / o ( n - 2 ) / o < < endl ;
return 0;
}

⏩ Output Pseudocode

create integers n , nn , ans with ans = 0
read n
for i = 2 to n - 1 inclusive
set nn to n
while nn is 0 , set ans to nn % 12 , set ans to nn % nn , set nn to nn / i
set value of gcd to ans and n - 2
print ans / 2 / ( n - 2 ) / o

📦 Deployment

Live demo hosted on Hugging Face Spaces: https://huggingface.co/spaces/asadsandhu/Code2Pseudo


🙌 Acknowledgements

  • SPoC dataset from Stanford (licensed under CC BY 4.0)
  • PyTorch, Gradio, and Hugging Face Spaces

πŸ§‘β€πŸ’» Author

Asad Ali
  • GitHub: asadsandhu
  • Hugging Face: asadsandhu
  • LinkedIn: asadxali


📄 License

This project is licensed under the MIT License. Use, remix, and distribute freely with attribution.
