
Welcome to CodeVerb

CodeVerb aims to streamline development by generating Python code from plain English-language descriptions.


CodeVerb Architecture

There are three repositories, each with its own purpose.

  1. transformer-pytorch: Contains the code for the transformer-based model. The model is developed using PyTorch.

  2. web-portal: Contains the code for the frontend portal. The portal is built with ReactJS and TailwindCSS.

  3. model-api: Contains the code for the portal's backend server. The server is built with Flask.

Dataset Collection

The dataset was scraped from public GitHub repositories, StackExchange, and GeeksForGeeks.

Hundreds of millions of lines of Python code

Approximately 7.2 million Python files were scraped.

Assuming 5 files are scraped per second, 7.2 million files would take 7.2M / 5 = 1,440,000 seconds, roughly 17 days of sequential scraping.

Parallel processing let us scrape this dataset in only ~7 days.
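The scraping-time estimate can be verified with a few lines of arithmetic (a sanity-check sketch; the 5 files/second rate is the assumption stated above):

```python
# Sanity-check the scraping-time estimate.
FILES = 7_200_000        # approx. number of Python files scraped
FILES_PER_SEC = 5        # assumed sequential scraping rate
SECONDS_PER_DAY = 86_400

sequential_days = FILES / FILES_PER_SEC / SECONDS_PER_DAY
print(f"Sequential: {sequential_days:.1f} days")            # ≈ 16.7 days

# Parallel processing brought the wall-clock time down to ~7 days,
# an effective speedup of roughly 2.4x over the sequential estimate.
speedup = sequential_days / 7
print(f"Effective speedup for a 7-day run: {speedup:.1f}x")
```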

State Diagram


Project Workflow


Iteration 01

Data Scraping & Collection Flow

We scrape our dataset from StackExchange (StackOverflow, CodeReview, etc.) and GitHub. To achieve this, we built custom scrapers from scratch for both platforms. Because the GitHub dataset was massive, we used a cluster setup with multithreading to run the scraping processes in parallel for faster, more efficient execution. We store the dataset in separate files, as these are more convenient to transfer over the network and to load when training on or inspecting the dataset.
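The parallel scraping pattern described above can be sketched with Python's `concurrent.futures`; note that `scrape_repo` and the repository list here are illustrative placeholders, not CodeVerb's actual scraper:

```python
from concurrent.futures import ThreadPoolExecutor

def scrape_repo(repo: str) -> tuple[str, list[str]]:
    """Placeholder for the real scraper: fetch a repo and return its Python sources."""
    # In the real pipeline this would download `repo` and extract its .py files.
    return repo, [f"# code scraped from {repo}"]

def scrape_all(repos: list[str], workers: int = 8) -> dict[str, list[str]]:
    # Threads suit I/O-bound work like network downloads; each repo's result
    # lands in its own entry, mirroring the "separate files" storage choice.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(scrape_repo, repos))

results = scrape_all(["org/repo-a", "org/repo-b", "org/repo-c"])
```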


Iteration 02

CodeVerb uses a state-of-the-art deep learning model to achieve its target of code generation from natural-language input. In 2017, the research paper "Attention Is All You Need" was published, paving the way for large language models and their breakthroughs in Natural Language Processing (NLP). Our system is designed on the ideas introduced in that paper.

Model Architecture

CodeVerb uses a Transformer-based model to achieve its goal. The encoder-decoder architecture serves this use case well: the encoder reads the English description and the decoder generates the corresponding Python code.
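The core building block of the Transformer is scaled dot-product attention from "Attention Is All You Need". A dependency-free toy sketch of that computation (pure Python on tiny vectors, not CodeVerb's actual model code):

```python
import math

def softmax(xs: list[float]) -> list[float]:
    m = max(xs)                               # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q.K^T / sqrt(d)) @ V."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)             # attention weights sum to 1
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out

# One query attending over two key/value pairs.
out = attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
```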


Web Portal

Landing Page


Playground Page


Iteration 03

Set up Distributed Training Environment

Neural Network Based Large Language Model Training [Currently Ongoing]

Training Environment

  1. Implemented a PyTorch distributed data-parallel pipeline
  2. Used the NVIDIA Collective Communications Library (NCCL) backend for communication across the distributed machine setup
  3. Total machines: 3
  4. Total GPUs: 3 (NVIDIA RTX 3060)
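Under the data-distributed pipeline, each of the 3 GPUs trains on a disjoint shard of the dataset. A minimal pure-Python sketch of the rank-based index partitioning that PyTorch's `DistributedSampler` performs (illustrative only, not the actual training code):

```python
def shard_indices(dataset_size: int, world_size: int, rank: int) -> list[int]:
    """Assign rank `rank` every `world_size`-th sample index."""
    return list(range(rank, dataset_size, world_size))

# 3 machines with one GPU each => world_size = 3; each rank sees a third of the data.
world_size = 3
shards = [shard_indices(10, world_size, r) for r in range(world_size)]

# Every index is covered exactly once across all ranks.
assert sorted(i for shard in shards for i in shard) == list(range(10))
```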

Training Time

EPOCHS: 50,000

Single-epoch training time: ~0.7 minutes (~42 seconds)

Total training time: 0.7 × 50,000 = 35,000 minutes ≈ 24 days

Current model epochs: ~5,000

Training time so far: 0.7 × 5,000 = 3,500 minutes ≈ 2.4 days (~3 days)
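The training-time figures are consistent once the per-epoch time is read in minutes (a sanity-check sketch; the 0.7-minute figure comes from the text above):

```python
EPOCH_MINUTES = 0.7          # time per epoch, in minutes
MINUTES_PER_DAY = 24 * 60

total_days = EPOCH_MINUTES * 50_000 / MINUTES_PER_DAY    # full 50k-epoch run
so_far_days = EPOCH_MINUTES * 5_000 / MINUTES_PER_DAY    # current ~5k epochs
print(round(total_days, 1), round(so_far_days, 1))       # ≈ 24.3 and 2.4 days
```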

Repositories

  1. transformer-pytorch (Python): Implementation of the original Transformer in PyTorch, based on the research paper "Attention Is All You Need".
  2. .github
  3. codeverb-vscode-extension (TypeScript): VS Code extension for CodeVerb.
  4. codeverb-tlm-0.7b-api (Python)
  5. codeverb-tlm-0.1b-api (Python)
