A library that explores source code to model it and make predictions using the N-gram method.
The original idea and library can be found here.
You can add the library to your project in several ways.
We will be using the JitPack service.
- Add JitPack to your repositories:

```gradle
repositories {
    maven { url "https://jitpack.io" }
}
```

- Add the library to your dependencies:

```gradle
dependencies {
    implementation "com.github.AndreyBychkov:SLP:x"
}
```

and replace `x` with the latest version number.
Download the jar and source code from the latest release
and use your favorite way to add it as a dependency.
```kotlin
val file = File("path/to/file.ext")
val manager = ModelRunnerManager()
val modelRunner = manager.getModelRunner(file.extension)
modelRunner.train(file)

val suggestion = modelRunner.getSuggestion("your source code")
println(suggestion)
```

Here we train a model on the specified file
and make a suggestion of the next token for inputs like `for (` or `System.out`.
ModelRunnerManager is a class that
- provides Models for a specified file extension or Language,
- contains and manages all your models,
- provides IO operations to save and load itself and the models it contains.
Example:

```kotlin
val storeDirectory = File("path/to/dir")
ModelRunnerManager().apply {
    load(storeDirectory)
    getModelRunner(Language.JAVA).train("int i = 0;")
    save(storeDirectory)
}
```

ModelRunner and LocalGlobalModelRunner are classes that wrap N-gram Models and
- provide `train` and `forget` operations for texts and files,
- provide a flexible suggesting API for predicting next tokens.
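To give an intuition for the train and forget pair (a toy sketch in plain Kotlin, not the library's internals): training adds n-gram counts and forgetting subtracts them, so a model can unlearn a text it previously saw.

```kotlin
// Toy illustration only: a bigram count table where "forget" is the
// exact inverse of "train". The real Model implementations differ.
class CountingBigramModel {
    val counts = mutableMapOf<Pair<String, String>, Int>()

    fun train(tokens: List<String>) {
        tokens.zipWithNext().forEach { bigram ->
            counts.merge(bigram, 1, Int::plus)
        }
    }

    fun forget(tokens: List<String>) {
        tokens.zipWithNext().forEach { bigram ->
            counts.merge(bigram, -1, Int::plus)
        }
    }
}

fun main() {
    val model = CountingBigramModel()
    val tokens = listOf("for", "(", "int", "i")
    model.train(tokens)
    model.train(tokens)
    model.forget(tokens)
    println(model.counts[Pair("for", "(")]) // 1: trained twice, forgotten once
}
```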
ModelRunner's aim is to build a pipeline of the form
Input -> Lexer -> Vocabulary -> Model -> Vocabulary -> Reverse Lexer -> Output
so it requires 3 components:
- LexerRunner
- Vocabulary
- Model
Providing custom components can help you customize ModelRunner for your own needs.
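The pipeline shape above can be illustrated with a self-contained toy (plain Kotlin stand-ins, not the library's actual LexerRunner, Vocabulary, or Model types): a whitespace lexer, a vocabulary mapping tokens to indices, and a bigram model that predicts the most frequent successor.

```kotlin
// Toy end-to-end pipeline: Input -> Lexer -> Vocabulary -> Model -> Vocabulary -> Output.
// All names here are illustrative stand-ins for the real components.
fun lex(code: String): List<String> = code.trim().split(Regex("\\s+"))

class ToyVocabulary {
    private val toIndex = mutableMapOf<String, Int>()
    private val toToken = mutableListOf<String>()

    fun index(token: String): Int = toIndex.getOrPut(token) {
        toToken.add(token)
        toToken.size - 1
    }

    fun token(index: Int): String = toToken[index]
}

class ToyBigramModel {
    private val successors = mutableMapOf<Int, MutableMap<Int, Int>>()

    fun train(indices: List<Int>) {
        indices.zipWithNext().forEach { (a, b) ->
            successors.getOrPut(a) { mutableMapOf() }.merge(b, 1, Int::plus)
        }
    }

    // Most frequent successor of the context token, if any was seen.
    fun predict(context: Int): Int? =
        successors[context]?.maxByOrNull { it.value }?.key
}

fun main() {
    val vocab = ToyVocabulary()
    val model = ToyBigramModel()
    model.train(lex("for ( int i").map(vocab::index))

    val next = model.predict(vocab.index("for"))!!
    println(vocab.token(next)) // "("
}
```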
LocalGlobalModelRunner is an extension of ModelRunner which handles 2 different models: Local and Global.
We propose using the Local model in quickly changing contexts, like within a file.
On the contrary, we propose using the Global model in large static contexts like modules or projects.
Together they generate more balanced suggestions than either does individually.
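The balancing idea can be sketched as interpolating the two probability estimates. This is a common mixing scheme shown for illustration; the library's actual combination logic may differ, and `lambda` is a hypothetical weight.

```kotlin
import kotlin.math.abs

// Linear interpolation of a local and a global probability estimate.
// lambda is a hypothetical mixing weight, not a parameter of the real library.
fun mix(pLocal: Double, pGlobal: Double, lambda: Double = 0.5): Double =
    lambda * pLocal + (1 - lambda) * pGlobal

fun main() {
    // The local model is confident about a file-specific token, while the
    // global model barely knows it; mixing balances the two opinions.
    println(mix(0.8, 0.2))       // 0.5
    println(mix(0.8, 0.2, 0.9))  // local-leaning weight
}
```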
LexerRunner is the class that manages a Lexer and implements the lexing pipeline.
Example:

```kotlin
val lexerRunner = LexerRunnerFactory.getLexerRunner(Language.JAVA)
val code = "for (int i = 0; i != 10; ++i) {"
println(lexerRunner.lexLine(code).toList())
```

will generate the list
[<s>, for, (, int, i, =, 0, ;, i, !=, 10, ;, ++, i, ), {, </s>]
You can use LexerRunnerFactory to get a predefined LexerRunner for the implemented languages.
Vocabulary is the class that translates tokens into numbers for the model.
You will never interact with it directly, but if you wish to manually control its content, you can save and load it with VocabularyRunner and pass an already filled vocabulary to ModelRunner.
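As an illustration of what saving and loading a vocabulary means conceptually (plain Kotlin, not VocabularyRunner's actual format): a token-to-index map can be persisted as one token per line, with the line number serving as the index.

```kotlin
import java.io.File

// Toy format: one token per line; the line number is the token's index.
// The real VocabularyRunner may use a different on-disk representation.
fun saveVocabulary(tokens: List<String>, file: File) =
    file.writeText(tokens.joinToString("\n"))

// Rebuild the token -> index map from the line numbers.
fun loadVocabulary(file: File): Map<String, Int> =
    file.readLines().withIndex().associate { (i, token) -> token to i }

fun main() {
    val file = File.createTempFile("vocab", ".txt")
    saveVocabulary(listOf("for", "(", "int"), file)
    println(loadVocabulary(file)) // {for=0, (=1, int=2}
    file.delete()
}
```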
Model is the interface every model must implement so it can be used by ModelRunner.
All our abstract models, like NGramModel,
have a static method `standard` which returns the generally best model in its category.
If you want to mix, for instance, your neural network model with an N-gram based one,
your model should implement the Model interface
and can then be mixed via MixModel.
If your language is not supported by SLP and you want to improve its performance or output appearance, we propose the following steps:
- Implement `Lexer` to have control over token extraction.
- Implement `CodeFilter` to have control over the appearance of output text. This class translates tokens back into text. Also, feel free to use some predefined filters from `Filters`.
- Add your language with its file extensions to `Language` and `LexerRunnerFactory`.
- Make a pull request.
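To show the role a CodeFilter plays (a toy sketch, not the library's actual interface): it turns a model's token list back into readable text, for example by dropping the sentence markers and joining tokens with spaces.

```kotlin
// Toy code filter: drop the <s>/</s> markers and join tokens with spaces.
// The real CodeFilter contract may differ; see the predefined Filters.
fun toyFilter(tokens: List<String>): String =
    tokens.filter { it != "<s>" && it != "</s>" }
        .joinToString(" ")

fun main() {
    val tokens = listOf("<s>", "for", "(", "int", "i", "</s>")
    println(toyFilter(tokens)) // for ( int i
}
```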
Currently I am working on removing the need for a pull request, so you will be able to extend SLP directly in your project.