A library that explores source code to model it and make predictions using the N-gram method.
The original idea and library can be found here.
You can add the library to your project in several ways.
We will be using the JitPack service.
- Add JitPack to your repositories:

```gradle
repositories {
    maven { url "https://jitpack.io" }
}
```

- Add the library to your dependencies:

```gradle
dependencies {
    implementation "com.github.AndreyBychkov:SLP:x"
}
```

and replace `x` with the latest version number.
Download the jar and source code from the latest release
and use your favorite way to add it as a dependency.
```kotlin
val file = File("path/to/file.ext")
val manager = ModelRunnerManager()
val modelRunner = manager.getModelRunner(file.extension)
modelRunner.train(file)

val suggestion = modelRunner.getSuggestion("your source code")
println(suggestion)
```

Here we train a model on the specified file
and make a suggestion of the next token for inputs like `for (` or `System.out`.
ModelRunnerManager is a class that
- provides Models for a specified file extension or Language,
- contains and manages all your models,
- provides IO operations to save and load itself and the models it contains.
Example:

```kotlin
val storeDirectory = File("path/to/dir")
ModelRunnerManager().apply {
    load(storeDirectory)
    getModelRunner(Language.JAVA).train("int i = 0;")
    save(storeDirectory)
}
```

ModelRunner and LocalGlobalModelRunner are classes that wrap N-gram Models and
- provide `train` and `forget` operations for texts and files,
- provide a flexible suggesting API for predicting next tokens.
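To give an intuition for the train and forget pair (a toy sketch in plain Kotlin, not the library's internals): training adds n-gram counts and forgetting subtracts them, so a model can unlearn a text it previously saw.

```kotlin
// Toy illustration only: a bigram count table where "forget" is the
// exact inverse of "train". The real Model implementations differ.
class CountingBigramModel {
    val counts = mutableMapOf<Pair<String, String>, Int>()

    fun train(tokens: List<String>) {
        tokens.zipWithNext().forEach { bigram ->
            counts.merge(bigram, 1, Int::plus)
        }
    }

    fun forget(tokens: List<String>) {
        tokens.zipWithNext().forEach { bigram ->
            counts.merge(bigram, -1, Int::plus)
        }
    }
}

fun main() {
    val model = CountingBigramModel()
    val tokens = listOf("for", "(", "int", "i")
    model.train(tokens)
    model.train(tokens)
    model.forget(tokens)
    println(model.counts[Pair("for", "(")]) // 1: trained twice, forgotten once
}
```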
ModelRunner's aim is to build a pipeline of the form
Input -> Lexer -> Vocabulary -> Model -> Vocabulary -> Reverse Lexer -> Output
so it requires 3 components:
- LexerRunner
- Vocabulary
- Model
Providing custom components can help you customize ModelRunner for your own needs.
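The pipeline shape above can be illustrated with a self-contained toy (plain Kotlin stand-ins, not the library's actual LexerRunner, Vocabulary, or Model types): a whitespace lexer, a vocabulary mapping tokens to indices, and a bigram model that predicts the most frequent successor.

```kotlin
// Toy end-to-end pipeline: Input -> Lexer -> Vocabulary -> Model -> Vocabulary -> Output.
// All names here are illustrative stand-ins for the real components.
fun lex(code: String): List<String> = code.trim().split(Regex("\\s+"))

class ToyVocabulary {
    private val toIndex = mutableMapOf<String, Int>()
    private val toToken = mutableListOf<String>()

    fun index(token: String): Int = toIndex.getOrPut(token) {
        toToken.add(token)
        toToken.size - 1
    }

    fun token(index: Int): String = toToken[index]
}

class ToyBigramModel {
    private val successors = mutableMapOf<Int, MutableMap<Int, Int>>()

    fun train(indices: List<Int>) {
        indices.zipWithNext().forEach { (a, b) ->
            successors.getOrPut(a) { mutableMapOf() }.merge(b, 1, Int::plus)
        }
    }

    // Most frequent successor of the context token, if any was seen.
    fun predict(context: Int): Int? =
        successors[context]?.maxByOrNull { it.value }?.key
}

fun main() {
    val vocab = ToyVocabulary()
    val model = ToyBigramModel()
    model.train(lex("for ( int i").map(vocab::index))

    val next = model.predict(vocab.index("for"))!!
    println(vocab.token(next)) // "("
}
```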
LocalGlobalModelRunner is an extension of ModelRunner which handles 2 different models: Local and Global.
We propose using the Local model in quickly changing contexts, like within a file.
On the contrary, we propose using the Global model in large static contexts like modules or projects.
Together they generate more balanced suggestions than either does individually.
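The balancing idea can be sketched as interpolating the two probability estimates. This is a common mixing scheme shown for illustration; the library's actual combination logic may differ, and `lambda` is a hypothetical weight.

```kotlin
import kotlin.math.abs

// Linear interpolation of a local and a global probability estimate.
// lambda is a hypothetical mixing weight, not a parameter of the real library.
fun mix(pLocal: Double, pGlobal: Double, lambda: Double = 0.5): Double =
    lambda * pLocal + (1 - lambda) * pGlobal

fun main() {
    // The local model is confident about a file-specific token, while the
    // global model barely knows it; mixing balances the two opinions.
    println(mix(0.8, 0.2))       // 0.5
    println(mix(0.8, 0.2, 0.9))  // local-leaning weight
}
```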
LexerRunner is the class that manages a Lexer and implements the lexing pipeline.
Example:

```kotlin
val lexerRunner = LexerRunnerFactory.getLexerRunner(Language.JAVA)
val code = "for (int i = 0; i != 10; ++i) {"
println(lexerRunner.lexLine(code).toList())
```

will generate the list
[<s>, for, (, int, i, =, 0, ;, i, !=, 10, ;, ++, i, ), {, </s>]
You can use LexerRunnerFactory to get a predefined LexerRunner for the implemented languages.
Vocabulary is the class that translates tokens into numbers for the model.
You will never interact with it directly, but if you wish to manually control its content, you can save and load it with VocabularyRunner and pass an already filled vocabulary to ModelRunner.
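As an illustration of what saving and loading a vocabulary means conceptually (plain Kotlin, not VocabularyRunner's actual format): a token-to-index map can be persisted as one token per line, with the line number serving as the index.

```kotlin
import java.io.File

// Toy format: one token per line; the line number is the token's index.
// The real VocabularyRunner may use a different on-disk representation.
fun saveVocabulary(tokens: List<String>, file: File) =
    file.writeText(tokens.joinToString("\n"))

// Rebuild the token -> index map from the line numbers.
fun loadVocabulary(file: File): Map<String, Int> =
    file.readLines().withIndex().associate { (i, token) -> token to i }

fun main() {
    val file = File.createTempFile("vocab", ".txt")
    saveVocabulary(listOf("for", "(", "int"), file)
    println(loadVocabulary(file)) // {for=0, (=1, int=2}
    file.delete()
}
```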
Model is the interface every model must implement so it can be used by ModelRunner.
All our abstract models, like NGramModel,
have a static method `standard` which returns the generally best model in its category.
If you want to mix, for instance, your neural network model with an N-gram based one,
your model should implement the Model interface
and can then be mixed via MixModel.
If your language is not supported by SLP and you want to improve its performance or output appearance, we propose the following steps:
- Implement `Lexer` to have control over token extraction.
- Implement `CodeFilter` to have control over the appearance of output text. This class translates tokens back into text. Also, feel free to use some predefined filters from `Filters`.
- Add your language with its file extensions to `Language` and `LexerRunnerFactory`.
- Make a pull request.
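To show the role a CodeFilter plays (a toy sketch, not the library's actual interface): it turns a model's token list back into readable text, for example by dropping the sentence markers and joining tokens with spaces.

```kotlin
// Toy code filter: drop the <s>/</s> markers and join tokens with spaces.
// The real CodeFilter contract may differ; see the predefined Filters.
fun toyFilter(tokens: List<String>): String =
    tokens.filter { it != "<s>" && it != "</s>" }
        .joinToString(" ")

fun main() {
    val tokens = listOf("<s>", "for", "(", "int", "i", "</s>")
    println(toyFilter(tokens)) // for ( int i
}
```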
Currently I am working on removing the need for a pull request, so you will be able to extend SLP directly in your project.