Machine learning for code reviews! This is an MIT-licensed companion to our Inclusive Code Comments browser extension.
The goal of this project is to produce a machine learning model for classifying sentences for code reviews.
Some examples:
- "Looks good to me. Ship it!" -> OK!
- "There are test failures" -> OK!
- "You are a failure" -> BAD!
- "This code stinks" -> BAD!
We have the model in two formats, that is free to use in other projects:
MLModel.zip
- model in ML.NET formatmodel.onnx
- model in ONNX format
You can find these in the models.zip
artifact on the GitHub status
of a commit or pull request.
You can create them locally via:
dotnet run --project ml.net/InclusiveCodeReviews.Convert/InclusiveCodeReviews.Convert.csproj
...
=============== Saving the model ===============
The model is saved to inclusive-code-reviews-ml/bin/MLModel.zip
The model is saved to inclusive-code-reviews-ml/bin/model.onnx
See the InclusiveCodeReviews.ConsoleApp
for a C# example, general
usage:
var results = ConsumeModel.Predict(new ModelInput
{
Text = "Your text here."
});
var result = result.Prediction;
var score = result.Score[result.Prediction == "1" ? 1 : 0];
See onnxjs\tests\onnx.ts
for a TypeScript/JavaScript example,
general usage:
const session = await ort.InferenceSession.create('./model.onnx');
const results = await session.run({
text: new ort.Tensor(["Your text here."], [1,1]),
isnegative: new ort.Tensor([''], [1,1]),
importance: new ort.Tensor('float32', [''], [1,1]),
});
const result = results['PredictedLabel.output'].data[0];
const score = results['Score.output'].data[Number(result)];
Folder structure of this repo:
comments
: several.csv
files of code review commentsMaui
: desktop app for classifying data, see MLTrainer appml.net
: contains C# projects related to ML.NET usage, creating.zip
or.onnx
filesonnxjs
: JS test project for the.onnx
model
If you're here to update comments/classified.csv
,
we have a CI archive of the app, so you don't have to build it from
source.
Find a commit on GitHub, and click a status for either Windows or Mac:
Click on Summary
:
Scroll to the bottom, and pick either a Windows or Mac build:
The Windows build contains an MLTrainer.exe
inside, you can unzip
this somewhere and run it.
The Mac build contains an MLTrainer-1.0.pkg
inside. You can simply
install it and run /Applications/MLTrainer.app
afterward.
Note that you may need to bypass various signing prompts, as this app
is not digitally signed. Mac you will need to go to System Preferences > Security & Privacy
to run an unsigned .pkg
installer.
The model is meant to be passed individual sentences. If you need to classify paragraphs of text, it is recommended to split the text into sentences and match each against the model.
The model is trained with all GitHub handles removed and replaced with
@github
, you should apply this same replacement with your own input
text using the regex:
const regex:RegExp = /\B@([a-z0-9](?:-(?=[a-z0-9])|[a-z0-9]){0,38}(?<=[a-z0-9]))/gi;
const replaced = text.replace(regex, '@github');
Additionally, inline code blocks such as:
This is a sentence with a `CodeBlock();`. Nice?
Are replaced via #code
:
const regex:RegExp = /`+[^`]+`+/gi;
const replaced = text.replace(regex, '#code');
URLs are replaced with #url
:
const regex:RegExp = /\b(https?|ftp|file):\/\/[\-A-Za-z0-9+&@#\/%?=~_|!:,.;]*[\-A-Za-z0-9+&@#\/%=~_|]/gi;
const replaced = text.replace(regex, '#url');
Trailing punctuation should be replaced such as:
const regex:RegExp = /(\.|!|\?|;|:)+$/g;
const replaced = text.replace(regex, '');
Leading/trailing blank space should also be replaced:
const final = text.trim();
Install with:
dotnet tool install --global mlnet
And you can train, such as:
mlnet classification --dataset comments/classified.csv --label-col 1 --has-header true --train-time 10
This outputs a folder named SampleClassification
in the current
directory. It's not exactly the same output the ML.NET Model Builder
outputs, but this should also work on a Mac.
See the ML.NET docs for more info.
Run train.ps1
.
Note there are additional manual edits we would lose if running this again. You might discard some of the changes, or compare the diff.
Open ml.net\InclusiveCodeReviews.sln
in VS, and run InclusiveCodeReviews.Convert
project.
This will update InclusiveCodeReviews.Model\MLModel.zip
and onnxjs\model.onnx
in-place.
To test model.onnx
, run npm test
in the onnxjs
directory.