Explore, label, and monitor data for AI projects

Rubrix is a free and open-source tool for exploring and iterating on data for artificial intelligence projects.

Rubrix focuses on enabling novel, human-in-the-loop workflows involving data scientists, subject matter experts, and ML/data engineers.

With Rubrix, you can:

Monitor the predictions of deployed models.
Label data with a novel search-guided, iterative workflow.
Iterate on ground-truth and predictions to debug, track and improve your data and models over time.
Build custom dashboards on top of your model predictions and labels.

Rubrix is composed of:

a Python library to bridge data and models, which you can install via pip.
a web application to explore and label data, which you can launch using Docker or directly with Python.

This is an example of Rubrix's labeling mode:

And this is an example of logging model predictions from a 🤗 transformers text classification pipeline:

from datasets import load_dataset
from transformers import pipeline
import rubrix as rb

model = pipeline('zero-shot-classification', model="typeform/distilbert-base-uncased-mnli")

dataset = load_dataset("ag_news", split='test[0:100]')

# Our labels are: ['World', 'Sports', 'Business', 'Sci/Tech']
labels = dataset.features["label"].names

for record in dataset:
    prediction = model(record['text'], labels)

    item = rb.TextClassificationRecord(
        inputs={"text": record["text"]},
        prediction=list(zip(prediction['labels'], prediction['scores'])),
        annotation=labels[record["label"]]
    )

    rb.log(item, name="ag_news_zeroshot")
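The `prediction` argument above is a list of `(label, score)` pairs. As a minimal sketch that runs without downloading a model, the snippet below uses a hand-written dictionary (`mock_output`, with made-up scores) in place of the pipeline's return value, assuming it exposes parallel `labels` and `scores` lists as the loop above relies on:

```python
# Hypothetical stand-in for the pipeline's return value; the scores are
# invented for illustration and are not real model outputs.
mock_output = {
    "sequence": "Stocks rally as markets reopen",
    "labels": ["Business", "World", "Sci/Tech", "Sports"],
    "scores": [0.5, 0.25, 0.125, 0.125],
}

# Pair each label with its score, as in the logging loop above.
prediction = list(zip(mock_output["labels"], mock_output["scores"]))

# The highest-scoring pair is the model's top guess.
top_label, top_score = max(prediction, key=lambda pair: pair[1])

print(prediction)
print(top_label, top_score)
```

Comparing the top pair with the `annotation` value is one simple way to spot records where the prediction and the ground truth disagree.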

Quick links

Doc	Description
🚶 First steps	New to Rubrix and want to get started?
👩‍🏫 Concepts	Want to know more about Rubrix concepts?
🛠️ Setup and install	How to configure and install Rubrix
🗒️ Tasks	What can you use Rubrix for?
📱 UI reference	How to use the web-app for data exploration and annotation
🐍 Python API docs	How to use the Python classes and methods
👩‍🍳 Rubrix cookbook	How to use Rubrix with your favourite libraries (`flair`, `stanza`...)
👋 Community forum	Ask questions, share feedback, ideas and suggestions
🤗 Hugging Face tutorial	Using Rubrix with 🤗`transformers` and `datasets`
💫 spaCy tutorial	Using `spaCy` with Rubrix for NER projects
🐠 Weak supervision tutorial	How to leverage weak supervision with `snorkel` & Rubrix
🤔 Active learning tutorial	How to use active learning with `modAL` & Rubrix
🧪 Knowledge graph tutorial	How to use Rubrix with `kglab` & `pytorch_geometric`

Get started

To get started you need to follow three steps:

Install the Python client
Launch the web app
Start logging data

1. Install the Python client

You can install the Python client with pip:

pip install rubrix

2. Launch the web app

There are two ways to launch the web app:

Using docker-compose (recommended).
Executing the server code manually.

Using docker-compose (recommended)

Create a folder:

mkdir rubrix && cd rubrix

and launch the docker-contained web app with the following command:

wget -O docker-compose.yml https://git.io/rb-docker && docker-compose up

This is the recommended way because it automatically includes an Elasticsearch instance, Rubrix's main persistence layer.

Executing the server code manually

When executing the server code manually you need to provide an Elasticsearch instance yourself.

First, install Elasticsearch (we recommend version 7.10) and launch an Elasticsearch instance. For macOS and Windows there are Homebrew formulae and an MSI package, respectively.

Then install the Rubrix Python library together with its server dependencies:

pip install "rubrix[server]"

Finally, launch a local instance of the Rubrix web app:

python -m rubrix.server

By default, the Rubrix server looks for your Elasticsearch endpoint at http://localhost:9200. To customize this, set the ELASTICSEARCH environment variable to your endpoint.
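As a sketch of that customization, the helper below builds the environment for the server process and starts it with the same `python -m rubrix.server` command shown above. The function names `build_server_env` and `launch_server` and the example endpoint `http://my-es-host:9200` are hypothetical, chosen only for illustration.

```python
import os
import subprocess
import sys

DEFAULT_ENDPOINT = "http://localhost:9200"  # default stated in the text above


def build_server_env(endpoint=None, base_env=None):
    """Return a copy of the environment with ELASTICSEARCH set.

    Hypothetical helper: if no endpoint is given, an existing
    ELASTICSEARCH value is kept, else the documented default is used.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["ELASTICSEARCH"] = endpoint or env.get("ELASTICSEARCH", DEFAULT_ENDPOINT)
    return env


def launch_server(endpoint=None):
    """Start the Rubrix server as a child process (needs rubrix[server])."""
    return subprocess.Popen(
        [sys.executable, "-m", "rubrix.server"],
        env=build_server_env(endpoint),
    )


# Example (not run here): launch_server("http://my-es-host:9200")
```

Passing the variable through `env` keeps the setting local to the server process instead of changing it for the whole shell session.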

3. Start logging data

The following code will log one record into the example-dataset dataset:

import rubrix as rb

rb.log(
    rb.TextClassificationRecord(inputs="my first rubrix example"),
    name='example-dataset'
)

BulkResponse(dataset='example-dataset', processed=1, failed=0)

If you go to your Rubrix app at http://localhost:6900/, you should see your first dataset.
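If the dataset does not show up, it can help to first confirm that the web app answers at that address. Here is a minimal sketch using only the standard library; the helper name `is_up` is hypothetical, and the URL is the default one mentioned above:

```python
import urllib.error
import urllib.request


def is_up(url="http://localhost:6900/", timeout=2.0):
    """Return True if an HTTP server answers at `url`, False otherwise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a success status.
        return True
    except (OSError, ValueError):
        # Connection refused, timeout, DNS failure, or malformed URL.
        return False


print("Rubrix app reachable:", is_up())
```

A `False` result usually means the web app (or its container) is not running yet, or is listening on a different host or port.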

Congratulations! You are ready to start working with Rubrix with your own data.

To better understand what's possible, take a look at Rubrix's Cookbook.

Community

As a new open-source project, we are eager to hear your thoughts, fix bugs, and help you get started. Feel free to use the Discussion forum or the Issues and we'll be pleased to help out.
