Dataflow Programming for Machine Learning in R.

Package website: release | dev

What is `mlr3pipelines`?

Watch our “WhyR 2020” webinar presentation on YouTube for an introduction! Find the slides here.

mlr3pipelines is a dataflow programming toolkit for machine learning in R, built on the mlr3 package. Machine learning workflows can be written as directed “Graphs” that represent data flows between preprocessing, model fitting, and ensemble learning units in an expressive and intuitive language. Using methods from the mlr3tuning package, it is even possible to optimize the hyperparameters of multiple processing units simultaneously.
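A minimal sketch of such joint tuning, assuming a recent mlr3tuning release in which `tune()` takes a `tuner` argument and parameters are marked with `to_tune()` (older releases used a different interface). It uses `po()`, `%>>%` and `GraphLearner`, which are explained below; the parameter ranges, the random search and the 3-fold cross-validation are arbitrary illustrative choices.

```r
library(mlr3)
library(mlr3pipelines)
library(mlr3tuning)

set.seed(42)

# Two processing units: a PCA step followed by a decision tree.
graph = po("pca") %>>% po("learner", learner = lrn("classif.rpart"))
glrn = GraphLearner$new(graph)

# Graph parameter ids are prefixed with the id of the PipeOp they belong to,
# so one parameter of each unit can be marked for tuning at the same time.
glrn$param_set$values$pca.rank. = to_tune(1, 4)
glrn$param_set$values$classif.rpart.cp = to_tune(0.001, 0.1)

# Random search over both parameters jointly, stopping after 5 evaluations.
instance = tune(
  tuner = tnr("random_search", batch_size = 1),
  task = tsk("iris"),
  learner = glrn,
  resampling = rsmp("cv", folds = 3),
  measures = msr("classif.ce"),
  term_evals = 5
)

# Best configuration found for the PCA step and the tree together.
print(instance$result)
```

Because both parameters live in one search space, each evaluated configuration sets the PCA rank and the tree complexity together rather than tuning the two steps one after the other.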

In principle, mlr3pipelines is about defining individual data and model manipulation steps as “PipeOps”:

library(mlr3)
library(mlr3pipelines)
pca        = po("pca")
filter     = po("filter", filter = mlr3filters::flt("variance"), filter.frac = 0.5)
learner_po = po("learner", learner = lrn("classif.rpart"))
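To see what a single PipeOp does on its own, it can be trained directly on a task before it is placed in any pipeline. This is a small sketch using the PCA step on the built-in iris task; a PipeOp takes a list of inputs and returns a list of outputs.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
pca = po("pca")

# Training consumes a list of inputs and returns a list of outputs;
# here the single output is a new task with the features replaced by PCs.
out = pca$train(list(task))
pca_task = out[[1]]

print(pca_task$feature_names)  # the principal components replace the original features
print(pca$is_trained)          # the fitted rotation is kept in pca$state for prediction
```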

These PipeOps can then be combined to define machine learning pipelines, which can be wrapped in a GraphLearner that behaves like any other Learner in mlr3.

graph = pca %>>% filter %>>% learner_po
glrn = GraphLearner$new(graph)
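Because a GraphLearner is an mlr3 Learner, it can also be trained and used for prediction directly. The sketch below assumes a shortened graph (PCA followed by rpart, without the variance filter, so that mlr3filters is not needed) and a simple split that puts every third row into the test set; both are illustrative choices.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")
glrn = GraphLearner$new(po("pca") %>>% po("learner", learner = lrn("classif.rpart")))

# Hold out every third row as a test set.
test_ids = seq(3, task$nrow, by = 3)
train_ids = setdiff(seq_len(task$nrow), test_ids)

# Training fits every PipeOp in order; prediction replays the fitted PCA
# on the test rows before passing them to the trained tree.
glrn$train(task, row_ids = train_ids)
pred = glrn$predict(task, row_ids = test_ids)

acc = pred$score(msr("classif.acc"))
print(acc)
```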

This learner can be used for resampling, benchmarking, and even tuning.

resample(tsk("iris"), glrn, rsmp("cv"))
#> <ResampleResult> of 10 iterations
#> * Task: iris
#> * Learner: pca.variance.classif.rpart
#> * Warnings: 0 in 0 iterations
#> * Errors: 0 in 0 iterations
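For benchmarking, the GraphLearner can be compared against a plain decision tree with mlr3's `benchmark_grid()` and `benchmark()`. This is a minimal sketch: the shortened PCA-plus-rpart graph, the 3-fold cross-validation and the classification error measure are illustrative choices, and the resulting numbers depend on the random seed.

```r
library(mlr3)
library(mlr3pipelines)

set.seed(1)

glrn = GraphLearner$new(po("pca") %>>% po("learner", learner = lrn("classif.rpart")))

# Every learner is evaluated on the same task with the same resampling splits.
design = benchmark_grid(
  tasks = tsk("iris"),
  learners = list(glrn, lrn("classif.rpart")),
  resamplings = rsmp("cv", folds = 3)
)
bmr = benchmark(design)

# One row per learner with its mean classification error over the folds.
scores = bmr$aggregate(msr("classif.ce"))
print(scores[, c("learner_id", "classif.ce")])
```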

Feature Overview

Individual computational steps can be represented as so-called PipeOps, which can then be connected with directed edges in a Graph. The scope of mlr3pipelines is still growing; currently supported features are:

Simple data manipulation and preprocessing operations, e.g. PCA, feature filtering
Task subsampling for speed and outcome class imbalance handling
mlr3 Learner operations for prediction and stacking
Simultaneous path branching (data going both ways)
Alternative path branching (data going one specific way, controlled by hyperparameters)
Ensemble methods and aggregation of predictions
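The two branching styles in the list above can be contrasted in a small sketch. `po("nop")` (a pass-through step), `gunion()`, `po("featureunion")`, `po("branch")` and `po("unbranch")` are mlr3pipelines building blocks; choosing between PCA and no preprocessing is only an illustration.

```r
library(mlr3)
library(mlr3pipelines)

task = tsk("iris")

# Simultaneous branching: the task is sent down both paths and the
# resulting features are merged again by "featureunion".
both = gunion(list(po("pca"), po("nop"))) %>>% po("featureunion")
merged = both$train(task)[[1]]
print(merged$feature_names)  # original features plus PC1..PC4

# Alternative branching: only the path named in branch.selection is run;
# the unbranch step passes on the output of whichever path was active.
alt = po("branch", options = c("pca", "nop")) %>>%
  gunion(list(po("pca"), po("nop"))) %>>%
  po("unbranch", options = c("pca", "nop"))
alt$param_set$values$branch.selection = "nop"
unchanged = alt$train(task)[[1]]
print(unchanged$feature_names)  # original features only
```

Since `branch.selection` is an ordinary hyperparameter of the graph, the choice of path can itself be tuned like any other parameter.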

Documentation

The easiest way to get started is reading some of the vignettes that are shipped with the package, which can also be viewed online:

Quick Introduction, with short examples to get started

Bugs, Questions, Feedback

mlr3pipelines is a free and open source software project that encourages participation and feedback. If you have any issues, questions, suggestions or feedback, please do not hesitate to open an “issue” about it on the GitHub page!

In case of problems / bugs, it is often helpful if you provide a “minimal working example” that showcases the behaviour (but don’t worry about this if the bug is obvious).

Please understand that the resources of the project are limited: response may sometimes be delayed by a few days, and some feature suggestions may be rejected if they are deemed too tangential to the vision behind the project.

Similar Projects

A predecessor to this package is the mlrCPO package, which works with mlr 2.x. Other packages that provide, to varying degrees, some preprocessing functionality or a domain-specific language for machine learning are the caret package with the related recipes project, and the dplyr package.

tigthor/r-dataflow-programming
