CrossR


The CrossR package (short for Cross-validation in R) is a set of functions for implementing cross-validation inside the R environment.

Contributors

  • Nazli Ozum Kafaee / @nazliozum
  • Daniel Raff / @raffrica
  • Shun Chi / @ShunChi100

Summary

Cross-validation is an important technique for model selection and hyper-parameter optimization. Cross-validation scores give a good estimate of a predictive model's performance on test or new data, as long as the IID assumption approximately holds. This package aims to provide a standardized pipeline for performing cross-validation with different modelling functions in R. In addition, summary statistics of the cross-validation results are provided for users.

Functions

Three main functions in CrossR:

  • train_test_split(): This function splits the data into training and test sets and returns the partitioned objects. A random shuffling option is provided.

  • cross_validation(): This function performs k-fold cross-validation using the partitioned data and a selected model, and returns the score of each validation fold. Additional cross-validation methods (such as leave-one-out) may be implemented if time allows.

  • summary_cv(): This function outputs summary statistics (mean, standard deviation, mode, median) of the cross-validation scores.

Additionally, we've built a helper function that generates data for the above functions:

  • gen_data(): This function generates a list of X and y data that can be passed to train_test_split() or cross_validation().

It can be used as follows:

# generate synthetic X and y data (100 observations)
data <- gen_data(100)

# assemble X and y as data frames
X <- data.frame(data[[1]])
y_vec <- data[[2]]
y <- data.frame(y = y_vec)

Installation and examples

To install the package:

devtools::install_github("UBC-MDS/CrossR")

Given X (explanatory variables) and y (response variable) as data frames or atomic vectors, the data can be split as follows:

library(CrossR)
split_data <- train_test_split(X, y, test_size = 0.25, random_state = 0, shuffle = TRUE)

# assign the split data to individual variables
X_train <- split_data[[1]]
X_test <- split_data[[2]]
y_train <- split_data[[3]]
y_test <- split_data[[4]]

To perform cross-validation on X and y using the linear regression model lm():

scores <- cross_validation(split_data[['X_train']], split_data[['y_train']])

To see the summary of scores:

summary_cv(scores)
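
Putting the pieces together, the calls shown above can be chained into a minimal end-to-end sketch (this uses only the functions documented in this README; the 100 passed to gen_data() is just an example size):

library(CrossR)

# generate synthetic data and assemble X and y as data frames
data <- gen_data(100)
X <- data.frame(data[[1]])
y <- data.frame(y = data[[2]])

# split into training and test sets
split_data <- train_test_split(X, y, test_size = 0.25, random_state = 0, shuffle = TRUE)

# cross-validate on the training portion and summarise the scores
scores <- cross_validation(split_data[['X_train']], split_data[['y_train']])
summary_cv(scores)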

Similar packages

Cross-validation can also be implemented with the caret package in R. caret provides createDataPartition() to split the data and trainControl() to apply cross-validation with different methods, depending on its method argument. However, some aspects of the caret workflow make cross-validation cumbersome: createDataPartition() returns only the indices of the split, which must then be used in a separate step to subset the data into training and test sets. In CrossR this is done in a single step with train_test_split().
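
For comparison, here is a rough sketch of the two-step split in caret (the names df and y are illustrative: a data frame of explanatory variables and a response vector):

library(caret)

# createDataPartition() returns row indices for the training portion...
train_idx <- createDataPartition(y, p = 0.75, list = FALSE)

# ...which must then be used in a second step to subset the data
train_data <- df[train_idx, ]
test_data <- df[-train_idx, ]

# cross-validation itself is configured separately via trainControl()
ctrl <- trainControl(method = "cv", number = 10)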

License

MIT License

Contributing

This is an open-source project, so feedback, suggestions, and contributions are very welcome. For feedback and suggestions, please open an issue in this repo. If you would like to contribute to this package, please refer to the Contributing guidelines for details.