R package to fit LASSO regression models to complex survey data

aiparragirre/wlasso

wlasso

The goal of this repository is two-fold:

  • To make the R package wlasso publicly available. This package fits LASSO linear and logistic regression models to complex survey data.
  • To provide the R code of the simulation study that analyzes how well replicate-weights methods perform when used to define training and test sets for selecting optimal LASSO regression models.

Note that the whole repository can be downloaded from Code > Download ZIP.

R package - wlasso

Warning

This package is now available on CRAN as svyVarSel.

The R package related to the paper is available in the folder wlasso. The file usage-package.R provides an example of how to use the package functions. The data in the folder example-data can be used as a toy example to try out the package.

This package depends on the survey and glmnet packages.

Three functions are available in the package:

  • wlasso: The main function. It fits LASSO prediction models (linear or logistic) to complex survey data, accounting for sampling weights in the estimation process and selecting the lambda that minimizes the error estimated with one of several replicate-weights methods.
  • wlasso.plot: Plots objects of class wlasso, showing the estimated error for each lambda value and the number of covariates in the model that minimizes the error.
  • replicate.weights: Randomly defines training and test sets by means of the replicate-weights methods analyzed in the paper. The function wlasso relies on this function to define training and test sets. In particular, the following methods are available through this function:
    • The ones that depend on the function as.svrepdesign from the survey package: Jackknife Repeated Replication (JKn), Bootstrap (bootstrap and subbootstrap) and Balanced Repeated Replication (BRR).
    • New proposals: Design-based cross-validation (dCV), split-sample repeated replication (split) and extrapolation (extrapolation).
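The general idea behind these functions can be sketched directly with the two dependencies. The example below is only an illustration under stated assumptions, not the implementation used in wlasso: the toy data, the object names (toy, rw, lambda_grid, mean_err) and the choice of a weighted squared error are all hypothetical. It builds JKn replicate weights with as.svrepdesign, treats the units with zero replicate weight as the test set of each replicate, fits a weighted LASSO with glmnet on the remaining units, and keeps the lambda with the lowest weighted test error.

```r
library(survey)
library(glmnet)

set.seed(1)

# Hypothetical toy sample: 4 strata, 5 clusters (PSUs) per stratum,
# 10 units per cluster, with arbitrary sampling weights.
n <- 200
toy <- data.frame(
  strata  = rep(1:4, each = 50),
  cluster = rep(1:20, each = 10),
  weights = runif(n, min = 1, max = 10)
)
x <- matrix(rnorm(n * 5), ncol = 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- as.numeric(1 + 2 * x[, 1] - x[, 2] + rnorm(n))

# Describe the design and derive JKn replicate weights (one replicate per PSU).
des <- svydesign(ids = ~cluster, strata = ~strata, weights = ~weights, data = toy)
rep_des <- as.svrepdesign(des, type = "JKn", compress = FALSE)
rw <- weights(rep_des, type = "analysis")  # n x (number of replicates)

# Weighted LASSO on the full sample; its lambda path is reused as a common grid.
full_fit <- glmnet(x, y, weights = toy$weights, family = "gaussian")
lambda_grid <- full_fit$lambda

# For each replicate: train on units with positive replicate weight,
# test on the deleted PSU (replicate weight 0) using the sampling weights.
err <- sapply(seq_len(ncol(rw)), function(r) {
  train <- rw[, r] > 0
  fit <- glmnet(x[train, ], y[train], weights = rw[train, r], family = "gaussian")
  pred <- predict(fit, newx = x[!train, , drop = FALSE], s = lambda_grid)
  colSums(toy$weights[!train] * (y[!train] - pred)^2)
})

# Every unit is tested exactly once, so this is a weighted mean squared error.
mean_err <- rowSums(err) / sum(toy$weights)
best_lambda <- lambda_grid[which.min(mean_err)]

# Coefficients of the full-sample model at the selected lambda.
coef(full_fit, s = best_lambda)
```

Other replicate-weights types accepted by as.svrepdesign (for example "bootstrap" or "subbootstrap") could be swapped in through the type argument, although the rule used here to identify test units (zero replicate weight) would need to be reconsidered for each method.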

Installation of the package in R

To install the package svyVarSel from CRAN:

install.packages("svyVarSel")

To install the updated version of the package from GitHub:

devtools::install_github("aiparragirre/svyVarSel")

Caution

The package available on this site is not the most up-to-date version. The package in this repository was last updated on 12/25/2023. Previous versions are available in the old_versions folder, and newer versions are available at svyVarSel. If you nevertheless prefer to install the package wlasso from this GitHub page (not recommended), please run the following code in R:

library("devtools")
install_github("aiparragirre/wlasso/wlasso")

R code of the simulation study

All the R code needed to reproduce the results obtained in the simulation study of the following paper is available in the folder R code - simulation study:

Iparragirre, A., Lumley, T., Barrio, I., & Arostegui, I. (2023). Variable selection with LASSO regression for complex survey data. Stat, 12(1), e578. https://doi.org/10.1002/sta4.578

The folder Functions contains all the functions needed to conduct the simulation study.

To run the simulations, run the code in the file exe-sim.R. The results of these simulations can also be downloaded here.

If you want to reproduce the graphics, please save the results in the folder Results and run the file exe-results.R.

All the graphics shown in the paper are available in the folder Graphics and the numerical results in Tables.
