wlasso

This repository has two goals:

  • To make the R package wlasso publicly available. This package fits linear and logistic LASSO regression models to complex survey data.
  • To provide the R code of the simulation study that analyzes the performance of replicate weights methods for defining training and test sets when selecting optimal LASSO regression models.

Note that the whole repository can be downloaded from Code > Download ZIP.

R package - wlasso

Warning

This package is now available on CRAN as svyVarSel.

The R package related to the paper is available in the folder wlasso. The file usage-package.R provides a usage example for the package functions. The data in the folder example-data can be used as a toy example to try out the package.

This package depends on the survey and glmnet packages.
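As a small convenience, the sketch below checks whether the two dependencies are already installed before installing wlasso. The variable names `deps` and `missing_deps` are illustrative only and are not part of the package.

```r
# Dependencies named in this README.
deps <- c("survey", "glmnet")

# requireNamespace() returns FALSE (without raising an error) when a package is missing.
missing_deps <- deps[!vapply(deps, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "),
          ". Install them with install.packages().")
} else {
  message("All dependencies are installed.")
}
```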

Three functions are available in the package:

  • wlasso: the main function. It fits LASSO prediction models (linear or logistic) to complex survey data, taking the sampling weights into account in the estimation process and selecting the lambda that minimizes the error estimated by means of different replicate weights methods.
  • wlasso.plot: plots objects of class wlasso, showing the estimated error for each lambda value and the number of covariates in the model that minimizes the error.
  • replicate.weights: randomly defines training and test sets by means of the replicate weights methods analyzed in the paper. The function wlasso depends on this function to define training and test sets. In particular, the following methods are available through this function:
    • The ones that depend on the function as.svrepdesign from the survey package: Jackknife Repeated Replication (JKn), Bootstrap (bootstrap and subbootstrap) and Balanced Repeated Replication (BRR).
    • New proposals: Design-based cross-validation (dCV), split-sample repeated replication (split) and extrapolation (extrapolation).
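The idea behind these functions can be sketched with survey and glmnet alone. The code below is a minimal illustration under stated assumptions, not the implementation used by wlasso: it uses the `apistrat` example data shipped with the survey package (not the data in example-data), an arbitrary set of covariates, a hand-picked lambda grid, only the first 10 JKn replicates, and a weighted squared error as the error measure. All object names (`dat`, `rw`, `lambda_grid`, `err`, and so on) are hypothetical.

```r
library(survey)
library(glmnet)

# Example stratified survey sample from the survey package (assumption: used
# here only for illustration).
data(api)
vars <- c("api00", "ell", "meals", "mobility", "emer", "full")
dat <- apistrat[complete.cases(apistrat[, vars]), ]

# Stratified design with sampling weights pw.
des <- svydesign(ids = ~1, strata = ~stype, weights = ~pw, data = dat)

# JKn replicate weights: one column per replicate; a unit dropped from a
# replicate gets weight 0 in that column.
rep_des <- as.svrepdesign(des, type = "JKn")
rw <- weights(rep_des, type = "analysis")

x <- as.matrix(dat[, vars[-1]])
y <- dat$api00
lambda_grid <- exp(seq(log(50), log(0.05), length.out = 20))  # decreasing grid

# For each replicate: train on units with positive replicate weight, test on
# the dropped units, and record the weighted squared error for every lambda.
n_rep <- 10
err <- matrix(NA_real_, nrow = n_rep, ncol = length(lambda_grid))
for (r in seq_len(n_rep)) {
  train <- rw[, r] > 0
  test <- !train
  fit <- glmnet(x[train, , drop = FALSE], y[train],
                weights = rw[train, r], family = "gaussian",
                lambda = lambda_grid)
  pred <- predict(fit, newx = x[test, , drop = FALSE])
  w_test <- dat$pw[test]
  err[r, ] <- colSums(w_test * (y[test] - pred)^2) / sum(w_test)
}

# Lambda with the smallest average estimated error, and the number of
# covariates kept by the weighted model refitted on the full sample.
mean_err <- colMeans(err)
best_lambda <- lambda_grid[which.min(mean_err)]
final <- glmnet(x, y, weights = dat$pw, family = "gaussian", lambda = lambda_grid)
n_selected <- sum(as.matrix(coef(final, s = best_lambda))[-1, 1] != 0)
```

Replacing `type = "JKn"` with `"bootstrap"`, `"subbootstrap"` or `"BRR"` in `as.svrepdesign` gives the other survey-based replicate weights listed above (BRR requires two PSUs per stratum, which this example design does not have); the new proposals (dCV, split and extrapolation) are not part of the survey package and are provided by replicate.weights.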

Installation of the package in R

To install the package svyVarSel from CRAN:

install.packages("svyVarSel")

To install the updated version of the package from GitHub:

devtools::install_github("aiparragirre/svyVarSel")

Caution

The package available in this repository is not the most recent version. The version hosted here was last updated on 12/25/2023, and earlier versions are available in the old_versions folder. New versions of the package are released as svyVarSel. If you nevertheless prefer to install the wlasso package from this GitHub page (not recommended), please run the following code in R:

library("devtools")
install_github("aiparragirre/wlasso/wlasso")

R code of the simulation study

All the R code needed to reproduce the results obtained in the simulation study of the following paper is available in the folder R code - simulation study:

Iparragirre, A., Lumley, T., Barrio, I., & Arostegui, I. (2023). Variable selection with LASSO regression for complex survey data. Stat, 12(1), e578. https://doi.org/10.1002/sta4.578

The folder Functions contains all the functions needed to conduct the simulation study.

To run the simulations, run the code in the file exe-sim.R. The results of these simulations can also be downloaded here.

If you want to reproduce the graphics, please save the results in the folder Results and run the file exe-results.R.

All the graphics shown in the paper are available in the folder Graphics, and the numerical results are available in the folder Tables.