diff --git a/R/lpcde/DESCRIPTION b/R/lpcde/DESCRIPTION
index cc8cb83..c3e3a49 100644
--- a/R/lpcde/DESCRIPTION
+++ b/R/lpcde/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: lpcde
 Type: Package
 Title: Boundary Adaptive Local Polynomial Conditional Density Estimator
-Version: 0.1.4
+Version: 0.1.5
 Authors@R: person(given = "Rajita",
     family = "Chandak",
     role = c("aut", "cre"),
diff --git a/R/lpcde/R/lpcde.R b/R/lpcde/R/lpcde.R
index d12eedf..f69f98d 100644
--- a/R/lpcde/R/lpcde.R
+++ b/R/lpcde/R/lpcde.R
@@ -14,7 +14,9 @@
 #' points will be chosen as 0.05-0.95 percentiles of the data, with a step size of 0.05 in
 #' y-direction.
 #' @param x Numeric, specifies the grid of evaluation points in the x-direction. When set to default,
-#' the evaluation point will be chosen as the median of the x data.
+#' the evaluation point will be chosen as the median of the x data. To generate
+#' estimates for multiple conditioning values, please loop over the x values and
+#' evaluate the lpcde function at each point.
 #' @param bw Numeric, specifies the bandwidth used for estimation. Can be (1) a positive
 #' scalar (common bandwidth for all grid points); or (2) a positive numeric vector/matrix
 #' specifying bandwidths for each grid point (should be the same dimension as \code{grid}).
diff --git a/R/lpcde/R/lpcde_methods.R b/R/lpcde/R/lpcde_methods.R
index 7f40f28..1bda874 100644
--- a/R/lpcde/R/lpcde_methods.R
+++ b/R/lpcde/R/lpcde_methods.R
@@ -266,7 +266,7 @@ summary.lpcde = function(object, ...){
 #'
 #' @export
 coef.lpcde = function(object, ...) {
-  object$Estimate
+  object$Estimate[,c(1,3)]
 }
 #######################################################################################
 #' Vcov method for local polynomial density conditional estimation
diff --git a/R/lpcde/man/lpcde.Rd b/R/lpcde/man/lpcde.Rd
index 86f2500..84ed5fd 100644
--- a/R/lpcde/man/lpcde.Rd
+++ b/R/lpcde/man/lpcde.Rd
@@ -36,7 +36,9 @@ points will be chosen as 0.05-0.95 percentiles of the data, with a step size of
 y-direction.}
 
 \item{x}{Numeric, specifies the grid of evaluation points in the x-direction. When set to default,
-the evaluation point will be chosen as the median of the x data.}
+the evaluation point will be chosen as the median of the x data. To generate
+estimates for multiple conditioning values, please loop over the x values and
+evaluate the lpcde function at each point.}
 
 \item{bw}{Numeric, specifies the bandwidth used for estimation. Can be (1) a positive
 scalar (common bandwidth for all grid points); or (2) a positive numeric vector/matrix
diff --git a/R/lpcde_0.1.4.pdf b/R/lpcde_0.1.4.pdf
deleted file mode 100644
index cfa26ad..0000000
Binary files a/R/lpcde_0.1.4.pdf and /dev/null differ
diff --git a/R/lpcde_0.1.5.pdf b/R/lpcde_0.1.5.pdf
new file mode 100644
index 0000000..750faf7
Binary files /dev/null and b/R/lpcde_0.1.5.pdf differ
diff --git a/paper.bib b/paper.bib
index e9aed41..d58c927 100644
--- a/paper.bib
+++ b/paper.bib
@@ -137,3 +137,14 @@ @article{rothfuss2019conditional
 journal={arXiv:1903.00954},
 year={2019}
 }
+
+@article{haldensify,
+doi = {10.21105/joss.04522},
+url = {https://doi.org/10.21105/joss.04522},
+year = {2022},
+publisher = {The Open Journal},
+volume = {7}, number = {77}, pages = {4522},
+author = {Nima S. Hejazi and Mark J. van der Laan and David Benkeser},
+title = {`haldensify`: Highly adaptive lasso conditional density estimation in `R`},
+journal = {Journal of Open Source Software}
+}
diff --git a/paper.md b/paper.md
index ec28555..524903a 100644
--- a/paper.md
+++ b/paper.md
@@ -41,21 +41,79 @@ affiliations:
 
 # Summary
 
-Conditional cumulative distribution functions (CDFs), conditional probability density functions (PDFs), and derivatives thereof, are important parameters of interest in statistics, econometrics, and other data science disciplines. The package `lpcde` implements new estimation and inference methods for conditional CDFs, conditional PDFs, and derivatives thereof, employing the kernel-based local polynomial smoothing approach introduced in @CCJM_2024_Bernoulli.
+Conditional cumulative distribution functions (CDFs), conditional probability
+density functions (PDFs), and derivatives thereof are important parameters of
+interest in statistics, econometrics, and other data science disciplines. The
+package `lpcde` implements new estimation and inference methods for conditional
+CDFs, conditional PDFs, and derivatives thereof, employing the kernel-based
+local polynomial smoothing approach introduced in @CCJM_2024_Bernoulli.
 
-The package `lpcde` offers data-driven (pointwise and uniform) estimation and inference methods for conditional CDFs, conditional PDFs, and derivatives thereof, which are automatically valid at both interior and boundary points of the support of the outcome and conditioning variables. For point estimation, the package offers mean squared error optimal bandwidth selection and associated optimal mean square and uniform point estimators. For inference, the package offers valid confidence intervals and confidence bands based on robust bias-correction techniques [@Calonico-Cattaneo-Farrell_2018_JASA; @Calonico-Cattaneo-Farrell_2022_Bernoulli]. Finally, these statistical procedures can be easily used for visualization and graphical presentation of smooth estimates of conditional CDFs, conditional PDFs, and derivative thereof, with custom `ggplot` [@ggplot2] commands built for the package.
+The package `lpcde` offers data-driven (pointwise and uniform) estimation and
+inference methods for conditional CDFs, conditional PDFs, and derivatives
+thereof, which are automatically valid at both interior and boundary points of
+the support of the outcome and conditioning variables. For point estimation, the
+package offers mean squared error optimal bandwidth selection and the associated
+mean square and uniformly optimal point estimators. For inference, the package
+offers valid confidence intervals and confidence bands based on robust
+bias-correction techniques [@Calonico-Cattaneo-Farrell_2018_JASA;
+@Calonico-Cattaneo-Farrell_2022_Bernoulli]. Finally, these statistical
+procedures can be easily used for visualization and graphical presentation of
+smooth estimates of conditional CDFs, conditional PDFs, and derivatives thereof,
+with custom `ggplot` [@ggplot2] commands built for the package.
 
-This package is currently the only open source implementation of an estimator offering boundary adaptive, data-driven conditional density estimation with robust bias-corrected pointwise confidence interval and uniform confidence band constructions, providing users with statistical tools to better understand the reliability of their empirical analysis. A detailed tutorial, replication files, and other information on how to use the package can be found in the [GitHub repository](https://github.com/nppackages/lpcde) and through the [CRAN repository](https://cran.r-project.org/web/packages/lpcde/index.html). See also the `lpcde` package website (https://nppackages.github.io/lpcde/) and the companion arXiv article [@CCJM_2024_lpcde] for additional methodological information and numerical results.
+This package is currently the only open source implementation of an estimator
+offering boundary adaptive, data-driven conditional density estimation with
+robust bias-corrected pointwise confidence interval and uniform confidence band
+constructions, providing users with statistical tools to better understand the
+reliability of their empirical analysis. A detailed tutorial, replication files,
+and other information on how to use the package can be found in the [GitHub
+repository](https://github.com/nppackages/lpcde) and through the [CRAN
+repository](https://cran.r-project.org/web/packages/lpcde/index.html). See also
+the `lpcde` package website (https://nppackages.github.io/lpcde/) and the
+companion arXiv article [@CCJM_2024_lpcde] for additional methodological
+information and numerical results.
 
 # Statement of need
 
-@Wand-Jones_1995_Book, @Fan-Gijbels_1996_Book, @simonoff2012smoothing, and @scott2015multivariate give textbook introductions to kernel-based density and local polynomial estimation and inference methods. The core idea underlying the estimator implemented in `lpcde` is to use kernel-based local polynomial smoothing methods to construct an automatic boundary adaptive estimator for conditional CDFs, conditional PDFs, and derivatives thereof. The estimator implemented in this package consists of two steps. The first step estimates the conditional distribution function using standard local polynomial regression methods, and the second step applies local polynomial smoothing to the (non-smooth) local polynomial conditional CDF estimate from the first step to obtain a smooth estimate of the conditional CDF, conditional PDF, and derivatives thereof.
+@Wand-Jones_1995_Book, @Fan-Gijbels_1996_Book, @simonoff2012smoothing, and
+@scott2015multivariate give textbook introductions to kernel-based density and
+local polynomial estimation and inference methods. The core idea underlying the
+estimator implemented in `lpcde` is to use kernel-based local polynomial
+smoothing methods to construct an automatic boundary adaptive estimator for
+conditional CDFs, conditional PDFs, and derivatives thereof. The estimator
+implemented in this package consists of two steps. The first step estimates the
+conditional distribution function using standard local polynomial regression
+methods, and the second step applies local polynomial smoothing to the
+(non-smooth) local polynomial conditional CDF estimate from the first step to
+obtain a smooth estimate of the conditional CDF, conditional PDF, and
+derivatives thereof.
 
-A distinct advantage of this estimation method over existing ones is its boundary adaptivity for a possibly unknown compact support of the data. Furthermore, the estimator has a simple closed form representation, which leads to easy and fast implementation. Unlike other boundary adaptive procedures, the estimation procedures implemented in the package `lpcde` do not require pre-processing of data, and thus avoid the challenges of hyper-parameter tuning: only one bandwidth parameter needs to be selected for implementation. See @CCJM_2024_Bernoulli and @CCJM_2024_lpcde for more details.
+A distinct advantage of this estimation method over existing ones is its
+boundary adaptivity for a possibly unknown compact support of the data.
+Furthermore, the estimator has a simple closed-form representation, which leads
+to easy and fast implementation. Unlike other boundary adaptive procedures, the
+estimation procedures implemented in the package `lpcde` do not require
+pre-processing of data, and thus avoid the challenges of hyper-parameter tuning:
+only one bandwidth parameter needs to be selected for implementation. See
+@CCJM_2024_Bernoulli and @CCJM_2024_lpcde for more details.
 
 # Comparing and contrasting existing toolsets
 
-The package `lpcde` contributes to a small set of open source statistical software packages implementing estimation and inference methods for conditional CDF, conditional PDF, and derivatives thereof. More specifically, we identified two `R` packages, `hdrcde` [@hdrcde] and `np` [@np], and one `Python` package, `cde` [@rothfuss2019conditional], which provide related methodology. There are no open source `Stata` packages that implement comparable estimation and inference methods. The table below summarizes some of the main differences between those other packages and `lpcde`. Notably, `lpcde` is the only package available that provides both pointwise and uniform uncertainty quantification, in addition to producing boundary adaptive mean square and uniformly optimal point estimates via data-driven, optimal tuning parameter selection. Furthermore, the `lpcde` package produces proper conditional density estimates that are non-negative and integrate to one. These features are unique contributions of the package to the `R` toolkit and, more broadly, to the open source statistical community.
+The package `lpcde` contributes to a small set of open source statistical
+software packages implementing estimation and inference methods for conditional
+CDFs, conditional PDFs, and derivatives thereof. More specifically, we identified
+three `R` packages, `hdrcde` [@hdrcde], `haldensify` [@haldensify], and `np`
+[@np], and one `Python` package, `cde` [@rothfuss2019conditional], which provide
+related methodology. There are no open source `Stata` packages that implement
+comparable estimation and inference methods. The table below summarizes some of
+the main differences between those other packages and `lpcde`. Notably, `lpcde`
+is the only package available that provides both pointwise and uniform
+uncertainty quantification, in addition to producing boundary adaptive mean
+square and uniformly optimal point estimates via data-driven, optimal tuning
+parameter selection. Furthermore, the `lpcde` package produces proper
+conditional density estimates that are non-negative and integrate to one. These
+features are unique contributions of the package to the `R` toolkit and, more
+broadly, to the open source statistical community.
 
 | Package | Programming language | CDF/Derivative estimation | Regularized density | Valid at boundary | Standard error | Valid inference | Confidence bands | Bandwidth selection |
 |--------|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|