Commit

minor cleanup
rajitachandak committed Nov 13, 2024
1 parent 4dbcb0b commit 40babff
Showing 8 changed files with 83 additions and 10 deletions.
2 changes: 1 addition & 1 deletion R/lpcde/DESCRIPTION
@@ -1,7 +1,7 @@
Package: lpcde
Type: Package
Title: Boundary Adaptive Local Polynomial Conditional Density Estimator
-Version: 0.1.4
+Version: 0.1.5
Authors@R: person(given = "Rajita",
family = "Chandak",
role = c("aut", "cre"),
4 changes: 3 additions & 1 deletion R/lpcde/R/lpcde.R
@@ -14,7 +14,9 @@
#' points will be chosen as 0.05-0.95 percentiles of the data, with a step size of 0.05 in
#' y-direction.
#' @param x Numeric, specifies the grid of evaluation points in the x-direction. When set to default,
-#' the evaluation point will be chosen as the median of the x data.
+#' the evaluation point will be chosen as the median of the x data. To generate
+#' estimates for multiple conditioning values, please loop over the x values and
+#' evaluate the lpcde function at each point.
#' @param bw Numeric, specifies the bandwidth used for estimation. Can be (1) a positive
#' scalar (common bandwidth for all grid points); or (2) a positive numeric vector/matrix
#' specifying bandwidths for each grid point (should be the same dimension as \code{grid}).
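The documentation change above recommends looping over conditioning values, because `x` is evaluated one point at a time. A minimal sketch of that looping pattern in base R follows. To keep it runnable without the package, `estimate_at_x()` is a hypothetical stand-in (a kernel-weighted conditional CDF, not the `lpcde` estimator); in practice its body would be replaced by a call to `lpcde::lpcde()`, whose argument names `x_data`, `y_data` and `y_grid` are assumed here from the package examples rather than taken from this commit. The `y_grid` below mirrors the 0.05 to 0.95 percentile default described in the documentation.

```r
## Hypothetical stand-in for a single-x estimator so the sketch runs without
## lpcde: a kernel-weighted (Nadaraya-Watson) conditional CDF at x0.
## In practice the body would be a call along the lines of
##   lpcde::lpcde(x_data = x_data, y_data = y_data, y_grid = y_grid, x = x0)
## (argument names assumed, not taken from this commit).
estimate_at_x <- function(x_data, y_data, y_grid, x0, h = 0.3) {
  w <- pmax(0, 1 - ((x_data - x0) / h)^2)   # Epanechnikov-shaped weights
  w <- w / sum(w)
  vapply(y_grid, function(y) sum(w * (y_data <= y)), numeric(1))
}

set.seed(42)
n <- 2000
x_data <- runif(n)
y_data <- x_data + rnorm(n, sd = 0.5)
# Grid mirroring the documented default: 0.05-0.95 percentiles, step 0.05.
y_grid <- unname(quantile(y_data, seq(0.05, 0.95, by = 0.05)))
x_vals <- c(0.25, 0.5, 0.75)

# One estimator call per conditioning value, stacked into a long data frame.
results <- do.call(rbind, lapply(x_vals, function(x0) {
  data.frame(x = x0, y = y_grid,
             estimate = estimate_at_x(x_data, y_data, y_grid, x0))
}))
head(results)
```

Collecting the per-`x` results into one long data frame keeps the conditioning value attached to every row, which makes the output easy to filter or facet when plotting.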
2 changes: 1 addition & 1 deletion R/lpcde/R/lpcde_methods.R
@@ -266,7 +266,7 @@ summary.lpcde = function(object, ...){
#'
#' @export
coef.lpcde = function(object, ...) {
-object$Estimate
+object$Estimate[,c(1,3)]
}
#######################################################################################
#' Vcov method for local polynomial density conditional estimation
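After this change, `coef()` on an `lpcde` fit returns only columns 1 and 3 of `object$Estimate` instead of the whole matrix. A minimal sketch of the effect, using a hand-built stand-in object; the column names (`y_grid`, `bw`, `est`, and so on) are assumptions for illustration and the values are placeholders, not package output.

```r
# Hand-built stand-in for an "lpcde" fit; only the Estimate matrix is used.
# Column names are assumed for illustration; values are placeholders.
est <- matrix(seq_len(18), nrow = 3,
              dimnames = list(NULL, c("y_grid", "bw", "est",
                                      "est_RBC", "se", "se_RBC")))
fit <- structure(list(Estimate = est), class = "lpcde")

# Method as it reads after this commit: keep columns 1 and 3 only.
coef.lpcde <- function(object, ...) {
  object$Estimate[, c(1, 3)]
}

coef(fit)   # 3 x 2 matrix with columns "y_grid" and "est"

# With a single evaluation point, `[` drops to a named vector (drop = TRUE).
fit1 <- structure(list(Estimate = est[1, , drop = FALSE]), class = "lpcde")
coef(fit1)  # named vector: y_grid = 1, est = 7
```

One consequence worth keeping in mind: with a single evaluation point the method returns a named vector rather than a one-row matrix, because `[` drops dimensions by default.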
4 changes: 3 additions & 1 deletion R/lpcde/man/lpcde.Rd

Binary file removed R/lpcde_0.1.4.pdf
Binary file added R/lpcde_0.1.5.pdf
11 changes: 11 additions & 0 deletions paper.bib
@@ -137,3 +137,14 @@ @article{rothfuss2019conditional
journal={arXiv:1903.00954},
year={2019}
}
+
+@article{haldensify,
+doi = {10.21105/joss.04522},
+url = {https://doi.org/10.21105/joss.04522},
+year = {2022},
+publisher = {The Open Journal},
+volume = {7}, number = {77}, pages = {4522},
+author = {Nima S. Hejazi and Mark J. van der Laan and David Benkeser},
+title = {`haldensify`: Highly adaptive lasso conditional density estimation in `R`},
+journal = {Journal of Open Source Software}
+}
70 changes: 64 additions & 6 deletions paper.md
@@ -41,21 +41,79 @@ affiliations:

# Summary

-Conditional cumulative distribution functions (CDFs), conditional probability density functions (PDFs), and derivatives thereof, are important parameters of interest in statistics, econometrics, and other data science disciplines. The package `lpcde` implements new estimation and inference methods for conditional CDFs, conditional PDFs, and derivatives thereof, employing the kernel-based local polynomial smoothing approach introduced in @CCJM_2024_Bernoulli.
+Conditional cumulative distribution functions (CDFs), conditional probability
+density functions (PDFs), and derivatives thereof, are important parameters of
+interest in statistics, econometrics, and other data science disciplines. The
+package `lpcde` implements new estimation and inference methods for conditional
+CDFs, conditional PDFs, and derivatives thereof, employing the kernel-based
+local polynomial smoothing approach introduced in @CCJM_2024_Bernoulli.

-The package `lpcde` offers data-driven (pointwise and uniform) estimation and inference methods for conditional CDFs, conditional PDFs, and derivatives thereof, which are automatically valid at both interior and boundary points of the support of the outcome and conditioning variables. For point estimation, the package offers mean squared error optimal bandwidth selection and associated optimal mean square and uniform point estimators. For inference, the package offers valid confidence intervals and confidence bands based on robust bias-correction techniques [@Calonico-Cattaneo-Farrell_2018_JASA; @Calonico-Cattaneo-Farrell_2022_Bernoulli]. Finally, these statistical procedures can be easily used for visualization and graphical presentation of smooth estimates of conditional CDFs, conditional PDFs, and derivative thereof, with custom `ggplot` [@ggplot2] commands built for the package.
+The package `lpcde` offers data-driven (pointwise and uniform) estimation and
+inference methods for conditional CDFs, conditional PDFs, and derivatives
+thereof, which are automatically valid at both interior and boundary points of
+the support of the outcome and conditioning variables. For point estimation, the
+package offers mean squared error optimal bandwidth selection and associated
+optimal mean square and uniform point estimators. For inference, the package
+offers valid confidence intervals and confidence bands based on robust
+bias-correction techniques [@Calonico-Cattaneo-Farrell_2018_JASA;
+@Calonico-Cattaneo-Farrell_2022_Bernoulli]. Finally, these statistical
+procedures can be easily used for visualization and graphical presentation of
+smooth estimates of conditional CDFs, conditional PDFs, and derivative thereof,
+with custom `ggplot` [@ggplot2] commands built for the package.

-This package is currently the only open source implementation of an estimator offering boundary adaptive, data-driven conditional density estimation with robust bias-corrected pointwise confidence interval and uniform confidence band constructions, providing users with statistical tools to better understand the reliability of their empirical analysis. A detailed tutorial, replication files, and other information on how to use the package can be found in the [GitHub repository](https://github.com/nppackages/lpcde) and through the [CRAN repository](https://cran.r-project.org/web/packages/lpcde/index.html). See also the `lpcde` package website (https://nppackages.github.io/lpcde/) and the companion arXiv article [@CCJM_2024_lpcde] for additional methodological information and numerical results.
+This package is currently the only open source implementation of an estimator
+offering boundary adaptive, data-driven conditional density estimation with
+robust bias-corrected pointwise confidence interval and uniform confidence band
+constructions, providing users with statistical tools to better understand the
+reliability of their empirical analysis. A detailed tutorial, replication files,
+and other information on how to use the package can be found in the [GitHub
+repository](https://github.com/nppackages/lpcde) and through the [CRAN
+repository](https://cran.r-project.org/web/packages/lpcde/index.html). See also
+the `lpcde` package website (https://nppackages.github.io/lpcde/) and the
+companion arXiv article [@CCJM_2024_lpcde] for additional methodological
+information and numerical results.

# Statement of need

-@Wand-Jones_1995_Book, @Fan-Gijbels_1996_Book, @simonoff2012smoothing, and @scott2015multivariate give textbook introductions to kernel-based density and local polynomial estimation and inference methods. The core idea underlying the estimator implemented in `lpcde` is to use kernel-based local polynomial smoothing methods to construct an automatic boundary adaptive estimator for conditional CDFs, conditional PDFs, and derivatives thereof. The estimator implemented in this package consists of two steps. The first step estimates the conditional distribution function using standard local polynomial regression methods, and the second step applies local polynomial smoothing to the (non-smooth) local polynomial conditional CDF estimate from the first step to obtain a smooth estimate of the conditional CDF, conditional PDF, and derivatives thereof.
+@Wand-Jones_1995_Book, @Fan-Gijbels_1996_Book, @simonoff2012smoothing, and
+@scott2015multivariate give textbook introductions to kernel-based density and
+local polynomial estimation and inference methods. The core idea underlying the
+estimator implemented in `lpcde` is to use kernel-based local polynomial
+smoothing methods to construct an automatic boundary adaptive estimator for
+conditional CDFs, conditional PDFs, and derivatives thereof. The estimator
+implemented in this package consists of two steps. The first step estimates the
+conditional distribution function using standard local polynomial regression
+methods, and the second step applies local polynomial smoothing to the
+(non-smooth) local polynomial conditional CDF estimate from the first step to
+obtain a smooth estimate of the conditional CDF, conditional PDF, and
+derivatives thereof.

-A distinct advantage of this estimation method over existing ones is its boundary adaptivity for a possibly unknown compact support of the data. Furthermore, the estimator has a simple closed form representation, which leads to easy and fast implementation. Unlike other boundary adaptive procedures, the estimation procedures implemented in the package `lpcde` do not require pre-processing of data, and thus avoid the challenges of hyper-parameter tuning: only one bandwidth parameter needs to be selected for implementation. See @CCJM_2024_Bernoulli and @CCJM_2024_lpcde for more details.
+A distinct advantage of this estimation method over existing ones is its
+boundary adaptivity for a possibly unknown compact support of the data.
+Furthermore, the estimator has a simple closed form representation, which leads
+to easy and fast implementation. Unlike other boundary adaptive procedures, the
+estimation procedures implemented in the package `lpcde` do not require
+pre-processing of data, and thus avoid the challenges of hyper-parameter tuning:
+only one bandwidth parameter needs to be selected for implementation. See
+@CCJM_2024_Bernoulli and @CCJM_2024_lpcde for more details.

# Comparing and contrasting existing toolsets

-The package `lpcde` contributes to a small set of open source statistical software packages implementing estimation and inference methods for conditional CDF, conditional PDF, and derivatives thereof. More specifically, we identified two `R` packages, `hdrcde` [@hdrcde] and `np` [@np], and one `Python` package, `cde` [@rothfuss2019conditional], which provide related methodology. There are no open source `Stata` packages that implement comparable estimation and inference methods. The table below summarizes some of the main differences between those other packages and `lpcde`. Notably, `lpcde` is the only package available that provides both pointwise and uniform uncertainty quantification, in addition to producing boundary adaptive mean square and uniformly optimal point estimates via data-driven, optimal tuning parameter selection. Furthermore, the `lpcde` package produces proper conditional density estimates that are non-negative and integrate to one. These features are unique contributions of the package to the `R` toolkit and, more broadly, to the open source statistical community.
+The package `lpcde` contributes to a small set of open source statistical
+software packages implementing estimation and inference methods for conditional
+CDF, conditional PDF, and derivatives thereof. More specifically, we identified
+three `R` packages, `hdrcde` [@hdrcde], `haldensify` [@haldensify] and `np`
+[@np], and one `Python` package, `cde` [@rothfuss2019conditional], which provide
+related methodology. There are no open source `Stata` packages that implement
+comparable estimation and inference methods. The table below summarizes some of
+the main differences between those other packages and `lpcde`. Notably, `lpcde`
+is the only package available that provides both pointwise and uniform
+uncertainty quantification, in addition to producing boundary adaptive mean
+square and uniformly optimal point estimates via data-driven, optimal tuning
+parameter selection. Furthermore, the `lpcde` package produces proper
+conditional density estimates that are non-negative and integrate to one. These
+features are unique contributions of the package to the `R` toolkit and, more
+broadly, to the open source statistical community.

| Package | Programming language | CDF/Derivative estimation | Regularized density | Valid at boundary | Standard error | Valid inference | Confidence bands | Bandwidth selection |
|--------|:------:|:------:|:------:|:------:|:------:|:------:|:------:|:------:|
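
The two-step construction described in the `paper.md` text above (a local polynomial regression estimate of the conditional CDF, followed by local polynomial smoothing of that non-smooth estimate in the outcome direction) can be sketched in a few lines of base R. This is a simplified illustration under stated assumptions, not the `lpcde` implementation: the Epanechnikov kernel, the local linear first step, the local quadratic second step fitted at the observed outcome values, the fixed bandwidths, and the helper names `ll_weights()`, `lp_coef()` and `cond_density_sketch()` are all hypothetical choices, and no bandwidth selection or robust bias correction is attempted. The slope coefficient of the second-step fit serves as the conditional density estimate.

```r
## Two-step local polynomial conditional CDF/density sketch (base R only).
## Hypothetical helpers; kernel, polynomial orders and bandwidths are assumptions.

epan <- function(z) pmax(0, 0.75 * (1 - z^2))  # Epanechnikov kernel

# Step 1 weights: local linear "equivalent kernel" at x0, so that
# sum(ell * v) equals the local linear regression estimate of E[v | X = x0].
ll_weights <- function(x, x0, h) {
  w <- epan((x - x0) / h)
  X <- cbind(1, x - x0)
  A <- crossprod(X * w, X)            # X'WX
  solve(A, t(X * w))[1, ]             # first row of (X'WX)^{-1} X'W
}

# Weighted least squares of v on (u - u0)^0, ..., (u - u0)^p; returns coefficients.
lp_coef <- function(u, v, u0, h, p) {
  w <- epan((u - u0) / h)
  keep <- w > 0
  X <- outer(u[keep] - u0, 0:p, `^`)
  unname(stats::lm.wfit(X, v[keep], w[keep])$coefficients)
}

cond_density_sketch <- function(x_data, y_data, x0, y0, hx, hy, p = 2) {
  in_x <- abs(x_data - x0) < hx
  ell <- ll_weights(x_data[in_x], x0, hx)
  # Step 1: non-smooth local linear conditional CDF F1(t) = sum(ell * 1{Y <= t}),
  # evaluated at many t at once via sorting and cumulative sums.
  ord <- order(y_data[in_x])
  ys <- y_data[in_x][ord]
  cw <- c(0, cumsum(ell[ord]))
  yj <- y_data[abs(y_data - y0) < hy]  # step-2 design points: observed outcomes
  F1 <- cw[findInterval(yj, ys) + 1]
  # Step 2: local polynomial smoothing of F1 in the y direction.
  b <- lp_coef(yj, F1, y0, hy, p)
  c(cdf = b[1], density = b[2])
}

set.seed(1)
n <- 20000
x <- runif(n, -1, 1)

# Interior: Y | X = x ~ N(x, 1), so F(0 | 0) = 0.5 and f(0 | 0) = dnorm(0), about 0.399.
y_norm <- x + rnorm(n)
interior <- cond_density_sketch(x, y_norm, x0 = 0, y0 = 0, hx = 0.3, hy = 0.5)

# Boundary: Y ~ Uniform(0, 1) independent of X, so f(0 | 0) = 1 at the support edge.
y_unif <- runif(n)
boundary <- cond_density_sketch(x, y_unif, x0 = 0, y0 = 0, hx = 0.3, hy = 0.3)

print(rbind(interior, boundary))
```

The boundary case is where the design matters. Because the second step regresses only on outcomes that actually occur near `y0`, the fit at `y0 = 0` is automatically one-sided, so in this simulation the density estimate should land near the true value 1, whereas a conventional kernel density estimator with a symmetric kernel tends toward roughly half the true density at the edge of the support.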