README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```

# sundry
[![Travis-CI Build Status](https://travis-ci.org/datalorax/sundry.svg?branch=master)](https://travis-ci.org/datalorax/sundry)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/4bmb4eqscdf3p6vb?svg=true)](https://ci.appveyor.com/project/DJAnderson07/sundry)
[![codecov](https://codecov.io/gh/datalorax/sundry/branch/master/graph/badge.svg)](https://codecov.io/gh/datalorax/sundry)

The *sundry* package is a personal R package filled with functions that make my
life a little easier when working on day-to-day analyses. Most of the functions
in the package are designed to be friendly with the 
[tidyverse](https://www.tidyverse.org) and thus are pipe friendly (`%>%`), and
work with other functions like `dplyr::group_by`. The package is somewhat perpetually under development. 

## Installation

You can install sundry from github with:

```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("datalorax/sundry")
```

## Examples

Below are a few examples of functions in the package.

### Batch import data and bind into a single data frame

Maybe my favorite function in the package is the `read_files` function, which 
will read in *n* datasets and, if possible bind them together into a single 
tibble (data frame). The function uses `rio::import`, which makes it really
nice because you don't have to worry about file types basically at all, and
you can even read in data of different types all at once.

```{r read_files}
library(sundry)
library(tidyverse)
by_species <- iris %>%
  split(.$Species) %>%
  map(select, -Species)

str(by_species)

# export as three different file types
rio::export(by_species$setosa, "setosa.csv")
rio::export(by_species$versicolor, "versicolor.xlsx")
rio::export(by_species$virginica, "virginica.sav")

# import them all back in as a single data frame
d <- read_files()
d
d %>% 
  count(file)

fs::file_delete(c("setosa.csv", "versicolor.xlsx", "virginica.sav"))
```

The first argument is the directory, and defaults to the current working 
directory. There's also an optional `pat` argument you can supply to read in 
only files with a specified pattern.  You can also optionally have the files 
read in as a list, rather than binding the data frames together.

### Quickly calculate descriptive stats

Quickly calculate descriptive stats for any set of variables.

```{r descriptive_stats, message = FALSE}
library(dplyr)
library(sundry)
storms %>% 
  descrips(wind, pressure)

storms %>% 
  group_by(year) %>% 
  descrips(wind, pressure)

storms %>% 
  group_by(year) %>% 
  descrips(wind, pressure,
           .funs = funs(qtile25 = quantile(., 0.25),
                        median, 
                        qtile75 = quantile(., 0.75)))
```

### Remove empty rows from specific columns
This function is similar to `janitor::remove_empty_rows`, but allows you to 
pass a set of columns (rather than looking across all columns). Rows will be
removed that are missing across all columns.

```{r load_data}
d <- rio::import("http://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_ela_tot_ecd_ext_gnd_lep_1617.xlsx",
            setclass = "tbl_df",
            na = c("--", "*")) %>% 
  janitor::clean_names()

d %>% 
  select(district_id, number_level_4:percent_level_1)
```

The data above have missing data across many columns, but every row has at 
least some valid entries. Suppose I was only interested in data on schools with 
proficiency data. 

```{r label, options}
d %>% 
  rm_empty_rows(number_level_4:percent_level_1) %>% 
  select(district_id, number_level_4:percent_level_1) 
```

The above returns all the rows that are not missing across the full set of variables supplied (rows with partial missing are still returned). The function can also be provided without any column arguments, and the function will then mimic the behavior or `janitor::remove_empty_rows`. 

### Filter by Functions
Select rows according to functions. For example, select only the rows with the minimum and maximum values of a specific variable.

```{r filter_by_funs}
storms %>%
 filter_by_funs(wind, funs(min, max))

storms %>%
  group_by(year) %>%
  filter_by_funs(wind, funs(min, median, max)) %>%
  arrange(year)
```