---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, echo = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "README-"
)
```
# sundry
[![Travis-CI Build Status](https://travis-ci.org/datalorax/sundry.svg?branch=master)](https://travis-ci.org/datalorax/sundry)
[![AppVeyor Build Status](https://ci.appveyor.com/api/projects/status/4bmb4eqscdf3p6vb?svg=true)](https://ci.appveyor.com/project/DJAnderson07/sundry)
[![codecov](https://codecov.io/gh/datalorax/sundry/branch/master/graph/badge.svg)](https://codecov.io/gh/datalorax/sundry)

The *sundry* package is a personal R package filled with functions that make my
life a little easier when working on day-to-day analyses. Most of the functions
in the package are designed to work well with the
[tidyverse](https://www.tidyverse.org): they are pipe-friendly (`%>%`) and
work with other functions like `dplyr::group_by`. The package is perpetually
under development.
## Installation
You can install sundry from GitHub with:
```{r gh-installation, eval = FALSE}
# install.packages("devtools")
devtools::install_github("datalorax/sundry")
```
## Examples
Below are a few examples of functions in the package.
### Batch import data and bind into a single data frame
Maybe my favorite function in the package is `read_files`, which reads in *n*
datasets and, if possible, binds them together into a single tibble (data
frame). The function uses `rio::import`, so you rarely have to worry about file
types, and you can even read in files of different types all at once.
```{r read_files}
library(sundry)
library(tidyverse)
by_species <- iris %>%
  split(.$Species) %>%
  map(select, -Species)
str(by_species)
# export as three different file types
rio::export(by_species$setosa, "setosa.csv")
rio::export(by_species$versicolor, "versicolor.xlsx")
rio::export(by_species$virginica, "virginica.sav")
# import them all back in as a single data frame
d <- read_files()
d
d %>%
  count(file)
fs::file_delete(c("setosa.csv", "versicolor.xlsx", "virginica.sav"))
```
The first argument is the directory, and defaults to the current working
directory. There's also an optional `pat` argument you can supply to read in
only files with a specified pattern. You can also optionally have the files
read in as a list, rather than binding the data frames together.
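
To make the list-then-bind idea concrete, here is a minimal base-R sketch of the same workflow. It is an illustration only, not how `read_files` is implemented: it handles CSV files only (the package relies on `rio::import`), and the scratch directory and the `d_sketch` name are assumptions made for the example.

```r
# Base-R illustration only; sundry::read_files() itself relies on rio::import().
dir <- file.path(tempdir(), "read-files-sketch")  # hypothetical scratch directory
dir.create(dir, showWarnings = FALSE)

# Write one CSV per species so there is something to read back in
by_species <- split(iris[, -5], iris$Species)
for (nm in names(by_species)) {
  write.csv(by_species[[nm]], file.path(dir, paste0(nm, ".csv")),
            row.names = FALSE)
}

# List the files matching a pattern, read each one, and tag its rows
# with the name of the file they came from
paths <- list.files(dir, pattern = "\\.csv$", full.names = TRUE)
pieces <- lapply(paths, function(p) {
  data.frame(file = tools::file_path_sans_ext(basename(p)), read.csv(p))
})

# Bind the pieces into a single data frame; stop at `pieces` to keep a list
d_sketch <- do.call(rbind, pieces)
table(d_sketch$file)

# Clean up the scratch directory
unlink(dir, recursive = TRUE)
```

Stopping at `pieces` corresponds to reading the files in as a list rather than binding them.
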
### Quickly calculate descriptive stats
Quickly calculate descriptive stats for any set of variables.
```{r descriptive_stats, message = FALSE}
library(dplyr)
library(sundry)
storms %>%
  descrips(wind, pressure)

storms %>%
  group_by(year) %>%
  descrips(wind, pressure)

storms %>%
  group_by(year) %>%
  descrips(wind, pressure,
           .funs = funs(qtile25 = quantile(., 0.25),
                        median,
                        qtile75 = quantile(., 0.75)))
```
### Remove empty rows from specific columns
This function is similar to `janitor::remove_empty_rows`, but lets you pass a
set of columns (rather than looking across all columns). A row is removed only
if it is missing across all of the supplied columns.
```{r load_data}
d <- rio::import("http://www.oregon.gov/ode/educator-resources/assessment/TestResults2017/pagr_schools_ela_tot_ecd_ext_gnd_lep_1617.xlsx",
                 setclass = "tbl_df",
                 na = c("--", "*")) %>%
  janitor::clean_names()

d %>%
  select(district_id, number_level_4:percent_level_1)
```
The data above have missing values across many columns, but every row has at
least some valid entries. Suppose I were only interested in schools with
proficiency data.
```{r rm_empty_rows}
d %>%
  rm_empty_rows(number_level_4:percent_level_1) %>%
  select(district_id, number_level_4:percent_level_1)
```
The above returns all the rows that are not missing across the full set of variables supplied (rows with partially missing data are still returned). The function can also be called without any column arguments, in which case it mimics the behavior of `janitor::remove_empty_rows`.
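
The rule itself (drop a row only when every selected column is missing) can be sketched in base R. This is a rough illustration on made-up data, not the package's implementation; the `scores` data frame and its column names are hypothetical.

```r
# Toy data (hypothetical): row 2 is empty across both score columns,
# rows 3 and 4 are only partially missing
scores <- data.frame(
  id      = 1:4,
  level_1 = c(10, NA, NA, 7),
  level_2 = c(5, NA, 3, NA)
)

cols <- c("level_1", "level_2")

# TRUE for rows where every selected column is NA
all_missing <- rowSums(is.na(scores[, cols, drop = FALSE])) == length(cols)

# Keep everything else, including the partially missing rows
kept <- scores[!all_missing, ]
kept
```

Only the row with `id` 2 is dropped; the rows with partially missing scores are retained.
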
### Filter by functions
Select rows according to functions. For example, select only the rows with the minimum and maximum values of a specific variable.
```{r filter_by_funs}
storms %>%
  filter_by_funs(wind, funs(min, max))

storms %>%
  group_by(year) %>%
  filter_by_funs(wind, funs(min, median, max)) %>%
  arrange(year)
```
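
For readers who want to see the underlying idea without the package, the following base-R sketch keeps the rows whose value equals the result of any supplied function. The helper `filter_rows_by` is a hypothetical name invented for this example, and the sketch is ungrouped; it is not the implementation of `filter_by_funs`.

```r
# Keep rows whose value equals the result of any of the supplied functions.
# `filter_rows_by` is a hypothetical helper written for this illustration.
filter_rows_by <- function(data, column, fns) {
  # Evaluate each function on the column to get the target values
  targets <- vapply(fns, function(f) f(data[[column]]), numeric(1))
  data[data[[column]] %in% targets, , drop = FALSE]
}

# Rows of mtcars with the lowest and highest fuel economy (ties are kept)
extremes <- filter_rows_by(mtcars, "mpg", list(min = min, max = max))
extremes[, c("mpg", "cyl")]
```

In this sketch, a function such as `median` matches no rows when its result is not an observed value, for example with an even number of distinct observations.
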