---
output: github_document
---
```{r setup, echo = FALSE, message = FALSE, results = 'hide'}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.width = 8,
fig.height = 6,
fig.path = "figs/",
echo = TRUE
)
```
# Welcome to the *linelist* package!
[![Travis build status](https://travis-ci.org/reconhub/linelist.svg?branch=master)](https://travis-ci.org/reconhub/linelist)
[![Codecov test coverage](https://codecov.io/gh/reconhub/linelist/branch/master/graph/badge.svg)](https://codecov.io/gh/reconhub/linelist?branch=master)
This package is dedicated to simplifying the cleaning and standardisation of
[line list](https://outbreaktools.ca/background/line-lists/) data. Given a case line list stored as a `data.frame`, it aims to:
- standardise the variable names, replacing all non-ASCII characters with their
  closest Latin equivalent, removing blank spaces and other separators,
  enforcing lower case, and using a single separator between words
- standardise the labels used in all variables of type `character` and `factor`,
  as above
- convert `POSIXct` and `POSIXlt` variables to `Date` objects
- extract dates from a messy variable, automatically detecting formats, and
  tolerating inconsistent formats and dates flanked by other text
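
The name standardisation and the `POSIXct`/`POSIXlt` conversion listed above can be approximated in base R. The sketch below is not the package's implementation: `standardise_names()` and `posix_to_date()` are hypothetical helpers written for illustration only, and the transliteration performed by `iconv()` depends on the platform.

```r
# Hypothetical helpers for illustration; not part of linelist.

standardise_names <- function(x) {
  # Transliterate to ASCII where the platform supports it
  x <- iconv(x, to = "ASCII//TRANSLIT")
  x <- tolower(x)
  # Collapse every run of non-alphanumeric characters into one underscore
  x <- gsub("[^a-z0-9]+", "_", x)
  # Drop underscores left at the start or end
  gsub("^_+|_+$", "", x)
}

posix_to_date <- function(df) {
  for (nm in names(df)) {
    if (inherits(df[[nm]], "POSIXt")) {
      # as.Date() drops the time of day; the time zone is set explicitly
      df[[nm]] <- as.Date(df[[nm]], tz = "UTC")
    }
  }
  df
}

# A toy data.frame with untidy names and a date-time column
toy <- data.frame(
  seen = as.POSIXct("2020-01-15 10:30:00", tz = "UTC"),
  outcome = "Recovered"
)
names(toy) <- c("  Date Seen ", "Patient.OUTCOME")

names(toy) <- standardise_names(names(toy))
toy <- posix_to_date(toy)
str(toy)
```

In practice `clean_data()` handles these steps for you; the sketch only makes the intent of each step concrete.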
## Installing the package
To install the current stable version of the package from CRAN, type:
```{r install, eval = FALSE}
install.packages("linelist")
```
To benefit from the latest features and bug fixes, install the development version of the package from *GitHub* using:
```{r install2, eval = FALSE}
devtools::install_github("reconhub/linelist")
```
Note that this requires the *devtools* package to be installed.
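
If you are unsure whether *devtools* is available, a small guard such as the one below installs it first when it is missing. This is a minimal sketch using only base R functions and the same `install_github()` call as above.

```r
# Install devtools only if it is not already available
if (!requireNamespace("devtools", quietly = TRUE)) {
  install.packages("devtools")
}
devtools::install_github("reconhub/linelist")
```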
# Quick example
Let us consider a messy `data.frame` as an example:
```{r toy_data}
library(linelist)
example_data <- messy_data(10)
example_data
```
We then use the `clean_data()` function to get nice, clean data!
```{r clean_data}
clean_data(example_data, guess_dates = TRUE)
```
# What does it do?
Procedures to clean data, first and foremost aimed at `data.frame` formats,
include:
- `clean_data()`: the main function, taking a `data.frame` as input and doing all
  the variable name, internal label, and date processing described above
- `clean_variable_names()`: like `clean_data()`, but only for the variable names
- `clean_variable_labels()`: like `clean_data()`, but only for the variable labels
- `clean_variable_spelling()`: provided with a dictionary, corrects the
  spelling of values in a variable and can globally correct commonly misspelled
  words
- `clean_dates()`: like `clean_data()`, but only for the dates
- `guess_dates()`: finds dates in various, unspecified formats in a messy
  `character` vector
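
To give a feel for what extracting dates from messy text involves, here is a minimal base R sketch. It is not how `guess_dates()` works internally: `extract_dates()` is a hypothetical helper that only recognises two formats, chosen here as an assumption, and pairs each regular expression with the format used to parse its match.

```r
# Hypothetical helper for illustration; not linelist's guess_dates().
extract_dates <- function(x) {
  # Each regular expression is paired with the format used to parse its match
  known <- c(
    "[0-9]{4}-[0-9]{2}-[0-9]{2}" = "%Y-%m-%d",
    "[0-9]{2}/[0-9]{2}/[0-9]{4}" = "%d/%m/%Y"
  )
  out <- rep(as.Date(NA), length(x))
  for (pattern in names(known)) {
    m <- regexpr(pattern, x)
    # Only fill entries that matched and have not been parsed yet
    hit <- is.na(out) & !is.na(m) & m > 0
    tokens <- regmatches(x[hit], regexpr(pattern, x[hit]))
    out[hit] <- as.Date(tokens, format = known[[pattern]])
  }
  out
}

messy <- c("onset on 2019-03-14 (reported)", "15/01/2020", "unknown")
extract_dates(messy)
```

Entries with no recognisable date are returned as `NA`. Ambiguous formats, such as day-first versus month-first, are exactly what makes the general problem hard and are not handled by this sketch.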
## Getting help online
Bug reports and feature requests should be posted on *GitHub* using the [*issue*](http://github.com/reconhub/linelist/issues) system. All other questions should be posted on the **RECON forum**: <br>
[http://www.repidemicsconsortium.org/forum/](http://www.repidemicsconsortium.org/forum/)
Contributions are welcome via **pull requests**.
Please note that this project is released with a [Contributor Code of Conduct](CONDUCT.md). By participating in this project you agree to abide by its terms.