prepepi: tools for cleaning and preparing epidemiological data #42
I would really like to get a feel for the pros and cons from the community. Thumbs up or down, or comments, are welcome!
Note: I have thought of calling this 'epiclean' or 'cleanepi', but I think it is useful to have other basic tools in there for preparing the data (e.g. a hashing algorithm), which makes it not just about cleaning.
Hey Thibaut,
In our courses so far, we have had no problems sending students directly to {janitor} for column name cleaning (and they are using janitor anyway, with tabyl() as their go-to function for quick tabulations). We have them use lubridate for many purposes, and for really messy dates we refer them to {parsedate}. Similar story for {matchmaker}. I don't have much experience with the hashing, so I won't comment on that.
In the Epi R Handbook we use janitor::clean_names(). Due to the above, we're thinking of shifting the Epi R Handbook's messy-date handling to parsedate, and the dictionary-based cleaning to matchmaker, but we haven't made the move yet.
I think your pros and cons are well laid out. Right now I'd be on the side of just having good public health R user documentation/help for using these underlying packages. But if you can compile a large enough set of gaps not met by these packages, and since there are now more resources available to maintain new packages... perhaps... happy to talk more.
Hi Neale,
prepepi: tools for cleaning epidemiological data
Description
This package would provide tools to facilitate the cleaning and preparation of epidemiological data. It would consist mostly of wrappers around existing tools and would re-implement several features of the old RECON package linelist, which was neither finished nor released.
It would provide the following features:
Target audience
typical end-users: anyone having to clean up epidemiological data
potential contributors: same as the end-users; user feedback is likely to point to common use-cases which may result in new features
key collaborators: field epidemiologists; people with dirty data!
Interoperability
inputs: a data.frame (or tibble) of dirty data
outputs: a data.frame (or tibble) of clean data
related projects
Usage
The code below illustrates a typical use of the package, using fictitious code and outputs if needed: