clean_spelling: allow multiple variables in the 'variable' column #40

Open
thibautjombart opened this issue Feb 20, 2019 · 4 comments
Labels
enhancement New feature or request


@thibautjombart
Contributor

The dictionary-based cleaning could use something like:

from      to        variable
hopsital  hospital  location|structure_type
hopital   hospital  location|structure_type
hopsital  hospital  location|structure_type
feild     field     location
homw      home      location
maison    home      location
household home      location
<NA>      unknown   .all
.default  unknown   location|structure_type|sex|exposure

The 'variable' column illustrates the following new features:

  1. | to list several variables
  2. .all as a wildcard meaning "all variables"
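To make the two proposed features concrete, here is a minimal Python sketch of how a single 'variable' entry could be resolved to data columns under a purely literal reading: '|' separates exact column names and '.all' selects every column. It is only an illustration, not the linelist API (which is R); `expand_variables` is a hypothetical helper, and silently ignoring names absent from the data is an assumption.

```python
from typing import List


def expand_variables(entry: str, columns: List[str]) -> List[str]:
    """Resolve one 'variable' entry to the columns it applies to.

    Hypothetical helper: '|' separates exact column names and the
    wildcard '.all' selects every column. Names not present in the
    data are silently ignored (an assumption).
    """
    if entry == ".all":
        return list(columns)
    wanted = set(entry.split("|"))
    # Preserve the data's column order rather than the entry's order.
    return [col for col in columns if col in wanted]


columns = ["case_id", "location", "structure_type", "sex", "exposure"]
print(expand_variables("location|structure_type", columns))
print(expand_variables(".all", columns))
```

With these columns, the first call returns `['location', 'structure_type']` and the second returns all five column names.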

One way to implement these two features is to treat entries in 'variable' as regular expressions matched against column names, with a special case for .all.
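A rough Python sketch of the regular-expression interpretation follows. It assumes each entry is searched (not fully matched) against column names, and it shows a side effect worth knowing about: an unanchored pattern such as `location|structure_type` also matches a column like `location_detail`, so literal names have to be anchored explicitly. `match_columns` is a hypothetical name, not part of linelist.

```python
import re
from typing import List


def match_columns(entry: str, columns: List[str]) -> List[str]:
    """Treat a 'variable' entry as a regular expression.

    Hypothetical helper: '.all' is special-cased to select every column;
    any other entry is compiled and searched for anywhere in each name.
    """
    if entry == ".all":
        return list(columns)
    pattern = re.compile(entry)
    # re.search is unanchored, so partial matches inside names count.
    return [col for col in columns if pattern.search(col)]


columns = ["location", "location_detail", "structure_type", "sex"]
print(match_columns("location|structure_type", columns))
print(match_columns("^(location|structure_type)$", columns))
```

The unanchored entry picks up `location`, `location_detail` and `structure_type`, while the anchored one returns only `location` and `structure_type`.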

zkamvar self-assigned this Feb 20, 2019
zkamvar added the enhancement (New feature or request) label Feb 20, 2019
@zkamvar
Member

zkamvar commented Feb 20, 2019

The .all wildcard is called .global, and it has already been implemented 😁

@patrickbarks
Contributor

Hi Thibaut and Zhian,

I've implemented a .regex keyword for clean_variable_spelling() in my linelist branch, to allow matching multiple variables as Thibaut describes.

We initially went with a regex = TRUE argument, which treated all variables as regular expressions, but found it cumbersome and inelegant to anchor every variable for which we only wanted a literal match. So we switched to the .regex keyword approach, which has been working well in some of our linelist work at Epicentre.
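The trade-off between a global regex flag and a per-entry keyword can be sketched in Python. This assumes a hypothetical syntax in which an entry written as `.regex <pattern>` is treated as a regular expression, `.global` selects every column, and every other entry is an exact column name that needs no anchoring. The exact syntax in Patrick's branch is not shown in this thread, so `resolve_variable` and the prefix format are illustrative only.

```python
import re
from typing import List

# Hypothetical syntax: ".regex <pattern>" marks a regular-expression entry.
REGEX_PREFIX = ".regex "


def resolve_variable(entry: str, columns: List[str]) -> List[str]:
    """Resolve a 'variable' entry: literal by default, regex on request."""
    if entry == ".global":
        # Wildcard: the dictionary row applies to every column.
        return list(columns)
    if entry.startswith(REGEX_PREFIX):
        pattern = re.compile(entry[len(REGEX_PREFIX):])
        return [col for col in columns if pattern.search(col)]
    # Plain entries are exact column names, so no anchoring is needed.
    return [col for col in columns if col == entry]


columns = ["location", "location_detail", "sym_fever", "sym_cough", "sex"]
print(resolve_variable("location", columns))
print(resolve_variable(".regex ^sym_", columns))
```

Here `"location"` matches only the `location` column, and `".regex ^sym_"` selects `sym_fever` and `sym_cough`. Only entries that opt in pay the cost of writing a pattern.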

Let me know if you're interested, and I can create a pull request.

@thibautjombart
Contributor Author

Hi Patrick, that sounds great! A PR is most welcome, ideally with some new unit tests and an example in the function's documentation. Please also add yourself as a contributor in the DESCRIPTION file. Really cool to see contributions to this package, and to hear Epicentre is using it :)

@zkamvar
Member

zkamvar commented Oct 15, 2019

That makes sense! It also aligns with the .regex keyword in the clean_spelling() function, so go ahead with the PR and I'll have a looksee.
