-
Notifications
You must be signed in to change notification settings - Fork 11
Expand lists filter
The expand lists filter expands an in-cell list in a dataset by repeating the rest of the row for each value in the list (the examples below will make this clearer).
(If you omit the --tags option / patterns parameter, the filter will operate on every column with a +list attribute.)
Command line:
$ hxlexpand --correlate --tags '#adm1+name+list,#adm1+code+list' dataset.csv
hxl.model.Dataset helper method:
dataset.expand_lists(patterns="#adm1+name+list,#adm1+code+list', correlate=True)
Filter class:
filter = hxl.filters.ExpandListsFilter(
dataset,
patterns="#adm1+name+list,#adm1+code+list',
correlate=True)
Argument | Required? | Description | Example |
---|---|---|---|
Tag patterns | No | The Tag pattern matching the columns with lists to expand (if omitted, will select every column with a +list attribute) | #sector+cluster+list, #org+list |
Separator | No | The string separating list items in the cell value (defaults to "|") | , |
Queries | No | Row queries to limit the rows where the filter will expand lists | #org=UNICEF |
Without correlation (the default), the filter will generate the Cartesian product of all the lists specified. You will generally want to do this when the lists are not related (independent values).
Input:
#adm1 | #sector +list | #org +list |
---|---|---|
Coast Province | Health | Education | Org A | Org B |
Plains Province | Nutrition | Org A | Org C |
Output:
#adm1 | #sector | #org |
---|---|---|
Coast Province | Health | Org A |
Coast Province | Health | Org B |
Coast Province | Education | Org A |
Coast Province | Education | Org B |
Plains Province | Nutrition | Org A |
Plains Province | Nutrition | Org C |
With correlation (the --correlate
command-line option, or the correlate=True
Python parameter), the filter will match list values from multiple columns adding blank values if one list is longer than another. You generally want to do this when the lists are related (e.g. names and codes for the same things).
Input:
#adm1+name+list | #adm1+code+list | #sector |
---|---|---|
Coast Province, Plains Province | 001, 002 | Health |
Coast Province, Mountains Province | 001 | Education |
Output:
#adm1+name | #adm1+code | #sector |
---|---|---|
Coast Province | 001 | Health |
Plains Province | 002 | Health |
Coast Province | 001 | Education |
Mountains Province | Education |
Note that the #adm1+code is blank in the last row, since there weren't enough list values in the original dataset.
Standard: http://hxlstandard.org | Mailing list: hxlproject@googlegroups.com