Skip to content

Expand lists filter

David Megginson edited this page Jun 14, 2020 · 4 revisions

The expand lists filter expands an in-cell list in a dataset by repeating the rest of the row for each value in the list (the examples below will make this clearer).

Usage

(If you omit the --tags option / patterns parameter, the filter will operate on every column with a +list attribute.)

Command line:

$ hxlexpand --correlate --tags '#adm1+name+list,#adm1+code+list' dataset.csv

hxl.model.Dataset helper method:

dataset.expand_lists(patterns="#adm1+name+list,#adm1+code+list', correlate=True)

Filter class:

filter = hxl.filters.ExpandListsFilter(
    dataset, 
    patterns="#adm1+name+list,#adm1+code+list',
    correlate=True)

Parameters

Argument Required? Description Example
Tag patterns No The Tag pattern matching the columns with lists to expand (if omitted, will select every column with a +list attribute) #sector+cluster+list, #org+list
Separator No The string separating list items in the cell value (defaults to "|") ,
Queries No Row queries to limit the rows where the filter will expand lists #org=UNICEF

Examples

Without correlation

Without correlation (the default), the filter will generate the Cartesian product of all the lists specified. You will generally want to do this when the lists are not related (independent values).

Input:

#adm1 #sector +list #org +list
Coast Province Health | Education Org A | Org B
Plains Province Nutrition Org A | Org C

Output:

#adm1 #sector #org
Coast Province Health Org A
Coast Province Health Org B
Coast Province Education Org A
Coast Province Education Org B
Plains Province Nutrition Org A
Plains Province Nutrition Org C

With correlation

With correlation (the --correlate command-line option, or the correlate=True Python parameter), the filter will match list values from multiple columns adding blank values if one list is longer than another. You generally want to do this when the lists are related (e.g. names and codes for the same things).

Input:

#adm1+name+list #adm1+code+list #sector
Coast Province, Plains Province 001, 002 Health
Coast Province, Mountains Province 001 Education

Output:

#adm1+name #adm1+code #sector
Coast Province 001 Health
Plains Province 002 Health
Coast Province 001 Education
Mountains Province Education

Note that the #adm1+code is blank in the last row, since there weren't enough list values in the original dataset.