Expand lists filter

The expand lists filter expands an in-cell list in a dataset by repeating the rest of the row for each value in the list (the examples below will make this clearer).

Usage

(If you omit the --tags option / patterns parameter, the filter will operate on every column with a +list attribute.)

Command line:

$ hxlexpand --correlate --tags '#adm1+name+list,#adm1+code+list' dataset.csv

hxl.model.Dataset helper method:

dataset.expand_lists(patterns="#adm1+name+list,#adm1+code+list', correlate=True)

Filter class:

filter = hxl.filters.ExpandListsFilter(
    dataset, 
    patterns="#adm1+name+list,#adm1+code+list',
    correlate=True)

Parameters

Argument	Required?	Description	Example
Tag patterns	No	The Tag pattern matching the columns with lists to expand (if omitted, will select every column with a +list attribute)	#sector+cluster+list, #org+list
Separator	No	The string separating list items in the cell value (defaults to "\|")	,
Queries	No	Row queries to limit the rows where the filter will expand lists	#org=UNICEF

Examples

Without correlation

Without correlation (the default), the filter will generate the Cartesian product of all the lists specified. You will generally want to do this when the lists are not related (independent values).

Input:

#adm1	#sector +list	#org +list
Coast Province	Health \| Education	Org A \| Org B
Plains Province	Nutrition	Org A \| Org C

Output:

#adm1	#sector	#org
Coast Province	Health	Org A
Coast Province	Health	Org B
Coast Province	Education	Org A
Coast Province	Education	Org B
Plains Province	Nutrition	Org A
Plains Province	Nutrition	Org C

With correlation

With correlation (the --correlate command-line option, or the correlate=True Python parameter), the filter will match list values from multiple columns adding blank values if one list is longer than another. You generally want to do this when the lists are related (e.g. names and codes for the same things).

Input:

#adm1+name+list	#adm1+code+list	#sector
Coast Province, Plains Province	001, 002	Health
Coast Province, Mountains Province	001	Education

Output:

#adm1+name	#adm1+code	#sector
Coast Province	001	Health
Plains Province	002	Health
Coast Province	001	Education
Mountains Province		Education

Note that the #adm1+code is blank in the last row, since there weren't enough list values in the original dataset.

Standard: http://hxlstandard.org | Mailing list: hxlproject@googlegroups.com

Home

For everyone
- Installation
- Command-line tools
For coders
Building blocks
HXL cookbook

Provide feedback

Saved searches