@@ -7,7 +7,8 @@ Unfold: Rotate one field to many

  The ``Unfold`` transform unfolds (pivots) a set of fields.
  Simple unfolding consists of rotating a single input field into multiple output fields.
- This can be generalised to multiple input fields where the output fields are broken up into equal-sized groups,
+
+ This can be generalised to multiple input fields where the output fields are broken up into equal-sized *groups*,
  and each group is generated from one of the input fields.
  ``Unfold`` is the inverse of :py:class:`Fold`.

@@ -21,73 +22,167 @@ Unfold: Rotate one field to many

      The list of fields to be unfolded.
      They will be dropped from the output, so use :py:class:`Copy` to preserve them.
-     Each input field contains the values for an entire output group.
+     The first field is the *tag* field and is used to identify which element of the group the row belongs to.
+     Each subsequent input field contains the values for an entire group.

  .. py:attribute:: outputs
      :type: tuple(str)

-     The output fields receiving the unfolded fields.
+     The output fields receiving the unfolded input fields.
      The output fields are broken into equal-sized groups, one per input field.
      The number of *outputs* must be an exact multiple of the number of *inputs* that follow the tag field.
      They cannot overwrite existing fields, so use :py:class:`Drop` to remove unwanted fields.

- Limitations
- ^^^^^^^^^^^
- The current implementation assumes that the unfolded values are contiguous.
- That is, all the input rows for a single output row will arrive sequentially and in order.
- This is the order generated by :py:class:`Fold`, so it is suggested that for now ``Unfold``
- only be used to undo the actions of :py:class:`Fold`.
+ .. py:attribute:: tags
+     :type: dict(any,int)
+
+     The optional mapping from tag values to group positions.
+     If not provided, it will be generated sequentially from the values in the first record.
+
+ ``Unfold`` can rotate data where the output rows are generated from non-consecutive input rows.
+ To identify output rows, the remaining fields (called the *fixed* fields) are used as a key
+ for accumulating the values of a row.
+ When a row is complete, it is output.
+
+ Because the rows for an output field can appear at any point,
+ the *tags* are used to assign fields to output columns.
+ The first time a tag is seen, it is assigned to the next group position,
+ so the order of the tags in the first record must match the layout of the groups.
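
The accumulation described above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: ``unfold_rows`` is a hypothetical helper that works on a list of dicts rather than on a pipeline, and it assumes that the fixed-field values are hashable and that *tags*, when given, covers every tag value.

```python
def unfold_rows(rows, inputs, outputs, tags=None):
    """Hypothetical stand-in for Unfold that works on an iterable of dicts."""
    tag_field, value_fields = inputs[0], inputs[1:]
    if not value_fields or len(outputs) % len(value_fields):
        raise ValueError("outputs must split into one equal-sized group per value input")
    group_size = len(outputs) // len(value_fields)

    positions = dict(tags or {})  # tag value -> position inside each group
    pending = {}                  # fixed-field key -> (partial output row, positions filled)
    complete = []

    for row in rows:
        # The first time a tag is seen, it takes the next free group position.
        pos = positions.setdefault(row[tag_field], len(positions))
        if pos >= group_size:
            raise ValueError(f"more tags than fields per group: {row[tag_field]!r}")

        # Every field that is not an input is a fixed field and forms the key.
        fixed = {k: v for k, v in row.items() if k not in inputs}
        key = tuple(sorted(fixed.items()))
        out, filled = pending.setdefault(key, (dict(fixed), set()))

        # Group g occupies outputs[g * group_size : (g + 1) * group_size].
        for g, field in enumerate(value_fields):
            out[outputs[g * group_size + pos]] = row[field]
        filled.add(pos)

        if len(filled) == group_size:  # the row is complete, so emit it
            complete.append(out)
            del pending[key]
    return complete


# Rows for the two departments arrive interleaved rather than contiguously.
rows = [
    {"Dept": "Home", "Year": 1992, "Sales": "S-H-1992"},
    {"Dept": "Auto", "Year": 1992, "Sales": "S-A-1992"},
    {"Dept": "Home", "Year": 1993, "Sales": "S-H-1993"},
    {"Dept": "Auto", "Year": 1993, "Sales": "S-A-1993"},
    {"Dept": "Auto", "Year": 1994, "Sales": "S-A-1994"},
    {"Dept": "Home", "Year": 1994, "Sales": "S-H-1994"},
]
result = unfold_rows(rows, ("Year", "Sales"),
                     ("Sales 1992", "Sales 1993", "Sales 1994"))
```

In this sketch the Auto row is emitted before the Home row, because its last value arrives first; output order follows completion order, not the order in which keys were first seen.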

  Usage
  ^^^^^

  .. code-block:: python

-     Unfold(p, ('Year', 'Sales',), ('Sales 1992', 'Sales 1993', 'Sales 1994',))
-     Unfold(p, ('Year', 'Sales', 'Profit',), ('Sales 1992', 'Sales 1993', 'Sales 1994', 'Profit 1992', 'Profit 1993', 'Profit 1994',))
+     Unfold(p, ('Year', 'Sales',),
+            ('Sales 1992', 'Sales 1993', 'Sales 1994',))
+     Unfold(p, ('Year', 'Sales', 'Profit',),
+            ('Sales 1992', 'Sales 1993', 'Sales 1994',
+             'Profit 1992', 'Profit 1993', 'Profit 1994',))

  Examples
  ^^^^^^^^

- Single Fold
- -----------
+ Single Group
+ ------------
+
+ The first Usage example is a case where a single measure (Sales) has been tagged by Year,
+ so that each Sales value is in a separate row:

  .. csv-table:: Input
-     :header: "Key", "Year", "Sales"
+     :header: "Dept", "Year", "Sales"
      :align: left

-     0, 1992, "S-0-1992"
-     0, 1993, "S-0-1993"
-     0, 1994, "S-0-1994"
-     1, 1992, "S-1-1992"
-     1, 1993, "S-1-1993"
-     1, 1994, "S-1-1994"
+     Home, 1992, "S-H-1992"
+     Home, 1993, "S-H-1993"
+     Home, 1994, "S-H-1994"
+     Auto, 1992, "S-A-1992"
+     Auto, 1993, "S-A-1993"
+     Auto, 1994, "S-A-1994"
+
+ In order to have all the Sales values for a Dept in a single record,
+ the table needs to have all the Sales for that Dept rotated into the same row.
+ ``Unfold`` takes the tags and the field containing the values as its inputs
+ and the fields to rotate them into as its outputs.
+
+ The first *input* field is the tag field, which contains the value used to
+ identify the original row.
+ In this example, this is the Year of the row.
+ This tag is used to track which field of the group an input row belongs to.
+ The tags are tracked in the order they are first seen, and there must be as many distinct tags as there are fields in each output group.
+
+ After Unfolding, each Sales value appears in a separate field, with the Year in the field name:

  .. csv-table:: Output
-     :header: "Key", "Sales 1992", "Sales 1993", "Sales 1994"
+     :header: "Dept", "Sales 1992", "Sales 1993", "Sales 1994"
      :align: left

-     0, "S-0-1992", "S-0-1993", "S-0-1994"
-     1, "S-1-1992", "S-1-1993", "S-1-1994"
+     Home, "S-H-1992", "S-H-1993", "S-H-1994"
+     Auto, "S-A-1992", "S-A-1993", "S-A-1994"
+
+ Multiple Groups
+ ---------------

- Multiple Folds
- --------------
+ The second Usage example is a related case where multiple measures (Sales and Profit)
+ have been tagged by Year so that the Sales and Profit values for each Year are in separate fields of the same row.

  .. csv-table:: Input
-     :header: "Key", "Year", "Sales", "Profit"
+     :header: "Dept", "Year", "Sales", "Profit"
      :align: left

-     0, 1992, "S-0-1992", "P-0-1992"
-     0, 1993, "S-0-1993", "P-0-1993"
-     0, 1994, "S-0-1994", "P-0-1994"
-     1, 1992, "S-1-1992", "P-1-1992"
-     1, 1993, "S-1-1993", "P-1-1993"
-     1, 1994, "S-1-1994", "P-1-1994"
+     Home, 1992, "S-H-1992", "P-H-1992"
+     Home, 1993, "S-H-1993", "P-H-1993"
+     Home, 1994, "S-H-1994", "P-H-1994"
+     Auto, 1992, "S-A-1992", "P-A-1992"
+     Auto, 1993, "S-A-1993", "P-A-1993"
+     Auto, 1994, "S-A-1994", "P-A-1994"
+
+ In order to have all the Sales and Profit values for a Dept in a single record,
+ the table needs to have all the Sales and Profit values for that Dept rotated into the same row.
+ This means that there are two groups that need to be Unfolded: Sales and Profit,
+ and the value from each group needs to be rotated into the appropriate group field.
+
+ To express this, each group is listed in order in the *outputs*
+ and the *inputs* are mapped to the corresponding *tag* value and *output* field.
+ In this example, the Year is again the first *input* field,
+ and the following *input* fields supply the groups in the order they are laid out in the *outputs*.
+
+ After Unfolding, each Sales and Profit value appears in a separate field:

  .. csv-table:: Output
-     :header: "Key", "Sales 1992", "Sales 1993", "Sales 1994", "Profit 1992", "Profit 1993", "Profit 1994"
+     :header: "Dept", "Sales 1992", "Sales 1993", "Sales 1994", "Profit 1992", "Profit 1993", "Profit 1994"
      :align: left
      :widths: 1, 8, 8, 8, 8, 8, 8

-     0, "S-0-1992", "S-0-1993", "S-0-1994", "P-0-1992", "P-0-1993", "P-0-1994"
-     1, "S-1-1992", "S-1-1993", "S-1-1994", "P-1-1992", "P-1-1993", "P-1-1994"
+     Home, "S-H-1992", "S-H-1993", "S-H-1994", "P-H-1992", "P-H-1993", "P-H-1994"
+     Auto, "S-A-1992", "S-A-1993", "S-A-1994", "P-A-1992", "P-A-1993", "P-A-1994"
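
The way the *outputs* are divided among the value *inputs* can be made concrete with a little index arithmetic. The sketch below is only an illustration under the assumptions stated in the text (the first input is the tag field, and the outputs are split into equal-sized consecutive groups); ``group_layout`` is a hypothetical helper, not part of the library.

```python
def group_layout(inputs, outputs):
    """Map each value input to the slice of outputs that forms its group."""
    value_fields = inputs[1:]  # inputs[0] is the tag field
    if not value_fields:
        raise ValueError("at least one value input is required after the tag field")
    group_size, remainder = divmod(len(outputs), len(value_fields))
    if remainder:
        raise ValueError("outputs must be an exact multiple of the value inputs")
    return {
        field: outputs[g * group_size:(g + 1) * group_size]
        for g, field in enumerate(value_fields)
    }


layout = group_layout(
    ("Year", "Sales", "Profit"),
    ("Sales 1992", "Sales 1993", "Sales 1994",
     "Profit 1992", "Profit 1993", "Profit 1994"),
)

# 1993 is the second tag seen, so it occupies position 1 in every group.
profit_1993_field = layout["Profit"][1]
```

Here ``layout["Sales"]`` holds the three Sales fields and ``layout["Profit"]`` the three Profit fields, so a row tagged with the tag at position *t* writes its Sales value to ``layout["Sales"][t]`` and its Profit value to ``layout["Profit"][t]``.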
+
+ Interleaved Records
+ -------------------
+
+ Another powerful use case for ``Unfold`` is to assemble records that may be interleaved.
+ In this example, the values of two fields appear mixed in the file, but identified by output Row and Column:
+
+ .. csv-table:: Input
+     :header: "Row", "Column", "Data"
+     :align: left
+
+     0,0,"#BLENDs"
+     1,0,5
+     2,0,6
+     3,0,7
+     4,0,8
+     5,0,9
+     6,0,10
+     7,0,"Total"
+     0,1,"#Queries"
+     1,1,1
+     2,1,11
+     3,1,85
+     4,1,449
+     5,1,1511
+     6,1,9216
+     7,1,11273
+
+ To assemble the rows, we Unfold the Data column into a single group,
+ using the Column field as the tags to identify the group field:
+
+ .. code-block:: python
+
+     Unfold(p, ('Column', 'Data',), ('BLENDs', '#Queries',),
+            {'BLENDs': 0, '#Queries': 1})
+
+ The result is a table of eight rows reassembled from the interleaved records, using the tags to identify the output field:
+
+ .. csv-table:: Output
+     :header: "Row", "#BLENDs", "#Queries"
+     :align: left
+
+     0,#BLENDs,#Queries
+     1,5,1
+     2,6,11
+     3,7,85
+     4,8,449
+     5,9,1511
+     6,10,9216
+     7,Total,11273
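
One reason to pass *tags* explicitly is that automatic assignment depends on arrival order. The sketch below illustrates this under the assumption that tags are keyed by the values of the tag field (here the Column values 0 and 1), which is how the *tags* attribute is described; the call above keys its mapping differently, so treat this purely as an illustration. ``tag_positions`` is a hypothetical helper, not part of the library.

```python
def tag_positions(tag_stream, tags=None):
    """Assign each tag a group position, honouring any explicit mapping."""
    positions = dict(tags or {})
    for tag in tag_stream:
        # An unseen tag takes the next free position, in order of first appearance.
        positions.setdefault(tag, len(positions))
    return positions


# Hypothetical arrival order in which Column 1 happens to come first.
columns = [1, 1, 0, 0]

auto = tag_positions(columns)                    # Column 1 lands in position 0
explicit = tag_positions(columns, {0: 0, 1: 1})  # the layout is pinned in advance
```

With automatic assignment the two output fields would be swapped for this arrival order, whereas the explicit mapping keeps Column 0 in the first output field regardless of which record arrives first.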