The SPLIT-TO-ROWS directive splits a column based on a separator into multiple records, making a copy of the record for each column value generated by splitting.
split-to-rows <column> <separator>
The <column>
is split based on the <separator>
, which can be defined as a regular
expression (regex).
The SPLIT-TO-ROW directive takes a column, applies the regular expression separator, and then creates multiple rows from the split.
Regular expressions allows the use of complex search patterns when splitting the data in the column. It supports standard Java regular expression constructs.
If the <separator>
regular expression pattern does not match any part of the input string,
then no additional rows are generated.
Where there is a match, and the match splits the input string into n strings, then n rows created by copying other columns will be generated.
Note: This directive can only operate on columns of type string.
If we have a <separator>
pattern of \n
(a linefeed) over the string:
This will be split 1\nThis will be split 2
This will generate two output records with the column
having these values:
[
{ "column": "This will be split 1" },
{ "column": "This will be split 2" }
]
Using this record as an example:
{
"id": 1,
"codes": "USD|AUD|AMD|XCD",
}
Applying this directive:
split-to-rows codes \|
Note: A backslash is required to escape the pipe character (|
) as it is an optional
separator in a regex pattern.
This would result in four records being generated, with each split value being assigned to
the column codes
:
[
{ "id": 1, "codes": "USD" },
{ "id": 1, "codes": "AUD" },
{ "id": 1, "codes": "AMD" },
{ "id": 1, "codes": "XCD" }
]