0 parents
commit 7d16044
Showing 64 changed files with 5,293 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
gem 'aws-sdk' | ||
gem 'guard' | ||
gem 'guard-aws-s3' |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,15 @@ | ||
# narp | ||
|
||
narp is a program for scalable transformation of very large data sets. It does this by processing a | ||
DSL and then generating a Hive program. | ||
|
||
# Usage | ||
|
||
|
||
# Dependencies | ||
|
||
- treetop | ||
- aws-sdk | ||
|
||
# Changelog | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
require 'bundler' | ||
Bundler::GemHelper.install_tasks |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,65 @@ | ||
@basic | ||
Feature: Parse the definition of basic elements of the Narp language | ||
In order to allow users to specify basic building elements | ||
As a developer | ||
I should be able to run this scenario to prove that the definition is correctly interpreted | ||
|
||
@string | ||
Scenario Outline: Providing a string definition | ||
Given an input <input> | ||
When parsed by BasicG | ||
Then I have a String at the root | ||
And the value is <value> | ||
|
||
Examples: | ||
| input | value | | ||
| 3"blue" | blueblueblue | | ||
| 'blue\tyellow' | blue yellow | | ||
| 2'blue\tyellow' | blue yellowblue yellow | | ||
| x"73616D706C65" | sample | | ||
| "\x73amp\x6C\x65" | sample | | ||
| 'blue' | blue | | ||
| 'blue #3' | blue #3 | | ||
|
||
|
||
Scenario Outline: Providing a regular expression definition | ||
Given an input <input> | ||
When parsed by BasicG | ||
Then I have a Regex at the root | ||
And the regexp should match <value> with a value of <match> | ||
|
||
Examples: | ||
| input | value | match | | ||
| /\S+/ | blue | b | | ||
| /\l./ | blue | lu | | ||
| /[[:digit:]+]/ | go23bat | 23 | | ||
| /87[[:alpha:]]{2}/ | shax 87code | 87co | | ||
| /\S+\t\S/ | love it | love i | | ||
|
||
|
||
@current | ||
Scenario Outline: Providing a numeric definition | ||
Given an input <input> | ||
When parsed by BasicG | ||
Then I have a <class> at the root | ||
And the value is <value> | ||
|
||
Examples: | ||
| input | class | value | | ||
| 23 | OrdinalLiteral | 23 | | ||
| -23 | IntegerLiteral | -23 | | ||
| 23.33 | FloatLiteral | 23.33 | | ||
| 2,333 | EditedNumeric | 2333 | | ||
| 8,333.3 | EditedNumeric | 8333.3 | | ||
|
||
|
||
|
||
|
||
Scenario Outline: Providing an invalid regular expression definition should cause Parse Error | ||
Given an input <input> | ||
Then parsing by BasicG should raise ParseError | ||
|
||
Examples: | ||
| input | | ||
| /blue | | ||
| /a(/ | |
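The string rows in the first Examples table of this feature suggest three rules: a leading integer repeats the literal, an `x` prefix decodes hexadecimal digits, and `\xNN` or `\t` escapes are resolved. The sketch below restates those rules in plain Ruby. It is not narp's `BasicG` grammar; `expand_literal` is a hypothetical name, and resolving escapes inside both quote styles is an assumption read off the table.

```ruby
# Hypothetical helper, not part of narp: expands a string literal the way
# the Examples rows suggest (repeat count, hex prefix, escapes).
def expand_literal(source)
  match = /\A(?<prefix>\d+|x)?(?<quote>["'])(?<body>.*)\k<quote>\z/m.match(source)
  raise ArgumentError, "not a string literal: #{source.inspect}" unless match

  body = match[:prefix] == 'x' ? nil : match[:body]
  # x"..." form: every pair of hex digits becomes one byte.
  return [match[:body]].pack('H*') if body.nil?

  # Assumption: \xNN and \t are resolved for both quote styles.
  text = body.gsub(/\\x(\h{2})/) { Regexp.last_match(1).hex.chr }
             .gsub('\t', "\t")

  # A missing prefix means the literal appears once.
  text * (match[:prefix] || 1).to_i
end

puts expand_literal('3"blue"')
puts expand_literal('x"73616D706C65"')
```

A regular expression is enough here because the literal forms in the table are flat; a real grammar would also need to handle the doubled-quote escaping that appears in the condition feature.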
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,70 @@ | ||
@condition | ||
Feature: Parse the conditions | ||
In order to allow users to affect what gets generated by Narp | ||
As a developer | ||
I should be able to run this scenario to prove that the definition is correctly interpreted | ||
|
||
# @current | ||
# Scenario: providing a numeric valued expression | ||
# Given an input /condition my_cond int6 < 10 | ||
# And the app has numeric fields int5,int6 | ||
# When parsed by ConditionG --v | ||
# Then the condition is called my_cond | ||
# And the prettied expression is "blue" match u | ||
|
||
Scenario Outline: providing an arithmetic expression | ||
Given an input /condition <name> <condition> | ||
When parsed by ConditionG | ||
Then the condition is called <name> | ||
And the hql is <hql> | ||
|
||
Examples: | ||
| name | condition | hql | | ||
| my_cond | 23+ 79= 102 | 23 + 79 = 102 | | ||
| my_cond | 79 lt 102 - (25+33) | 79 < 102 - (25 + 33) | | ||
| y_cond | 23+71eq94 | 23 + 71 = 94 | | ||
| b_cond | (23+71)* 23=94 | (23 + 71) * 23 = 94 | | ||
| c_cond | (23+71)*( 51+4)=94| (23 + 71) * (51 + 4) = 94 | | ||
| d_cond | (23+71)/( 51-4)+5 ne 94 | (23 + 71) / (51 - 4) + 5 != 94 | | ||
| e_cond | (23+71)/( 51-4)+5 < 94 | (23 + 71) / (51 - 4) + 5 < 94 | | ||
| f_cond | (23+71)/( 51-4)+5 ge 78-(24*93) | (23 + 71) / (51 - 4) + 5 >= 78 - (24 * 93) | | ||
| g_cond | (23+71) ge 78-(24*93) or 5 < 3 | (23 + 71) >= 78 - (24 * 93) OR 5 < 3 | | ||
| g_cond | ((23+71) ge 78 or 5 < 3) | ((23 + 71) >= 78 OR 5 < 3) | | ||
| i_cond | ((23+71) ge 78+15) or (5 < 3) | ((23 + 71) >= 78 + 15) OR (5 < 3) | | ||
| j_cond | (((23+71) ge 78+15) and (5 < 3)) | (((23 + 71) >= 78 + 15) AND (5 < 3)) | | ||
|
||
|
||
Scenario Outline: providing a character expression | ||
Given an input /condition <name> <condition> | ||
When parsed by ConditionG | ||
Then the condition is called <name> | ||
And the hql is <hql> | ||
|
||
Examples: | ||
| name | condition | hql | | ||
| my_cond | 'blue' nc 'green' | LOCATE('green', 'blue') = 0| | ||
| b_cond | 'blue '' goo ' mt 'green' | 'blue '' goo ' = 'green' | | ||
| c_cond | "blue "" goo " ct "green" | LOCATE('green', 'blue "" goo ') > 0 | | ||
| d_cond | "blue" mt /u/ | 'blue' RLIKE 'u' | | ||
|
||
|
||
@current | ||
Scenario Outline: Providing a character/numeric expression with field references | ||
Given an input /condition <name> <condition> | ||
And an existing app that is reinitialized | ||
And the app has numeric fields <numeric_field_list> | ||
And the app has character fields <character_field_list> | ||
When parsed by ConditionG | ||
Then the condition is called <name> | ||
And the hql is <hql> | ||
|
||
Examples: | ||
| name | condition |numeric_field_list | character_field_list | hql | | ||
| b_cond | int6 + 5 > 10 | int6, int9 | [] | lhs_int6 + 5 > 10| | ||
| c_cond | 5"blue" ct "green" or int6 < 10 | int6, int9 | [] | LOCATE('green', 'blueblueblueblueblue') > 0 OR lhs_int6 < 10| | ||
| d_cond | int6 < 10 AND 5"blue" ct "green" | int6, int9 | [] | lhs_int6 < 10 AND LOCATE('green', 'blueblueblueblueblue') > 0 | | ||
| e_cond | 'blue' mt 'green' AnD ch5 mt /ye/ | [] | ch5, ch6 | 'blue' = 'green' AND lhs_ch5 RLIKE 'ye' | | ||
| f_cond | "blue "" goo " mt /ue/ AND ch5 mt /\d+/ | [] | ch6, ch5 | 'blue "" goo ' RLIKE 'ue' AND lhs_ch5 RLIKE '\d+' | | ||
| g_cond | (3"blue" ct "green" aND int6 < 9) or cha5 mt /yE/i | int6, int9 | ch4,cha5,col1 | (LOCATE('green', 'blueblueblue') > 0 AND lhs_int6 < 9) OR LOWER(lhs_cha5) RLIKE 'ye'| | ||
| i_cond | cha5 = " " | int6, int9 | ch4,cha5,col1 | lhs_cha5 = ' ' | | ||
|
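The Examples tables in this feature pair each Narp operator keyword with the HQL it should produce: `lt`, `eq`, `ne` and `ge` become symbols, while `ct`, `nc` and `mt` become `LOCATE`, `=` or `RLIKE` expressions. The following is a minimal sketch of that mapping for literal operands only. `COMPARISON_KEYWORDS`, `hql_quote` and `character_condition` are hypothetical names, field references (the `lhs_` prefix) and the `/i` flag are left out, and embedded quotes are not escaped.

```ruby
# Hypothetical lookup inferred from the arithmetic Examples rows.
COMPARISON_KEYWORDS = { 'lt' => '<', 'eq' => '=', 'ne' => '!=', 'ge' => '>=' }.freeze

# Wraps a literal in single quotes; no escaping is attempted in this sketch.
def hql_quote(text)
  "'#{text}'"
end

# Renders a character comparison between two literals as an HQL fragment.
def character_condition(left, operator, right)
  case operator.downcase
  when 'ct' # contains
    "LOCATE(#{hql_quote(right)}, #{hql_quote(left)}) > 0"
  when 'nc' # does not contain
    "LOCATE(#{hql_quote(right)}, #{hql_quote(left)}) = 0"
  when 'mt' # matches: regex operands use RLIKE, strings use equality
    if right.is_a?(Regexp)
      "#{hql_quote(left)} RLIKE #{hql_quote(right.source)}"
    else
      "#{hql_quote(left)} = #{hql_quote(right)}"
    end
  else
    raise ArgumentError, "unknown operator: #{operator}"
  end
end

puts character_condition('blue', 'nc', 'green')
puts character_condition('blue', 'mt', /u/)
```

Note the argument order in `LOCATE`: the right-hand operand is the substring being searched for, which matches the expected output in the table.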
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,89 @@ | ||
@derived_field | ||
Feature: Parse the derived fields | ||
In order to allow users to affect what gets generated by Narp | ||
As a developer | ||
I should be able to run this scenario to prove that the definition is correctly interpreted | ||
|
||
# Scenario: providing an character expression | ||
# Given an existing app that is reinitialized | ||
# And the app has numeric fields fn1, fn2, fn3 | ||
# And the app has character fields fc1, fc2 | ||
# And the app has conditions cond2, cond5 | ||
# And an input /derivedfield calc1 fn1 + 92.5 | ||
# When parsed by DerivedFieldG --verbose | ||
|
||
@current | ||
Scenario Outline: providing a character/numeric expression | ||
Given an input /derivedfield <expression> | ||
And an existing app that is reinitialized | ||
And the app has numeric fields fn1, fn2, fn3 | ||
And the app has character fields fc1, fc2 | ||
When parsed by DerivedFieldG | ||
Then the column expression is <column_expression> | ||
And the sequence is <sequence> | ||
|
||
Examples: | ||
| expression | column_expression | sequence | | ||
| calc1 fc1 | lhs_fc1 AS calc1 | null | | ||
| calc2 fc2 23 compress ascii | TRIM(CAST(lhs_fc2 AS VARCHAR(23))) AS calc2 | ascii | | ||
| calc2 fc2 character 28 compress ascii | TRIM(CAST(lhs_fc2 AS VARCHAR(28))) AS calc2 | ascii | | ||
| calc2 fc2 54 character compress ascii | TRIM(CAST(lhs_fc2 AS VARCHAR(54))) AS calc2 | ascii | | ||
| calc3 23,000 en 6 compress | TRIM(CAST(23000 AS VARCHAR(6))) AS calc3 | null | | ||
| calc4 23 uinteger 8 | CAST(23 AS VARCHAR(8)) AS calc4 | null | | ||
| calc5 92.5 float 4 | CAST(92.5 AS VARCHAR(4)) AS calc5 | null | | ||
| calc5 fn1 + 92.5 float 4 | CAST(lhs_fn1 + 92.5 AS VARCHAR(4)) AS calc5 | null | | ||
| calc6 13,292.5 extract /(\d+).+(\d+)/ '#1k' compress | TRIM(CONCAT('', REGEXP_EXTRACT(13292.5, '(\\\\\d+).+(\\\\\d+)', 1), 'k')) AS calc6 | null | | ||
| calc7 29,333.53 en 10 4/1 | CONCAT(CAST(SPLIT(29333.53, '\\.')[0] AS VARCHAR(4)), '.', CAST(SPLIT(29333.53, '\\.')[1] AS VARCHAR(1))) AS calc7 | null | | ||
| calc8 29,333.53 En 10 4 | CAST(SPLIT(29333.53, '\\.')[0] AS VARCHAR(4)) AS calc8 | null | | ||
|
||
|
||
Scenario Outline: providing a character regex | ||
Given an input /derivedfield <name> <expression> | ||
And an existing app that is reinitialized | ||
And the app has numeric fields fn1, fn2, fn3 | ||
And the app has character fields fc1, fc2 | ||
When parsed by DerivedFieldG | ||
Then the name is <name> | ||
And the column expression is <column_expression> | ||
|
||
Examples: | ||
| name | expression | column_expression | | ||
| calc1 | 'bluecheese' extract /(.+)cheese/i 'cheese: #1' truncate | RTRIM(CONCAT('cheese: ', REGEXP_EXTRACT(LOWER('bluecheese'), '(.+)cheese', 1))) AS calc1 | | ||
| calc2 | fc1 extract /(.+)chee(.+)/i 'cheese: #2; then #1' | CONCAT('cheese: ', REGEXP_EXTRACT(LOWER(lhs_fc1), '(.+)chee(.+)', 2), '; then ', REGEXP_EXTRACT(LOWER(lhs_fc1), '(.+)chee(.+)', 1)) AS calc2 | | ||
|
||
Scenario Outline: providing an if expression | ||
Given an input /derivedfield <name> <expression> | ||
And an existing app that is reinitialized | ||
And the app has numeric fields fn1, fn2, fn3 | ||
And the app has character fields fc1, fc2 | ||
And the app has conditions cond2, cond5, cond7 | ||
When parsed by DerivedFieldG | ||
Then the name is <name> | ||
And the column expression is <column_expression> | ||
|
||
Examples: | ||
| name | expression | column_expression | | ||
| calc1 | if cond2 then 25.3 else 56 | CASE WHEN _cond2_ THEN 25.3 ELSE 56 END AS calc1 | | ||
| calc2 | if cond2 then if cond5 then 22 + fn1 else 23 else 56 | CASE WHEN _cond2_ THEN CASE WHEN _cond5_ THEN 22 + lhs_fn1 ELSE 23 END ELSE 56 END AS calc2 | | ||
| calc3 | if cond2 then if cond5 then 22 + fn1 else if cond7 then 15+fn3 else 0 else 56 | CASE WHEN _cond2_ THEN CASE WHEN _cond5_ THEN 22 + lhs_fn1 ELSE CASE WHEN _cond7_ THEN 15 + lhs_fn3 ELSE 0 END END ELSE 56 END AS calc3 | | ||
|
||
|
||
Scenario Outline: A derived expression referencing another derived_expression | ||
Given an input /derivedfield <name> <expression> | ||
And an existing app that is reinitialized | ||
And the app has numeric fields fn1, fn2, fn3 | ||
And the app has character fields fc1, fc2 | ||
And the app has derived fields fd1, fd2 | ||
And the app has conditions cond2, cond5, cond7 | ||
When parsed by DerivedFieldG | ||
Then the name is <name> | ||
And the column expression is <column_expression> | ||
|
||
Examples: | ||
| name | expression | column_expression | | ||
| calc2 | fd1 + 25.3 + 56 + fd2 | (_fd1_) + 25.3 + 56 + (_fd2_) AS calc2 | | ||
| calc3 | fn1 + 25.3 + fd2 * 19 | lhs_fn1 + 25.3 + (_fd2_) * 19 AS calc3 | | ||
| calc4 | if cond2 then fd1 + 25.3 else 56 + fd2 | CASE WHEN _cond2_ THEN (_fd1_) + 25.3 ELSE 56 + (_fd2_) END AS calc4 | | ||
| calc5 | if cond2 then fn1 / 25.3 else 56 + fd2 | CASE WHEN _cond2_ THEN lhs_fn1 / 25.3 ELSE 56 + (_fd2_) END AS calc5 | | ||
| calc6 | fd1 compress | TRIM((_fd1_)) AS calc6 | | ||
|
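The "if expression" scenario in this feature expects nested `if ... then ... else` forms to become nested `CASE WHEN ... END` expressions, with each condition name wrapped in underscores. One way to picture that translation is a recursive renderer over a small tree. This is only a sketch: `IfNode`, `render` and `derived_column` are hypothetical names, and the branches are assumed to arrive already translated (for example `lhs_fn1` rather than `fn1`).

```ruby
# Hypothetical tree node for "if <condition> then <a> else <b>".
IfNode = Struct.new(:condition, :then_branch, :else_branch)

# Renders a node as HQL; anything that is not an IfNode is treated as an
# already-translated expression and emitted verbatim.
def render(node)
  return node.to_s unless node.is_a?(IfNode)

  "CASE WHEN _#{node.condition}_ " \
    "THEN #{render(node.then_branch)} " \
    "ELSE #{render(node.else_branch)} END"
end

# Appends the column alias, as in the expected column expressions.
def derived_column(name, node)
  "#{render(node)} AS #{name}"
end

inner = IfNode.new('cond5', '22 + lhs_fn1', 23)
puts derived_column('calc2', IfNode.new('cond2', inner, 56))
```

Because every `CASE` is closed by its own `END`, nesting in either branch needs no extra parentheses.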
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,132 @@ | ||
@fields | ||
Feature: Parse the definition for fields | ||
In order to allow users to specify fields in an input file | ||
As a developer | ||
I should be able to run this scenario to prove that the definition is correctly interpreted | ||
|
||
Scenario Outline: providing a name, a fixed position and a data type | ||
Given an input /fields <name> <position> <data_type> <length> | ||
When parsed by FieldsG | ||
And I am examining the 1st field | ||
Then the name is <name> | ||
And the starting byte position is <byte_position> and offset is <offset> bits | ||
And the datatype is <data_type> <default_data_type> | ||
And it is <dt_length> bytes long | ||
|
||
Examples: | ||
| name | position | data_type |length |byte_position | offset | dt_length | default_data_type | | ||
| my_col | 23 | character | 15 | 23 | null | 15 | | | ||
| yourcol | 15B3 | integer | | 15 | 3 | null | | | ||
| his_col | 82B9 | float | | 82 | 9 | null | | | ||
| her_col | 82B9 | | | 82 | 9 | null | character | | ||
|
||
|
||
Scenario Outline: providing a name and a supported datetime datatype | ||
Given an input /fields my_col 92B3 <data_type> <format> | ||
When parsed by FieldsG | ||
And I am examining the 1st field | ||
Then I have a Field at the root | ||
And the datatype is <data_type> | ||
And it has these datetime pieces <pieces> | ||
|
||
Examples: | ||
|
||
| data_type | format |pieces | | ||
| datetime | year | year | | ||
| datetime | year/mon | year,mon | | ||
| datetime | yy-mm0-dd0 hh0:mi0:se0 | yy,mm0,dd0,hh0,mi0,se0 | | ||
| datetime | yy-mm-dd0 hh0:mi0:se0 | yy,mm,dd0,hh0,mi0,se0 | | ||
| datetime | yy-mnth | yy,mnth | | ||
| datetime | yy-mnth-ddth | yy,mnth,ddth | | ||
| datetime | yy-mnth-dd0 hh0:mi0:se0 | yy,mnth,dd0,hh0,mi0,se0 | | ||
| datetime | yy-mnth-dd hh0:mi0:se0 | yy,mnth,dd,hh0,mi0,se0 | | ||
| datetime | yy-mnth-day hh:mi0:se0 | yy,mnth,day,hh,mi0,se0 | | ||
| datetime | yy-mnth-day hr:mi:se0 | yy,mnth,day,hr,mi,se0 | | ||
| datetime | yy-mnth-day hr:mi:se | yy,mnth,day,hr,mi,se | | ||
|
||
Scenario Outline: providing a name and a delimited position | ||
Given an input /fields my_col <start> <stop> <format> | ||
When parsed by FieldsG | ||
And I am examining the 1st field | ||
Then the start field is <start_field> with byte offset of <start_offset> | ||
And the stop field is <stop_field> with byte offset of <stop_offset> | ||
And the datatype is <data_type> <default_data_type> | ||
And it has these datetime pieces <pieces> | ||
|
||
Examples: | ||
| start | stop | format |start_field | start_offset| stop_field | stop_offset | data_type | pieces | default_data_type | | ||
| 23:1 | | integer | 23 | 1 | null | null |integer | [] | | | ||
| 41: | | | 41 | null | null | null | | [] | character | | ||
| 41: | -72: | integer | 41 | null | 72 | null |integer | [] | | | ||
| 41: | - 72:15 | integer | 41 | null | 72 | 15 |integer | [] | | | ||
| 83:3 | - 22:0 | integer | 83 | 3 | 22 | 0 |integer | [] | | | ||
| 83:3 | -92:0 | character | 83 | 3 | 92 | 0 | character | [] | | | ||
| 83:3 | | datetime yy/mm-dd | 83 | 3 | null | null | datetime | yy,mm,dd | | | ||
| 96:1 | -101: | datetime yy/mm-dd0 hh | 96 | 1 | 101 | null | datetime | yy,mm,dd0,hh | | | ||
|
||
|
||
Scenario: providing a name and a precision | ||
Given an input /fields my_col 14:1 float 4 /1 | ||
When parsed by FieldsG | ||
And I am examining the 1st field | ||
Then I have a Field at the root | ||
And I have 1 Field | ||
And the name is my_col | ||
And the start field is 14 with byte offset of 1 | ||
And the datatype is float | ||
And the precision is 5 and the scale is 1 | ||
|
||
Scenario Outline: providing a name and a sequence | ||
Given an input /fields my_col 14:1 character <collation> | ||
And the app has collations <collation_list> | ||
When parsed by FieldsG | ||
And I am examining the 1st field | ||
Then the collation is <collation> | ||
|
||
Examples: | ||
| collation | collation_list | | ||
| ascii | [] | | ||
| myascii | yourascii,myascii | | ||
|
||
|
||
Scenario Outline: providing a name and an unsupported format | ||
Given an input /fields my_col 29b9 <format> | ||
When parsed by FieldsG | ||
And I am examining the 1st field | ||
Then the datatype should raise ArgumentError | ||
|
||
Examples: | ||
| format | | ||
| lz | | ||
| lp | | ||
| tp | | ||
| zd | | ||
| ls | | ||
| ts | | ||
| an | | ||
| pd | | ||
|
||
@current | ||
Scenario: Parsing two fields | ||
Given an input /fields My_col 91B3 your_col 25 | ||
When parsed by FieldsG | ||
And I am examining the 1st field | ||
Then the name is My_col | ||
And the starting byte position is 91 and offset is 3 bits | ||
And the datatype is character | ||
And I am examining the 2nd field | ||
Then the name is your_col | ||
And the starting byte position is 25 and offset is null bits | ||
And the datatype is character | ||
|
||
Scenario Outline: Things that should cause a parse error | ||
Given an input /fields <input> | ||
And the app has collations <collation_list> | ||
Then parsing by FieldsG should raise ParseError | ||
|
||
Examples: | ||
| input | collation_list | message | | ||
| my_col 14:1 character myascii | blueshoose | referencing unknown collation | | ||
| my_col 14:1 character myascii | [] | referencing unknown collation | | ||
|
||
|
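The field scenarios in this feature use two position notations. A fixed position such as `15B3` is a starting byte with an optional bit offset, and a delimited position such as `83:3` is a field number with an optional byte offset. A minimal sketch of how those tokens could be split follows. `Position` and `parse_position` are hypothetical names, and accepting a lowercase `b` is an assumption based on the `29b9` input in the unsupported-format scenario.

```ruby
# Hypothetical value object: kind is :fixed or :delimited, offset may be nil.
Position = Struct.new(:kind, :index, :offset)

# Splits a position token into its numeric parts.
def parse_position(token)
  case token
  when /\A(\d+)(?:[Bb](\d+))?\z/ # e.g. "23" or "15B3": byte, optional bit offset
    kind = :fixed
  when /\A(\d+):(\d+)?\z/        # e.g. "41:" or "83:3": field, optional byte offset
    kind = :delimited
  else
    raise ArgumentError, "unrecognised position: #{token.inspect}"
  end

  # Both patterns capture (index, optional offset) in the same order.
  index, offset = Regexp.last_match.captures
  Position.new(kind, index.to_i, offset && offset.to_i)
end

p parse_position('15B3')
p parse_position('41:')
```

A missing offset is kept as `nil` rather than `0`, mirroring the `null` values the Examples tables expect.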