Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Configurable time layout in epoch/epochMillis JQ functions #378

Merged
merged 1 commit into from
Nov 8, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
49 changes: 40 additions & 9 deletions pkg/transform/jq_common.go
Original file line number Diff line number Diff line change
Expand Up @@ -44,10 +44,9 @@ func GojqTransformationFunction(command string, timeoutMs int, spMode bool, jqOu
return nil
}

validTime, ok := a1.(time.Time)

if !ok {
return errors.New("Not a valid time input to 'epoch' function")
validTime, err := parseTime(a1, a2)
if err != nil {
return err
}

return int(validTime.Unix())
Expand All @@ -59,12 +58,10 @@ func GojqTransformationFunction(command string, timeoutMs int, spMode bool, jqOu
return nil
}

validTime, ok := a1.(time.Time)

if !ok {
return errors.New("Not a valid time input to 'epochMillis' function")
validTime, err := parseTime(a1, a2)
if err != nil {
return err
}

return validTime.UnixMilli()
})

Expand All @@ -76,6 +73,40 @@ func GojqTransformationFunction(command string, timeoutMs int, spMode bool, jqOu
return runFunction(code, timeoutMs, spMode, jqOutputHandler), nil
}

func parseTime(input any, params []any) (time.Time, error) {
switch v := input.(type) {
case string:
timeLayout, err := parseTimeLayout(params)
if err != nil {
return time.Time{}, err
}

validTime, err := time.Parse(timeLayout, v)
if err != nil {
return time.Time{}, fmt.Errorf("Could not parse input - '%s' using provided time layout - '%s'", v, timeLayout)
}
return validTime, nil
case time.Time:
return v, nil
default:
return time.Time{}, fmt.Errorf("Not a valid time input to 'epochMillis' function - '%v'; expected string or time.Time", input)
}
}

func parseTimeLayout(params []any) (string, error) {
if len(params) == 0 {
return "2006-01-02T15:04:05.999Z", nil
} else if len(params) == 1 {
str, ok := params[0].(string)
if !ok {
return "", fmt.Errorf("Function argument is invalid '%v'; expected string", params[0])
}
return str, nil
} else {
return "", fmt.Errorf("Too many function arguments - %d; expected 1", len(params))
}
}

func runFunction(jqcode *gojq.Code, timeoutMs int, spMode bool, jqOutputHandler JqOutputHandler) TransformationFunction {
return func(message *models.Message, interState interface{}) (*models.Message, *models.Message, *models.Message, interface{}) {
input, parsedEvent, err := mkJQInput(message, interState, spMode)
Expand Down
65 changes: 63 additions & 2 deletions pkg/transform/jq_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -52,6 +52,25 @@ func TestJQRunFunction_SpMode_true(t *testing.T) {
ExpInterState: nil,
Error: nil,
},
{
Scenario: "test_timestamp_to_epochMillis_context",
JQCommand: `{ sessionId: .contexts_com_snowplowanalytics_snowplow_client_session_1[0].firstEventTimestamp | epochMillis }`,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if it's a string, but no layout is provided, we simply default to the layout that we encounter in Snowplow data - is that correct?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Also default layout in epoch is different than the layout we use for date-time atomic fields:

  • for atomic like collector_tstamp we use yyyy-MM-dd HH:mm:ss.SSS
  • for contexts it's yyyy-MM-dd'T'HH:mm:ss.SSS'Z' (so T + Z), based on tracker code. And based on what I've seen in generated mobile data and firstEventTimestamp field :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right - but the atomic fields are provided as time.Time, so in effect it's the same form an API perspective if I understand correctly. The API works as follows:

For the standard Snowplow data, we simply call epoch or epochMillis, and it'll work whether it's the atomic fields or a field in a context.

For other format timestmaps (eg. if there's custom tracking in a different format) we can optionally provide a format as an argument.

I'm quite happy with this!

Perhaps there's a chance that some standard Snowplow timestamps aren't in this format, and in those cases we can provide the format - but if we were to encounter that scenario I would suggest that it's actually an upstream problem - either the trackers or enrich should provide these values in a consistent format.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. For any Snowplow field, from JQ transformation point of view, in 99.99% cases you'd only need pure epoch or epochMillis, no custom layout needed. If there is some weird format suddenly coming from somewhere, you can add param to handle it.

InputMsg: &models.Message{
Data: SnowplowTsv1,
PartitionKey: "some-key",
},
InputInterState: nil,
Expected: map[string]*models.Message{
"success": {
Data: []byte(`{"sessionId":1730129267100}`),
PartitionKey: "some-key",
},
"filtered": nil,
"failed": nil,
},
ExpInterState: nil,
Error: nil,
},
{
Scenario: "test_timestamp_to_epoch",
JQCommand: `{ foo: .collector_tstamp | epoch }`,
Expand Down Expand Up @@ -273,6 +292,25 @@ func TestJQRunFunction_SpMode_false(t *testing.T) {
ExpInterState: nil,
Error: nil,
},
{
Scenario: "epochMillis_custom_timelayout",
JQCommand: `{ sessionId: .time | epochMillis("2006-01-02 15:04:05.999")}`,
InputMsg: &models.Message{
Data: []byte(`{"time": "2024-10-28 15:27:47.100"}`),
PartitionKey: "some-key",
},
InputInterState: nil,
Expected: map[string]*models.Message{
"success": {
Data: []byte(`{"sessionId":1730129267100}`),
PartitionKey: "some-key",
},
"filtered": nil,
"failed": nil,
},
ExpInterState: nil,
Error: nil,
},
{
Scenario: "epoch_on_nullable",
JQCommand: `
Expand Down Expand Up @@ -563,7 +601,7 @@ func TestJQRunFunction_errors(t *testing.T) {
},
},
ExpInterState: nil,
Error: errors.New("Not a valid time input to 'epochMillis' function"),
Error: errors.New("Could not parse input - 'value' using provided time layout - '2006-01-02T15:04:05.999Z'"),
},
{
Scenario: "epoch_on_non_time_type",
Expand All @@ -586,7 +624,30 @@ func TestJQRunFunction_errors(t *testing.T) {
},
},
ExpInterState: nil,
Error: errors.New("Not a valid time input to 'epoch' function"),
Error: errors.New("Could not parse input - 'value' using provided time layout - '2006-01-02T15:04:05.999Z'"),
},
{
Scenario: "epochMillis_with_not_matching_timelayout",
JQConfig: &JQMapperConfig{
JQCommand: `{ sessionId: .time | epochMillis("2006-01-02 15:04:05") }`,
RunTimeoutMs: 100,
SpMode: false,
},
InputMsg: &models.Message{
Data: []byte(`{"time": "2024-10-28T15:27:47.100"}`),
PartitionKey: "some-key",
},
InputInterState: nil,
Expected: map[string]*models.Message{
"success": nil,
"filtered": nil,
"failed": {
Data: []byte(`{"time": "2024-10-28T15:27:47.100"}`),
PartitionKey: "some-key",
},
},
ExpInterState: nil,
Error: errors.New("Could not parse input - '2024-10-28T15:27:47.100' using provided time layout - '2006-01-02 15:04:05'"),
},
}

Expand Down
4 changes: 2 additions & 2 deletions pkg/transform/transform_test_variables.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,13 +18,13 @@ import (
)

// SnowplowTsv1 is test data
var SnowplowTsv1 = []byte(`test-data1 pc 2019-05-10 14:40:37.436 2019-05-10 14:40:35.972 2019-05-10 14:40:35.551 unstruct e9234345-f042-46ad-b1aa-424464066a33 py-0.8.2 ssc-0.15.0-googlepubsub beam-enrich-0.2.0-common-0.36.0 user<built-in function input> 18.194.133.57 d26822f5-52cc-4292-8f77-14ef6b7a27e2 {"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.snowplowanalytics.snowplow/add_to_cart/jsonschema/1-0-0","data":{"sku":"item41","quantity":2,"unitPrice":32.4,"currency":"GBP"}}} python-requests/2.21.0 2019-05-10 14:40:35.000 {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 0}},{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 1}},{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 2}},{"schema":"iglu:nl.basjes/yauaa_context/jsonschema/1-0-0","data":{"deviceBrand":"Unknown","deviceName":"Unknown","operatingSystemName":"Unknown","agentVersionMajor":"2","layoutEngineVersionMajor":"??","deviceClass":"Unknown","agentNameVersionMajor":"python-requests 2","operatingSystemClass":"Unknown","layoutEngineName":"Unknown","agentName":"python-requests","agentVersion":"2.21.0","layoutEngineClass":"Unknown","agentNameVersion":"python-requests 2.21.0","operatingSystemVersion":"??","agentClass":"Special","layoutEngineVersion":"??"}}]} 2019-05-10 14:40:35.972 com.snowplowanalytics.snowplow add_to_cart jsonschema 1-0-0 `)
var SnowplowTsv1 = []byte(`test-data1 pc 2019-05-10 14:40:37.436 2019-05-10 14:40:35.972 2019-05-10 14:40:35.551 unstruct e9234345-f042-46ad-b1aa-424464066a33 py-0.8.2 ssc-0.15.0-googlepubsub beam-enrich-0.2.0-common-0.36.0 user<built-in function input> 18.194.133.57 d26822f5-52cc-4292-8f77-14ef6b7a27e2 {"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.snowplowanalytics.snowplow/add_to_cart/jsonschema/1-0-0","data":{"sku":"item41","quantity":2,"unitPrice":32.4,"currency":"GBP"}}} python-requests/2.21.0 2019-05-10 14:40:35.000 {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 0}},{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 1}},{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 2}},{"schema":"iglu:nl.basjes/yauaa_context/jsonschema/1-0-0","data":{"deviceBrand":"Unknown","deviceName":"Unknown","operatingSystemName":"Unknown","agentVersionMajor":"2","layoutEngineVersionMajor":"??","deviceClass":"Unknown","agentNameVersionMajor":"python-requests 2","operatingSystemClass":"Unknown","layoutEngineName":"Unknown","agentName":"python-requests","agentVersion":"2.21.0","layoutEngineClass":"Unknown","agentNameVersion":"python-requests 2.21.0","operatingSystemVersion":"??","agentClass":"Special","layoutEngineVersion":"??"}},{"schema":"iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-2","data":{"firstEventTimestamp":"2024-10-28T15:27:47.100Z"}}]} 2019-05-10 14:40:35.972 com.snowplowanalytics.snowplow add_to_cart jsonschema 1-0-0 `)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

firstEventTimestamp":"2024-10-28T15:27:47.100Z" - this is new


// SpTsv1Parsed is test data
var SpTsv1Parsed, _ = analytics.ParseEvent(string(SnowplowTsv1))

// SnowplowJSON1 another test data
var SnowplowJSON1 = []byte(`{"app_id":"test-data1","collector_tstamp":"2019-05-10T14:40:35.972Z","contexts_com_acme_just_ints_1":[{"integerField":0},{"integerField":1},{"integerField":2}],"contexts_nl_basjes_yauaa_context_1":[{"agentClass":"Special","agentName":"python-requests","agentNameVersion":"python-requests 2.21.0","agentNameVersionMajor":"python-requests 2","agentVersion":"2.21.0","agentVersionMajor":"2","deviceBrand":"Unknown","deviceClass":"Unknown","deviceName":"Unknown","layoutEngineClass":"Unknown","layoutEngineName":"Unknown","layoutEngineVersion":"??","layoutEngineVersionMajor":"??","operatingSystemClass":"Unknown","operatingSystemName":"Unknown","operatingSystemVersion":"??"}],"derived_tstamp":"2019-05-10T14:40:35.972Z","dvce_created_tstamp":"2019-05-10T14:40:35.551Z","dvce_sent_tstamp":"2019-05-10T14:40:35Z","etl_tstamp":"2019-05-10T14:40:37.436Z","event":"unstruct","event_format":"jsonschema","event_id":"e9234345-f042-46ad-b1aa-424464066a33","event_name":"add_to_cart","event_vendor":"com.snowplowanalytics.snowplow","event_version":"1-0-0","network_userid":"d26822f5-52cc-4292-8f77-14ef6b7a27e2","platform":"pc","unstruct_event_com_snowplowanalytics_snowplow_add_to_cart_1":{"currency":"GBP","quantity":2,"sku":"item41","unitPrice":32.4},"user_id":"user\u003cbuilt-in function input\u003e","user_ipaddress":"18.194.133.57","useragent":"python-requests/2.21.0","v_collector":"ssc-0.15.0-googlepubsub","v_etl":"beam-enrich-0.2.0-common-0.36.0","v_tracker":"py-0.8.2"}`)
var SnowplowJSON1 = []byte(`{"app_id":"test-data1","collector_tstamp":"2019-05-10T14:40:35.972Z","contexts_com_acme_just_ints_1":[{"integerField":0},{"integerField":1},{"integerField":2}],"contexts_nl_basjes_yauaa_context_1":[{"agentClass":"Special","agentName":"python-requests","agentNameVersion":"python-requests 2.21.0","agentNameVersionMajor":"python-requests 2","agentVersion":"2.21.0","agentVersionMajor":"2","deviceBrand":"Unknown","deviceClass":"Unknown","deviceName":"Unknown","layoutEngineClass":"Unknown","layoutEngineName":"Unknown","layoutEngineVersion":"??","layoutEngineVersionMajor":"??","operatingSystemClass":"Unknown","operatingSystemName":"Unknown","operatingSystemVersion":"??"}],"contexts_com_snowplowanalytics_snowplow_client_session_1":[{"firstEventTimestamp":"2024-10-28T15:27:47.100Z"}],"derived_tstamp":"2019-05-10T14:40:35.972Z","dvce_created_tstamp":"2019-05-10T14:40:35.551Z","dvce_sent_tstamp":"2019-05-10T14:40:35Z","etl_tstamp":"2019-05-10T14:40:37.436Z","event":"unstruct","event_format":"jsonschema","event_id":"e9234345-f042-46ad-b1aa-424464066a33","event_name":"add_to_cart","event_vendor":"com.snowplowanalytics.snowplow","event_version":"1-0-0","network_userid":"d26822f5-52cc-4292-8f77-14ef6b7a27e2","platform":"pc","unstruct_event_com_snowplowanalytics_snowplow_add_to_cart_1":{"currency":"GBP","quantity":2,"sku":"item41","unitPrice":32.4},"user_id":"user\u003cbuilt-in function input\u003e","user_ipaddress":"18.194.133.57","useragent":"python-requests/2.21.0","v_collector":"ssc-0.15.0-googlepubsub","v_etl":"beam-enrich-0.2.0-common-0.36.0","v_tracker":"py-0.8.2"}`)

// SnowplowTsv2 is test data
var SnowplowTsv2 = []byte(`test-data2 pc 2019-05-10 14:40:32.392 2019-05-10 14:40:31.105 2019-05-10 14:40:30.218 transaction_item 5071169f-3050-473f-b03f-9748319b1ef2 py-0.8.2 ssc-0.15.0-googlepubsub beam-enrich-0.2.0-common-0.36.0 user<built-in function input> 18.194.133.57 68220ade-307b-4898-8e25-c4c8ac92f1d7 transaction<built-in function input> item58 35.87 1 python-requests/2.21.0 2019-05-10 14:40:30.000 {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:nl.basjes/yauaa_context/jsonschema/1-0-0","data":{"deviceBrand":"Unknown","deviceName":"Unknown","operatingSystemName":"Unknown","agentVersionMajor":"2","layoutEngineVersionMajor":"??","deviceClass":"Unknown","agentNameVersionMajor":"python-requests 2","operatingSystemClass":"Unknown","layoutEngineName":"Unknown","agentName":"python-requests","agentVersion":"2.21.0","layoutEngineClass":"Unknown","agentNameVersion":"python-requests 2.21.0","operatingSystemVersion":"??","agentClass":"Special","layoutEngineVersion":"??"}}]} 2019-05-10 14:40:31.105 com.snowplowanalytics.snowplow transaction_item jsonschema 1-0-0 `)
Expand Down
Loading