Configurable time layout in epoch/epochMillis JQ functions #378

Merged: 1 commit, merged on Nov 8, 2024

@pondzix (Contributor) commented Nov 8, 2024

ref: PDP-1539

Before this commit, `epoch`-like functions required a `time.Time` input. That worked well for atomic Snowplow fields such as `collector_tstamp`.

It becomes a problem when nested context fields representing time are passed in. Such fields are not `time.Time` values; they are plain strings.

This commit makes the `epoch`/`epochMillis` functions more flexible:

  • When the input is a `time.Time`, use it as is (atomic fields).
  • When the input is a string, try to parse it as a `time.Time`.
  • When the input is anything else, return an error.

A string can follow any of several time layouts, so `epoch`/`epochMillis` also accept an optional string parameter specifying the layout. It must be a valid [Go layout](https://pkg.go.dev/time#pkg-constants). The default is `2006-01-02T15:04:05.999Z`.

@@ -18,7 +18,7 @@ import (
)

// SnowplowTsv1 is test data
-var SnowplowTsv1 = []byte(`test-data1 pc 2019-05-10 14:40:37.436 2019-05-10 14:40:35.972 2019-05-10 14:40:35.551 unstruct e9234345-f042-46ad-b1aa-424464066a33 py-0.8.2 ssc-0.15.0-googlepubsub beam-enrich-0.2.0-common-0.36.0 user<built-in function input> 18.194.133.57 d26822f5-52cc-4292-8f77-14ef6b7a27e2 {"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.snowplowanalytics.snowplow/add_to_cart/jsonschema/1-0-0","data":{"sku":"item41","quantity":2,"unitPrice":32.4,"currency":"GBP"}}} python-requests/2.21.0 2019-05-10 14:40:35.000 {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 0}},{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 1}},{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 2}},{"schema":"iglu:nl.basjes/yauaa_context/jsonschema/1-0-0","data":{"deviceBrand":"Unknown","deviceName":"Unknown","operatingSystemName":"Unknown","agentVersionMajor":"2","layoutEngineVersionMajor":"??","deviceClass":"Unknown","agentNameVersionMajor":"python-requests 2","operatingSystemClass":"Unknown","layoutEngineName":"Unknown","agentName":"python-requests","agentVersion":"2.21.0","layoutEngineClass":"Unknown","agentNameVersion":"python-requests 2.21.0","operatingSystemVersion":"??","agentClass":"Special","layoutEngineVersion":"??"}}]} 2019-05-10 14:40:35.972 com.snowplowanalytics.snowplow add_to_cart jsonschema 1-0-0 `)
+var SnowplowTsv1 = []byte(`test-data1 pc 2019-05-10 14:40:37.436 2019-05-10 14:40:35.972 2019-05-10 14:40:35.551 unstruct e9234345-f042-46ad-b1aa-424464066a33 py-0.8.2 ssc-0.15.0-googlepubsub beam-enrich-0.2.0-common-0.36.0 user<built-in function input> 18.194.133.57 d26822f5-52cc-4292-8f77-14ef6b7a27e2 {"schema":"iglu:com.snowplowanalytics.snowplow/unstruct_event/jsonschema/1-0-0","data":{"schema":"iglu:com.snowplowanalytics.snowplow/add_to_cart/jsonschema/1-0-0","data":{"sku":"item41","quantity":2,"unitPrice":32.4,"currency":"GBP"}}} python-requests/2.21.0 2019-05-10 14:40:35.000 {"schema":"iglu:com.snowplowanalytics.snowplow/contexts/jsonschema/1-0-1","data":[{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 0}},{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 1}},{"schema":"iglu:com.acme/justInts/jsonschema/1-0-0", "data":{"integerField": 2}},{"schema":"iglu:nl.basjes/yauaa_context/jsonschema/1-0-0","data":{"deviceBrand":"Unknown","deviceName":"Unknown","operatingSystemName":"Unknown","agentVersionMajor":"2","layoutEngineVersionMajor":"??","deviceClass":"Unknown","agentNameVersionMajor":"python-requests 2","operatingSystemClass":"Unknown","layoutEngineName":"Unknown","agentName":"python-requests","agentVersion":"2.21.0","layoutEngineClass":"Unknown","agentNameVersion":"python-requests 2.21.0","operatingSystemVersion":"??","agentClass":"Special","layoutEngineVersion":"??"}},{"schema":"iglu:com.snowplowanalytics.snowplow/client_session/jsonschema/1-0-2","data":{"firstEventTimestamp":"2024-10-28T15:27:47.100Z"}}]} 2019-05-10 14:40:35.972 com.snowplowanalytics.snowplow add_to_cart jsonschema 1-0-0 `)
Contributor Author:

`"firstEventTimestamp":"2024-10-28T15:27:47.100Z"` - this is new (a `client_session` context added to the test data).

@@ -52,6 +52,25 @@ func TestJQRunFunction_SpMode_true(t *testing.T) {
ExpInterState: nil,
Error: nil,
},
{
Scenario: "test_timestamp_to_epochMillis_context",
JQCommand: `{ sessionId: .contexts_com_snowplowanalytics_snowplow_client_session_1[0].firstEventTimestamp | epochMillis }`,
Collaborator:

So if the input is a string but no layout is provided, we simply default to the layout we encounter in Snowplow data - is that correct?

Contributor Author:

Yes. Also, the default layout in `epoch` differs from the layout we use for date-time atomic fields:

  • for atomic fields like `collector_tstamp` we use `yyyy-MM-dd HH:mm:ss.SSS`
  • for contexts it's `yyyy-MM-dd'T'HH:mm:ss.SSS'Z'` (so T + Z), based on the tracker code and on what I've seen in generated mobile data and the `firstEventTimestamp` field :)

Collaborator:

Right - but the atomic fields are provided as `time.Time`, so in effect it's the same from an API perspective, if I understand correctly. The API works as follows:

For standard Snowplow data, we simply call `epoch` or `epochMillis`, and it works whether the field is atomic or inside a context.

For timestamps in other formats (e.g. custom tracking that uses a different format), we can optionally provide a layout as an argument.

I'm quite happy with this!

There may be some standard Snowplow timestamps that aren't in this format, and in those cases we can provide the layout - but if we encountered that scenario, I would argue it's actually an upstream problem: either the trackers or enrich should provide these values in a consistent format.

Contributor Author:

That's correct. For any Snowplow field, from a JQ transformation point of view, in 99.99% of cases you only need plain `epoch` or `epochMillis`, with no custom layout. If some unusual format suddenly arrives from somewhere, you can add the layout parameter to handle it.

@pondzix force-pushed the more_flexible_epoch_millis branch from 3e4f51e to 9302e89 on November 8, 2024 13:55
@colmsnowplow (Collaborator) left a comment:

LGTM, great stuff Piotr, thank you!

@pondzix merged commit 9302e89 into develop on Nov 8, 2024
1 check passed
@pondzix deleted the more_flexible_epoch_millis branch on November 8, 2024 14:55