Skip to content
This repository was archived by the owner on Sep 18, 2025. It is now read-only.

Conversation

@MarquiseRosier
Copy link
Collaborator

@MarquiseRosier MarquiseRosier commented Jan 5, 2024

A minimal set of data endpoints necessary to run https://data.aem.live/rum-dashboard

Also I updated rum-pageviews to use Events_V4; because the version that uses Events_V3 cannot resolve www.petplace.com to petplace.com

When we use Events_V4 which prepends www and checks for results for a url, www.petplace.com and petplace.com can appear.

@trieloff
@langswei
@dbrrt

@github-actions
Copy link

github-actions bot commented Jan 5, 2024

This PR will trigger a minor release when merged.

@codecov
Copy link

codecov bot commented Jan 5, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Comparison is base (387b11e) 100.00% compared to head (c39dc76) 100.00%.

Additional details and impacted files
@@            Coverage Diff            @@
##              main     #1024   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files            6         6           
  Lines          710       710           
=========================================
  Hits           710       710           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@langswei
Copy link
Collaborator

langswei commented Jan 5, 2024

I believe that not every PR merged into main automatically results in v3 semantic versioning. A conventional commit message such as feat: updated dashboard query must be included as at least one of the commits. Lars can confirm if my understanding is correct.

@langswei
Copy link
Collaborator

langswei commented Jan 5, 2024

Please run SQLFluff.

@langswei
Copy link
Collaborator

langswei commented Jan 5, 2024

Given the move to _V4 table functions in some queries, I suggest creating a file similar to /src/queries/common-event-v3.sql where the new table functions are documented.

@MarquiseRosier
Copy link
Collaborator Author

@langswei I don't think I'll add a query to record the new table function as it wouldn't really fit into this PR, feel free to document the new table functions if you'd like in another PR though!

Events_V4 was created by @trieloff sometime ago; some queries simply haven't been upgraded to use it.

@langswei
Copy link
Collaborator

langswei commented Jan 7, 2024

@langswei I don't think I'll add a query to record the new table function as it wouldn't really fit into this PR, feel free to document the new table functions if you'd like in another PR though!

Events_V4 was created by @trieloff sometime ago; some queries simply haven't been upgraded to use it.

Sounds good, separate PR makes sense, no reason to combine here. My overarching goal is to ensure queries used by data.aem.live and queries used to feed spacecat and eventually DaaS -- especially those related to page view estimates -- are based on common logic so that we maintain consistency. I agree with you this is a separate topic.

source,
COUNT(id) AS ids,
COUNT(DISTINCT url) AS pages,
APPROX_TOP_COUNT(url, 1)[OFFSET(0)].value AS topurl,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see a slack thread started by @shsteimer in #aem-data-desk on Jan 3. I believe that grouping by source and taking the top url is hiding data. Might make more sense to group by source+url combination, or group by url and take the top source (referer).

Copy link
Contributor

@trieloff trieloff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't add _V4 in any of the built-in queries, make sure sqlfluff runs cleanly.

ELSE TIMESTAMP_TRUNC(time, DAY)
END AS date
FROM helix_rum.PAGEVIEWS_V3(
FROM helix_rum.PAGEVIEWS_V4(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not create PAGEVIEWS_V4 and I object strongly to using it in any of the default queries as it is hiding data.

MAX(weight) AS pageviews
FROM
`helix-225321.helix_rum.EVENTS_V4`(
net.host(@url), @offset, @interval, '-', '-', 'UTC', 'all', @domainkey
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • @url should have the same semantics as the rest of the @url parameters, so URL prefix without https://, optional path and $ or ? delimiters.
  • if you list startdate and enddate in the supported parameters, you should also pass them through here
  • same for timezone
  • same for device

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, the same data should come from PAGEVIEWS_V4 – why does this query exist when rum-pageviews is there already?

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants