Add SEARCH_RESPONSE_TIME analytics event #2471

Closed
sarayourfriend opened this issue Jun 21, 2023 · 2 comments · Fixed by #4044
Labels
💻 aspect: code Concerns the software code in the repository 🌟 goal: addition Addition of new feature help wanted Open to participation from the community 🟨 priority: medium Not blocking but should be addressed soon 🧱 stack: frontend Related to the Nuxt frontend ⌨️ tech: typescript Involves TypeScript 🔧 tech: vue Involves Vue.js

Comments

@sarayourfriend
Collaborator

Problem

While we know how long our server takes to respond to a request, we don't know how long it takes for clients to receive the response. We could measure this in our analytics code by adding a client-side-only event, SEARCH_RESPONSE_TIME, that sends how long the response took to arrive. It should only be sent for queries that aren't served from the local browser cache.

To help visualise the part we don't have information on, here is a diagram:

client -> Cloudflare -> ELB -> nginx -> django -> nginx -> ELB -> Cloudflare -> client

The only timing we have visibility into is Cloudflare -> ... -> Cloudflare and any self-to-self timing in between (nginx to nginx, ELB to ELB). We never know the client -> ... -> client timings with our current measurement tools. Client timings have two manifestations as well: Cloudflare cache hits and misses. Therefore, we also need to record whether the Cloudflare cache header records a hit or miss so we can disambiguate and see which part is taking the longest (is it Cloudflare's response or everything after Cloudflare?).

This is motivated by a discovery that many search requests in our frontend can take longer than 1 second to respond for me locally, but this isn't apparent from the metrics we have for the nodes after Cloudflare in the diagram above.

Description

Refer to the frontend analytics guide for more information on how to use and set up analytics for development.

When making a search request, create a Date object before sending the request. If the Date header on the response is later than the request start time, then the response did not come from the local client cache. This is because the Date header does not change when a response is retrieved from local cache (whose timings we don't care about). It does change, however, on any outbound request, even one that hits the Cloudflare cache.

When the response comes back and passes that check, compute the elapsed time between the request start and the moment the response was received. Also pull out the Cloudflare cache status header, cf-cache-status, and the cf-ray header. Send the following analytics payload:

type Events = {
  // ...

  /**
   * Time client-side search responses. Gives us observability into
   * real user experience of search timings.
   *
   * Payload:
   *   - `cfCacheStatus`: Whether the request hit the Cloudflare cache or went all the way to our servers
   *   - `cfRayIATA`: The IATA location identifier at the end of the `cf-ray` header. Indicates the data centre the request passed through. This gives us an idea of approximate distance from our API servers without revealing more precise request location information.
   *   - `elapsedTime`: How many seconds it took to receive a response for the request
   *   - `queryString`: The full query string including additional filters (i.e., not just the search term)
   */
  SEARCH_RESPONSE_TIME: {
    cfCacheStatus: "HIT" | "MISS"
    cfRayIATA: string
    elapsedTime: number
    queryString: string
  }
}
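
As a rough illustration of the check described above, here is a minimal sketch of a pure helper that turns the request start time, the receive time, and the response headers into the payload, or returns null when the response looks locally cached. The function name `buildSearchResponseTimePayload`, its parameters, and the lowercase header map are assumptions made for this example, not existing code in the repository. One deviation from the description is labelled in the comments: the HTTP Date header only has whole-second resolution, so the sketch compares at second granularity instead of requiring the header to be strictly later than the start time.

```typescript
type SearchResponseTimePayload = {
  cfCacheStatus: string
  cfRayIATA: string
  elapsedTime: number
  queryString: string
}

// Hypothetical helper: returns the analytics payload, or null when the
// response should be skipped (missing headers or served from local cache).
function buildSearchResponseTimePayload(
  requestStart: Date,
  responseReceived: Date,
  headers: Record<string, string | undefined>,
  queryString: string
): SearchResponseTimePayload | null {
  const dateHeader = headers["date"]
  const cfCacheStatus = headers["cf-cache-status"]
  const cfRay = headers["cf-ray"]
  if (!dateHeader || !cfCacheStatus || !cfRay) return null

  const responseDate = new Date(dateHeader)
  if (Number.isNaN(responseDate.getTime())) return null

  // Assumption: the Date header is truncated to whole seconds, so compare
  // in seconds. A header older than the request start means the response
  // was generated earlier, i.e. it came from the local cache.
  const startSeconds = Math.floor(requestStart.getTime() / 1000)
  const responseSeconds = Math.floor(responseDate.getTime() / 1000)
  if (responseSeconds < startSeconds) return null

  // cf-ray looks like "<hex id>-<IATA code>"; keep only the trailing code.
  const cfRayIATA = cfRay.split("-").pop() ?? ""

  return {
    cfCacheStatus,
    cfRayIATA,
    elapsedTime:
      (responseReceived.getTime() - requestStart.getTime()) / 1000,
    queryString,
  }
}
```

Keeping the helper free of network and analytics calls means the caller only needs to record `new Date()` before and after the request and pass the result to whatever sends the event. In a browser, the `date`, `cf-cache-status` and `cf-ray` headers are only readable on cross-origin responses if the API exposes them via `Access-Control-Expose-Headers`.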

Alternatives

We could implement a bona fide RUM library that measures this stuff more generally, including client-side render timings, etc. But those can often cost a tremendous amount of money and I think we can get the information we need for this using Plausible. We might even be able to implement deeper analysis like client side render timings in Plausible as well to get RUM-level data about site performance, but we're a ways from being able to use that information anyway, I think.

@adjeiv
Contributor

adjeiv commented Dec 18, 2023

I'd like to have a crack at this.
From my understanding, we'll want to handle this mostly on the JavaScript/TypeScript side of things, i.e.

  • In the media service search function, construct a Date object to record the time just before the request is sent
  • On receipt of the AxiosResponse within this search function, extract the Date header. If it's before the time of the request, ignore the response. Otherwise, populate the custom event with data taken from the cf-ray and cf-cache-status headers, as well as the elapsed time and the query params.

Seems like we might want this in both the search function and the getMediaDetail function, perhaps.

Let me know whether I'm thinking in the right direction!

@sarayourfriend
Collaborator Author

That's correct! Except getMediaDetail isn't a search query, per se. For now let's just do this in search, as you said, and in related. If you want to also include getMediaDetail, then we should change the name of the event to API_RESPONSE_TIME. In all cases we'll need to include a slug to identify the type of request (search, related, media detail, etc).
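
To make the slug idea concrete, here is a small sketch of how a timing payload could be tagged with the type of request. The `RequestKind` values, the `requestKind` field name, and the `withRequestKind` helper are all hypothetical; the thread only says that a slug identifying the request type (search, related, media detail, etc.) is needed.

```typescript
// Hypothetical slug values for the kinds of request named in the thread.
type RequestKind = "search" | "related" | "media-detail"

type ResponseTiming = {
  cfCacheStatus: string
  cfRayIATA: string
  elapsedTime: number
  queryString: string
}

// The timing fields plus a slug saying which kind of request was timed.
type TaggedResponseTiming = ResponseTiming & { requestKind: RequestKind }

// Attach the slug without mutating the original timing object.
function withRequestKind(
  requestKind: RequestKind,
  timing: ResponseTiming
): TaggedResponseTiming {
  return { ...timing, requestKind }
}
```

With a field like this, a single event could cover search, related, and media detail requests, and the requests could still be separated when analysing the data.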
