Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implementation Plan: Proxying frontend API requests through the Nuxt server #5265

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

obulat
Copy link
Contributor

@obulat obulat commented Dec 11, 2024

Due date:

2024-01-11

Assigned reviewers

Description

Related to #3473

Current round

This discussion is following the Openverse decision-making process. Information about this process can be found
on the Openverse documentation site.
Requested reviewers or participants will be following this process. If you are being asked to give input on a specific detail, you do not need to familiarise yourself with the process and follow it.

This discussion is currently in the Decision round. The deadline for review of this round is 2024-01-11

Revision details

[2024-01-08] I updated the plan based on the comments. The main changes are:

  1. Proxy All Frontend Traffic: Updated the plan to proxy all frontend routes, not just the /api routes. This will make the rate-limiting setup easier and eliminate the need to integrate modules like Nuxt Turnstile. Handling Cloudflare challenges directly in the Nuxt server avoids the complexity of requiring the Pinia media store to process challenge responses instead of result responses.
  2. Use h3 session for identifying users: The plan now uses h3 sessions to create a verifiable session header/cookie. Cloudflare will validate that the header/cookie exists, allowing requests to pass even if the cookie is invalid (e.g., generated by automated requests). The Nuxt server will then decode the cookie using the secret and verify its validity. Invalid cookies will trigger a 401 Unauthorized response from the Nuxt server, preventing these requests from reaching the Django API. This ensures robust protection against automated abuse.
  3. Shared IP scenario testing: Detailed a method to test shared IP scenarios using two different browsers or an incognito window to simulate multiple users. This approach demonstrates how authenticated users bypass stricter limits while anonymous users face challenges if limits are exceeded.
  4. Excluding static asset routes: Explicitly stated that static asset requests (e.g., /_nuxt/, .css, .jpg) are excluded from rate limiting to streamline traffic and reduce unnecessary challenges.

Here's a branch with a draft of changes implementing this project (single commit for all changes): https://github.com/WordPress/openverse/tree/add/api-proxying

@obulat obulat requested a review from a team as a code owner December 11, 2024 09:09
@obulat obulat requested review from krysal and dhruvkb and removed request for a team December 11, 2024 09:09
@obulat obulat added 🟧 priority: high Stalls work on the project or its dependents 🧭 project: implementation plan An implementation plan for a project labels Dec 11, 2024
@openverse-bot openverse-bot added 🧱 stack: documentation Related to Sphinx documentation 🏷 status: label work required Needs proper labelling before it can be worked on labels Dec 11, 2024
@obulat obulat added 🌟 goal: addition Addition of new feature 📄 aspect: text Concerns the textual material in the repository and removed 🏷 status: label work required Needs proper labelling before it can be worked on 🧱 stack: documentation Related to Sphinx documentation labels Dec 11, 2024
@obulat obulat self-assigned this Dec 11, 2024
@obulat obulat added 🧱 stack: mgmt Related to repo management and automations 🧱 stack: documentation Related to Sphinx documentation and removed 🧱 stack: mgmt Related to repo management and automations labels Dec 11, 2024
Copy link

Full-stack documentation: https://docs.openverse.org/_preview/5265

Please note that GitHub pages takes a little time to deploy newly pushed code, if the links above don't work or you see old versions, wait 5 minutes and try again.

You can check the GitHub pages deployment action list to see the current status of the deployments.

New files ➕:

Copy link
Collaborator

@sarayourfriend sarayourfriend left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments suggesting clarifications or alternatives. Happy to chat more about this. Excited to see this project come to light, I think it will massively improve the user experience for a major category of Openverse users (in education) as well as make the project's ingress more sustainable and manageable for the maintainers. Kudos on the IP!

Comment on lines 179 to 181
- From the
[k6 implementation](https://github.com/WordPress/openverse/pull/4908), copy
the helper function that signs the request with HMAC.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is HMAC signing of the requests required? What is the goal with it?

For context on my question, the HMAC approach was designed to bypass Cloudflare's rate limiting for the frontend, without the Nuxt server needing to know anything about the authorization/limits. It isn't necessary to use that method for the API because Cloudflare should be configured to bypass rate limiting for any API requests that contain an Authorization header (as the API itself will handle those rate limits).

In other words, I would think the request flow could go like this, and not require any HMAC at all for regular browser users of the Openverse website:

  • > User requests Openverse page and receives a response with a unique session ID in set-cookie
  • > User executes a client-side search -> Cloudflare checks rate limit for the IP + session ID: challenge if needed; otherwise ...
  • > Request sent to the new Nuxt /api/... proxy routes
  • > The request is forwarded by Nuxt to the API with the API authentication method the Nuxt server already uses
  • > Cloudflare does not rate limit because an Authorization header is present on the request
  • > The API does not rate limit because the Authorization header has a rate-limiting exempt token
  • > Response is sent back to Nuxt
  • > Nuxt removes any response information that could expose Authorization information
  • > Sends response body back to the client

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wrote a long reply first, and then realized that my thoughts were valid if only the /api/ routes are behind the rate limiting. If all frontend routes are behind the rate limit rules, we don't need to do anything special for protecting the /api/ routes (that can send unlimited requests to the API) from automatic abuse, as they would be behind the frontend rate limiting rules.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarayourfriend, your bullet points helped me picture this and made me realize that the cookie is probably an important part I didn't get. Thank you for summarizing it.

authentication, HMAC)
- send the request using `ofetch` (or `ofetch.raw` to extract the response
headers for `SEARCH_RESPONSE_TIME` Plausible event)
- handle errors and return the appropriate response to the client
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the route also reject requests as Unauthorized (401) if they are missing the session ID cookie? The API proxy routes should only be used for client-side requests of search.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea. I had to write the draft of the implementation to see how it would work.

  • in the beginning of the server setup (e.g., in the init-stores.server plugin, if it's renamed init.server), set the sessionId cookie that is httpOnly (if it does not exist).

  • send the request from the store to the /api route.

    • On the client, the httpOnly will automatically be sent.
    • On the server, the request will be sent between the different parts of the Nuxt app, without the browser passing the cookies. We need to specifically pass the cookies using getProxyRequestHeaders from h3 to extract them and pass them on with the $fetch request. Since we are on the server, the httpOnly cookies are also available.

envisage that we will need to adjust the limits many times because testing the
Cloudflare rate limiting locally is not possible.

### Use server routes in the frontend when the feature flag is on
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that leveraging the @openverse/api-client package would allow some simplification here. If the Nuxt API proxy routes are designed to match the same path-specification as the API itself, then the same API client can be used by swapping out the base URL based on the feature flag and whether the app is running server side or client side. To the rest of the application code, there would be no difference in how the requests are made.

It's probably more work up-front, because existing code would need to be changed to use the @openverse/api-client, so worth considering whether it's something to plan for in the future. But even without using the @openverse/api-client package, the same strategy of swapping the base URL could be used with the api-service and avoid needing multiple different functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think using the Nuxt-native methods would be better for logging and error handling of the requests on the server.
During the transition period, there would need to be separate branches for requesting the proxied/non-proxied urls. However, it would be more convenient to handle errors within the server route, where we can log all requests and errors on the server.

We could use the same naming for the server routes as the API has (so, instead of /api/search/image/?q=cat have /v1/images/?q=cat), which would make the conversion of the URLs simpler.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough! For what it's worth, it's possible to swap the underlying fetch implementation to pass options through to Nuxt's version, if it has better logging and other options. Check out the fetch option to createClient in the docs.

The main benefit of using @openverse/api-client is unifying the API and types, and it provides a convenient way to swap out the target URL as needed. It would be possible to re-write the paths dynamically in the function passed to fetch, so that the Nuxt routes don't need to reproduce exactly the same paths as the upstream API.

Anyway, just an alternative idea in case it is useful. The plan you've got doesn't look bad to me at all 🙂

Comment on lines +339 to +394
- **Always Issuing a Challenge Without Thresholds**: Issuing a managed challenge
to every new user, regardless of load or conditions, would inconvenience users
behind NATs and degrade the initial user experience. While simpler to
implement (we would not need to calculate the exact rate limit), it fails our
design goal of minimizing friction for normal users.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Issuing a managed to every new user, regardless of load or conditions

Can you explain further why every new user would be issued a challenge? For what it's worth, the managed challenge "should" most of the time be a JavaScript challenge, which requires no user interaction.

To reduce friction without needing to introduce the turnstile module and involving the application code, you could have a separate (or no) rate limit for the home page, relying on that being the most common entry point for users where they would receive the original cookie to carry for the rest of their session.

To clarify the situation behind a NAT, it might be overly cautious to think that the rate limit will be tripped so quickly. Imagine a large school with maybe 3 computer labs having a simultaneous class, 50 students each. That is 150 fully unauthenticated requests from a single IP that need to happen for everyone to get a cookie. If the rate limit of a cookie-less request is set to 150 requests / 30 minute period, and it just takes a managed challenge to clear it for the IP, I don't think this would introduce very much friction at all.

For users on low spec Android phone, they would probably avoid ever seeing the managed challenge and get the cookie, and then just be subject to the regular rate limit.

of going directly to the API, requests will go to the Nuxt server routes
(`/api/**`). The main difference is that, when rate-limited, the client will
receive a Cloudflare challenge response that will be handled by
[Nuxt Turnstile Module](https://github.com/nuxt-modules/turnstile).
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like turnstile is mostly geared towards websites that want to use Cloudflare challenges without proxying requests through Cloudflare itself. At least, that's what the Turnstile marketing page indicates as the primary target use case. Openverse proxies requests through Cloudflare already. Can you clarify the goal behind using Turnstile rather than regular Cloudflare managed challenges, exactly?

Comment on lines +107 to +134
- **Multi-language Support**: Challenges support multiple languages [^2],
although the number of supported languages is lower than the number of
languages supported by Openverse.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, it's possible to implement custom versions of Cloudflare's error and challenge pages: https://developers.cloudflare.com/support/more-dashboard-apps/cloudflare-custom-pages/configuring-custom-pages-error-and-challenge/

I'd recommend doing this for the rate-limit response, for example, so that users get an Openverse branded page explaining that they are making requests too quickly, and advising them how long they need to wait until they can make another request, or how to get in contact with the maintainers if they believe there is an error.

This should eliminate language-support issues for the error responses, ensure that the presentation language is consistent across the session (e.g., using Openverse's own methods for identifying which language to present the page in, which may differ from Cloudflare's methods), and ensure a cohesive presentation overall. I don't know if there is a way to configure the language of the challenge itself, but maybe Cloudflare will make a best-guess based on the lang-attribute on the custom page? In any case, it would be a better experience than the entire page being inconsistent with the rest of the website.

Comment on lines +87 to +106
Originally, I thought of using Cloudflare managed challenges to protect the Nuxt
server routes from abuse. However, this would have required issuing a challenge
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A clarification here: "to protect the Nuxt server routes from abuse".

For the solution to work, it's necessary for all routes to be behind the same Cloudflare rate limit rules, not just the server routes. That way, if someone gets rate limited on the server routes, the frontend can reload the page and the user will actually get the managed challenge page for clearing the rate limiting (or the explanation page if they are going too fast and need to actually be limited).

If only the server routes are limited by the new configuration, then there's no way to present the user with a challenge that will clear that rate limit, and the user would be stuck.

I mention in another comment that I don't think this would require issuing a challenge to every new user and would not meaningfully increase friction in the vast majority of even extreme cases. Regardless, it is necessary for all routes to have the same rate limit rules for this Cloudflare-rate-limit strategy to work with the safe guard of users being able to clear the rate limit as needed via the challenge.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After reading through your comments and thinking more about this, I agree that we should have all routes behind the same Cloudflare rate limit rules.

That way, if someone gets rate limited on the server routes, the frontend can reload the page and the user will actually get the managed challenge page for clearing the rate limiting (or the explanation page if they are going too fast and need to actually be limited).

In the original implementation, the turnstile module was supposed to handle the cases when the users get the managed challenge instead of the results from the server routes. The module would temporarily show the "turnstile" page (this is not for the Cloudflare turnstile product, but the page that handles the HTML response with the challenge from the proxy and then saves the cookie to clear the challenge), and then allow to show the target page.
Here are the docs on Cloudflare integration of Turnstile with WAF rules: https://blog.cloudflare.com/integrating-turnstile-with-the-cloudflare-waf-to-challenge-fetch-requests/ It gives the JS snippet to handle the challenge response from the API.

However, it would be easier to simply use the rate limiting for all frontend paths.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha! I was not connecting the dots on the turnstile handling the flow of presenting the challenge for the subset of rate limited routes.

Copy link
Member

@krysal krysal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for writing this detailed plan, @obulat. It's a nice explanation of the steps to incorporate Cloudflare-provided rate-limiting for the frontend and better control the use of Openverse. It sounded quite challenging at first, but I now have a clearer idea of what it entails after reading it. I only have a couple of loose points, namely:

  • The cookie that will help identify each user
  • How will be the local usage of Openverse after these changes are applied

Comment on lines +27 to +29
This implementation plan outlines how we will route all frontend traffic to the
API through Nuxt server routes — effectively “authenticating” these requests at
the API level — and configure Cloudflare rate-limiting to protect these routes
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would requests from the same user also be identified at the API level? I understand that everything will be seen as "coming from the frontend." For example, is the cookie value passed to the API, too?

Comment on lines 179 to 181
- From the
[k6 implementation](https://github.com/WordPress/openverse/pull/4908), copy
the helper function that signs the request with HMAC.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sarayourfriend, your bullet points helped me picture this and made me realize that the cookie is probably an important part I didn't get. Thank you for summarizing it.

Copy link
Member

@dhruvkb dhruvkb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't have much to add beyond what @sarayourfriend and @krysal have already covered. So 👍 from me to move into the next round of incorporating their feedback.

The turnstile module was a confusing aspect of the plan for me but it seems from existing feedback that a second version of the IP will not be needing it. (I will still check out the module, in case it still ends up being used for the plan).

Comment on lines +345 to +399
- **Proxying thumbnail requests** The thumbnail requests from the frontend will
remain anonymous on the API level. Since controlling search and related
requests naturally throttles excessive thumbnail retrieval, we consider it
unnecessary to further load the Nuxt server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this also similarly applicable for audio waveforms?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I forgot about the waveforms 🤦

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder the same as Dhruv.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that we request the waveforms when we add the peaks=true param to the audio requests. They are added to each audio object in the peaks array. So, there is no separate request.

@obulat obulat force-pushed the add/proxy-ip branch 2 times, most recently from c463217 to 985bfdc Compare January 8, 2025 04:29
Comment on lines +197 to +199
(http.request.headers["x-h3-session"][0] ne "")
or
(http.request.headers["Cookie"] contains "ovSession")
Copy link
Member

@krysal krysal Jan 8, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are or would there be two different session cookies? If so, why?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will be only one session cookie. Did I use different names throughout the IP? That was a mistake, if so. I'll fix it.
However, there should be both the header and the cookie, because h3 useSession sets both, and the priority is given to the header.

Comment on lines +225 to +229
2. User 1: In one browser, simulate a user with a valid session cookie,
confirming they are treated as authenticated and bypass stricter rate limits.
3. User 2: In the other browser, simulate an anonymous user without a session
cookie. Verify that their requests are subject to stricter IP-based rate
limits.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little lost here. How are we supposed to simulate each user? If I understand correctly, by simply using the frontend app the session cookie should be added after the first response, once the Cloudflare challenge is resolved correctly, so I guess the User 2 could be the tricky case, or should it simply one happen after another? If that is the case, then it is brilliant and fabulous that the test can be so simple!

Copy link
Contributor Author

@obulat obulat Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, simply using the frontend app should add the session cookie. So, what we can test here is the fact that even if the same IP sent more requests than allowed by the rate limit, the next user with the same IP can see the challenge and use Openverse after the challenge is passed.

Here's how it can be done:

  1. Set the staging rate limit to something like 2 requests per 10 minutes
  2. Open several tabs in incognito mode, and visit the staging app. After the first two tabs, you should see the Cloudflare challenge. It should be easy to pass the challenge.

To test that it's not easy to pass the rate-limiting, we can send 3 requests to staging. The 3rd request should return the Cloudflare challenge, and not pass through the Cloudflare proxy.

Comment on lines +241 to +245
##### Real rate limits

Calculate the actual rate limits based on the expected traffic. The rate limits
should be generous enough to accommodate normal user behavior. Later, they can
be made stricter to deter automated abuse.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was exactly wondering about this. Sounds fair, it can be defined or refined during/after the implementation 👌


- `/api/search/[type]` - for search requests
- `/api/[type]/[id]` - for single result requests
- `/api/[type]/related/[id]` - for related media requests
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To keep consistency with API routes.

Suggested change
- `/api/[type]/related/[id]` - for related media requests
- `/api/[type]/[id]/related/` - for related media requests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This might make it more difficult to handle. With the proposed structure, the single and related pages would be /api/[type]/[id].ts and /api/[type]/related/[id].ts. If we move the [id] in the related route, the single result route might also catch the related requests because it could match on /api/[type]/[id].ts. Can we keep this as is, and change it when implementing, if possible, @krysal ?

Comment on lines +345 to +399
- **Proxying thumbnail requests** The thumbnail requests from the frontend will
remain anonymous on the API level. Since controlling search and related
requests naturally throttles excessive thumbnail retrieval, we consider it
unnecessary to further load the Nuxt server.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder the same as Dhruv.

@krysal
Copy link
Member

krysal commented Jan 8, 2025

I'm glad the proposal has been simplified by removing the Nuxt Turnstile module requirement 😄 I have a small question about the number of sessions, other than that, there are minor comments/suggestions. Once clarified, I expect a quick approval. Thanks for writing this, @obulat. It looks very promising.

@openverse-bot
Copy link
Collaborator

Based on the high urgency of this PR, the following reviewers are being gently reminded to review this PR:

@sarayourfriend
@dhruvkb
This reminder is being automatically generated due to the urgency configuration.

Excluding weekend1 days, this PR was ready for review 21 day(s) ago. PRs labelled with high urgency are expected to be reviewed within 2 weekday(s)2.

@obulat, if this PR is not ready for a review, please draft it to prevent reviewers from getting further unnecessary pings.

Footnotes

  1. Specifically, Saturday and Sunday.

  2. For the purpose of these reminders we treat Monday - Friday as weekdays. Please note that the operation that generates these reminders runs at midnight UTC on Monday - Friday. This means that depending on your timezone, you may be pinged outside of the expected range.

obulat and others added 6 commits January 10, 2025 21:09
…0241202-implementation_plan_proxy_frontend_api_requests_ip.md

Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com>
…0241202-implementation_plan_proxy_frontend_api_requests_ip.md

Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com>
…0241202-implementation_plan_proxy_frontend_api_requests_ip.md

Co-authored-by: Krystle Salazar <krystle.salazar@automattic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
📄 aspect: text Concerns the textual material in the repository 🌟 goal: addition Addition of new feature 🟧 priority: high Stalls work on the project or its dependents 🧭 project: implementation plan An implementation plan for a project 🧱 stack: documentation Related to Sphinx documentation
Projects
Status: Pending proposal
Status: 👀 Needs Review
Development

Successfully merging this pull request may close these issues.

5 participants