Incremental static rendering (ISR) with bazel build #53

Open
dgp1130 opened this issue Feb 4, 2023 · 0 comments
Labels
feature New feature or request

dgp1130 commented Feb 4, 2023

Terrible idea: Support an incremental static rendering (ISR) server model which dynamically rebuilds certain routes on demand by invoking bazel build of a rules_prerender workspace in production. I have no illusions that this is anything but an awful idea, but I want to write the thought down at least.

Consider:

  • An HTTP server which connects to a Bazel client to request it to build certain targets and propagate the returned files back to users.
    • Ex: GET /about triggers bazel build //my_blog/about and then returns dist/bin/my_blog/about/index.html.
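As a rough sketch of that first bullet (all names are assumptions taken from the example above — the `//my_blog` package layout and the `dist/bin` output prefix are hypothetical, and a real server would hold a persistent Bazel client connection rather than shelling out per request):

```python
import subprocess

# Hypothetical mapping from a request path to a Bazel target label and the
# output file that target produces. Assumes each route is a package under
# //my_blog producing an index.html, as in the GET /about example above.
def route_to_target(path: str) -> tuple[str, str]:
    """Map e.g. '/about' -> ('//my_blog/about', 'dist/bin/my_blog/about/index.html')."""
    route = path.strip("/") or "index"
    return (f"//my_blog/{route}", f"dist/bin/my_blog/{route}/index.html")

def serve(path: str) -> bytes:
    target, output = route_to_target(path)
    # Build on demand. A fresh subprocess per request is the naive version;
    # the real design would reuse a long-lived Bazel client.
    subprocess.run(["bazel", "build", target], check=True)
    with open(output, "rb") as f:
        return f.read()
```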
  • User request could be injected by writing a JSON file to a temporary directory and then using --package_path to pull it into the build where it can be depended upon.
    • For example, depend on //user/request to get the injected user request object and conditionalize the build output.
    • Need to be careful not to cascade rebuilds or run compilers in production. //user/request should only ever be a data dependency, so tools are rerun on the new request data, but nothing needs to recompile.
    • Definitely easy to accidentally get this wrong and get very bad performance.
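A minimal sketch of the request-injection idea, assuming the layout above (the `user/request` package path and `request.json` file name are hypothetical; the returned directory would be handed to Bazel via `--package_path`):

```python
import json
import os
import tempfile

# Sketch: write the user request into a throwaway package rooted at a temp
# directory so that --package_path can make //user/request resolve to it.
# The BUILD file only exports the JSON, so consumers can only take it as a
# data dependency -- no compiler ever runs over it.
def write_request_package(request: dict) -> str:
    root = tempfile.mkdtemp(prefix="user_request_")
    pkg = os.path.join(root, "user", "request")
    os.makedirs(pkg)
    with open(os.path.join(pkg, "request.json"), "w") as f:
        json.dump(request, f)
    with open(os.path.join(pkg, "BUILD"), "w") as f:
        f.write('exports_files(["request.json"])\n')
    return root  # pass as: bazel build --package_path=%workspace%:<root> ...
```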
  • For rules_prerender, we would need to rethink the bundling story, since different requests might generate different includeScript() or inlineStyle() calls, which would force a rebundle of client-side resources. This probably necessitates some amount of prebundling (slow, but not dependent on the user request) combined with runtime bundling (fast, but dependent on the user request) to minimize the client resources.
  • Each build uses a different --output_base so builds can run in parallel and serve multiple users.
  • Caching
    • Incremental caching means resources are implicitly cached until they are invalidated.
    • Sharing the cache with CI means that if CI passes before pushing to prod, those resources are already cached and most requests won't trigger a real rebuild.
    • Likely need a very small cache in the HTTP server to avoid duplicating bazel build calls for subsequent requests which hit the same target, such as /index.html, /index.js, /index.css, ...
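That front cache could be as small as a per-target memo with a short TTL — a sketch under the assumption that the build step is some callable which runs `bazel build` and returns the output (the `BuildCache` name and TTL policy are made up for illustration):

```python
import time
from typing import Callable

# Tiny front cache keyed by target label: /index.html, /index.js, and
# /index.css all resolve to the same target, so only the first request
# triggers a bazel build; the rest reuse the result until the TTL expires.
class BuildCache:
    def __init__(self, build: Callable[[str], str], ttl_seconds: float = 1.0):
        self._build = build
        self._ttl = ttl_seconds
        self._entries: dict[str, tuple[float, str]] = {}

    def get(self, target: str) -> str:
        now = time.monotonic()
        entry = self._entries.get(target)
        if entry and now - entry[0] < self._ttl:
            return entry[1]
        result = self._build(target)  # e.g. runs `bazel build <target>`
        self._entries[target] = (now, result)
        return result
```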
  • RBE
    • The Bazel client runs all builds on RBE, probably with a special .bazelrc to push as much work off the device as possible, avoiding overloading it while serving multiple users.
    • RBE probably doesn't have strict SLOs? Running at request time in the critical path is probably not great performance.
    • How expensive would RBE be in comparison to traditional SSR costs? My intuition is that this would be more expensive than a more traditional SSR setup.
    • Co-locating the HTTP server and RBE datacenters would likely be very important for timing, given that the backend RPC network will be quite congested running each action.
    • Bazel can use up a lot of memory just tracking builds, even when all the real work is done on RBE. How many concurrent clients could this HTTP server support, and how well would that scale?
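The special .bazelrc might look something like this (the endpoint is a placeholder; these are standard remote-execution flags, with minimal output downloading to keep work off the serving machine):

```
# Hypothetical .bazelrc fragment pushing as much work as possible to RBE.
build --remote_executor=grpcs://rbe.example.com
build --remote_cache=grpcs://rbe.example.com
build --jobs=200
build --remote_download_minimal
```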
  • Databases
    • The whole point of ISR is to serve data which updates faster than code pushes happen, but Bazel workspaces limit dependencies to those authored in the workspace. For this to be useful, you'd need to depend on a backend database of some kind which updates frequently, and that dependency is tricky.
    • Could depend on an "external" workspace which just pulls data from a database and makes it available in a file such as a SQLite database. Each build would get an updated copy in real time, even if code pushes happen infrequently.
    • Need to be careful to limit dependencies on the database though, or else it would invalidate huge amounts of the build graph every time it updated. Again, it should only ever be used as a data dependency, like the user request.
    • Beyond that, having so many reverse dependencies on a single file would likely cause a lot of work each build. Ideally, only affected pages should need to be rebuilt. If you have customers, orders, and products tables, and a new customer signs up, you shouldn't have to rebuild all your product pages.
    • Each of those tables would need to be represented as a different file, whether because they originated from a different database, or because they come from a single database which gets split into multiple (db.sqlite -> [ customers.sqlite, orders.sqlite, products.sqlite ]).
    • As soon as you split a database, indexes can't cross the files, meaning you can't do: SELECT * FROM orders JOIN customers ON orders.customer_id = customers.id; I think the choice of database infrastructure would be really key to making dependencies work here.
    • You'd probably need to throttle database changes so it doesn't invalidate the Bazel graph every 2 seconds. That could make situations like "submit an order and then refresh the page" yield awkwardly out of date results.
    • Also, Bazel thinks in files, not services. So even if we represent databases as SQLite files, every action which queries them needs to spin up a SQLite server which parses all, or at least part, of the database file before serving requests. Ideally, we could have a "file worker" running as part of the Bazel build, so all actions which query a specific SQLite file go to the same server, already initialized for that file.
    • I have no idea how such a system could work and it would almost certainly require significant changes to Bazel to be performant.
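The db.sqlite -> [ customers.sqlite, orders.sqlite, products.sqlite ] split above could be sketched as a preprocessing step run whenever the source database snapshot changes, so a write to one table only invalidates the actions depending on that table's file (table names are taken from the example; everything else here is hypothetical):

```python
import os
import sqlite3

# Sketch: split a single db.sqlite into one SQLite file per table, so each
# table becomes an independent Bazel input. Copies each table's schema and
# rows into its own database file.
def split_database(src_path: str, out_dir: str) -> list[str]:
    src = sqlite3.connect(src_path)
    tables = [row[0] for row in src.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    outputs = []
    for table in tables:
        out_path = os.path.join(out_dir, f"{table}.sqlite")
        dest = sqlite3.connect(out_path)
        schema = src.execute(
            "SELECT sql FROM sqlite_master WHERE name = ?", (table,)).fetchone()[0]
        dest.execute(schema)
        for row in src.execute(f"SELECT * FROM {table}"):
            placeholders = ",".join("?" * len(row))
            dest.execute(f"INSERT INTO {table} VALUES ({placeholders})", row)
        dest.commit()
        dest.close()
        outputs.append(out_path)
    src.close()
    return outputs
```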
  • Backends
    • Interoperating with other backend systems via RPCs would be really tricky. In theory, we could model backend dependencies as macros invoked on generated files, dynamically pulling those dependencies in as external workspaces operating on data dependencies of request files.
    • This gets very awkward though and Bazel's hermeticity guarantees directly counter the idea of applying side effects like updating a database as part of the request.
    • Most likely we'd need some kind of schema for a build to return "mutations" which the HTTP server propagates into state changes for the system, but this starts to look really ugly.
    • The ideal use case is probably treating it as rendering infrastructure, and pushing all mutations into a separate API server used by the client after rendering.
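The "mutations" idea above could be sketched as a small schema: a hermetic build action emits a mutations file describing the state changes it wants, and the HTTP server applies them after the build completes. Everything here (the `Mutation` shape, the operation names, the in-memory state) is a hypothetical illustration, not a proposed format:

```python
import json
from dataclasses import dataclass

# Hypothetical mutation record emitted by a build action instead of
# performing the side effect itself, keeping the build hermetic.
@dataclass
class Mutation:
    table: str
    op: str       # e.g. "insert" | "update" | "delete"
    payload: dict

def parse_mutations(raw: str) -> list[Mutation]:
    """Parse a mutations.json output produced by a build action."""
    return [Mutation(**entry) for entry in json.loads(raw)]

def apply_mutations(state: dict[str, list[dict]], mutations: list[Mutation]) -> None:
    # Minimal interpreter: only inserts, to keep the sketch small. The real
    # server would translate these into database writes or RPCs.
    for m in mutations:
        if m.op == "insert":
            state.setdefault(m.table, []).append(m.payload)
```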

Anyways, this is a terrible idea and it definitely should never be done. That said, it might be a fun experiment? I would be interested to see which of these constraints breaks the architecture first. 🤣

@dgp1130 dgp1130 added the feature New feature or request label Feb 4, 2023