Caching roadmap #1167

zilto · Collaborator · 2024-10-04T16:40:53Z

Here's a place to discuss the caching roadmap. Hamilton 1.79.0 introduces the core caching feature, but we have a lot more planned.

How this thread works

- Comment with feedback on the listed items or suggest new ideas.
- New ideas will be added to this first comment to keep everything cataloged in one place.
- This is not a commitment to implement everything listed.

Roadmap

- Async support: make caching work with the AsyncDriver.
- Remote execution integration: allow selectively executing nodes on Ray, Modal, SkyPilot, Runhouse, etc.
- Distributed execution and caching: current backends are designed for a single machine. Make caching work across multiple workers and nodes.
- In-memory stores: cache metadata and results in memory for long-lived Python sessions (e.g., a notebook or interpreter). There could be an option to "persist" the in-memory store by converting it to another store type.
- More backends: support S3, Redis, etc.
- Cache eviction: automatically manage cache storage (expiration date, maximum number of items, maximum storage size).
- Hamilton UI integration: provide a "result catalog" showing which artifact (keyed by data_version) is used in which execution.
- Manual cache storage management: provide user-facing utilities to "delete everything associated with node X", "delete everything not in this DAG", etc.
- Store partitioning and eager loading (optimization): the backend could partition metadata by code_version. When creating the DAG, only metadata matching the current code_version would be loaded, and the associated results could be loaded in memory or into Redis for fast access. This is relevant for production web services with a fixed DAG and a large number of entries.
- DAG optimization and tuning (tangential to caching): say you have a machine learning pipeline (or a RAG pipeline) built with Hamilton, involving data cleaning, feature engineering, feature selection, model training, and evaluation. Build tooling to optimize the DAG as a whole by tuning "DAG hyperparameters". Caching can make this much more efficient, since nodes unaffected by a hyperparameter change can reuse their cached results instead of being recomputed.
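To make the "cache eviction" item a bit more concrete, here is a minimal sketch of two of the listed policies (maximum number of items and expiration) applied to an in-memory store keyed by data_version. The class name InMemoryResultStore, its methods, and the choice of least-recently-used (LRU) eviction are assumptions for illustration only; this is not Hamilton's caching API.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class InMemoryResultStore:
    """Hypothetical in-memory result store with LRU and TTL eviction.

    Illustration only: this is not Hamilton's caching API.
    """

    def __init__(
        self,
        max_items: int = 128,
        ttl_seconds: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_items < 1:
            raise ValueError("max_items must be >= 1")
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        # data_version -> (stored_at, result); ordered from least to most recently used
        self._entries = OrderedDict()

    def _expired(self, stored_at: float) -> bool:
        return self.ttl_seconds is not None and self._clock() - stored_at > self.ttl_seconds

    def set(self, data_version: Hashable, result: Any) -> None:
        self._entries[data_version] = (self._clock(), result)
        self._entries.move_to_end(data_version)
        # "max n items": drop least-recently-used entries beyond the limit
        while len(self._entries) > self.max_items:
            self._entries.popitem(last=False)

    def get(self, data_version: Hashable, default: Any = None) -> Any:
        entry = self._entries.get(data_version)
        if entry is None:
            return default
        stored_at, result = entry
        if self._expired(stored_at):
            # "expiration date": expired entries are dropped lazily, on access
            del self._entries[data_version]
            return default
        self._entries.move_to_end(data_version)  # mark as recently used
        return result

    def __len__(self) -> int:
        return len(self._entries)
```

In this sketch, expired entries are only removed when they are accessed, so a long-lived store would likely also need a periodic sweep. A "maximum storage size" policy is left out because measuring the size of arbitrary Python objects is not straightforward (for example, sys.getsizeof does not count the objects a container references).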
