Here's a place to discuss the caching roadmap. Hamilton 1.79.0 introduces the core caching feature, but we have a lot more planned.
How this thread works
comment with feedback on listed items or suggest new ideas
ideas will be added to this first comment to catalog everything
this is not a commitment to implementing everything listed.
Roadmap
async support: work with the AsyncDriver
integrate with remote execution: allow selectively executing nodes on Ray, Modal, Skypilot, Runhouse, etc.
distributed execution and caching: current backends are designed for a single machine. Make caching work across multiple workers and nodes
in-memory stores: cache metadata and results in-memory for long-lasting Python sessions (e.g., notebook, interpreter). Could have an option to "persist" the in-memory store by converting it to another store type.
more backends: support S3, Redis, etc.
cache eviction: automatically manage cache storage (expiration date, max n items, max storage size)
Hamilton UI integration: have a "result catalog" and see which artifact (keyed by data_version) is used in which execution
manual cache storage management: provide user-facing utilities to "delete everything associated with node X", "delete everything not in this DAG", etc.
store partitioning and eager loading (optimization): the backend could partition metadata by code_version. When creating the DAG, only load metadata for the matching code_version. Associated results could be loaded in-memory or into Redis for fast access. This is relevant for production web services with a fixed DAG and a large number of entries
DAG optimization and tuning (tangential to caching): Let's say you have a machine learning pipeline (or a RAG pipeline) built with Hamilton. It involves data cleaning, feature engineering, feature selection, model training, and evaluation. Build tooling to optimize the DAG as a whole by tuning "DAG hyperparameters". Caching can make that search much more efficient, since unchanged nodes don't need to be recomputed
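To make the "in-memory stores" and "cache eviction" items more concrete, here is a minimal sketch of what an in-memory result store with a max-items limit and an expiration time could look like. This is not Hamilton's API: `InMemoryResultStore`, its methods, and its parameters are hypothetical names made up for illustration, and least-recently-used eviction is just one possible policy.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Optional


class InMemoryResultStore:
    """Hypothetical in-memory result store (illustration only, not Hamilton's API)."""

    def __init__(
        self,
        max_items: int = 128,
        ttl_seconds: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ):
        self.max_items = max_items
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiration can be tested
        # Maps data_version -> (stored_at, result); order tracks recency of use.
        self._entries = OrderedDict()

    def set(self, data_version: str, result: Any) -> None:
        self._entries[data_version] = (self._clock(), result)
        self._entries.move_to_end(data_version)
        # Evict least recently used entries once over the size limit.
        while len(self._entries) > self.max_items:
            self._entries.popitem(last=False)

    def get(self, data_version: str) -> Optional[Any]:
        # Returns None on a miss; a real store would need to tell a miss
        # apart from a cached None value.
        entry = self._entries.get(data_version)
        if entry is None:
            return None
        stored_at, result = entry
        expired = (
            self.ttl_seconds is not None
            and self._clock() - stored_at > self.ttl_seconds
        )
        if expired:
            del self._entries[data_version]
            return None
        self._entries.move_to_end(data_version)  # mark as recently used
        return result

    def __len__(self) -> int:
        return len(self._entries)


store = InMemoryResultStore(max_items=2)
store.set("v1", "a")
store.set("v2", "b")
store.get("v1")       # touching v1 makes v2 the least recently used entry
store.set("v3", "c")  # over the limit, so v2 is evicted
```

Keying by `data_version` mirrors how results are described in the roadmap; a "max storage size" limit would additionally need a way to estimate the size of each result.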