spfs annotation data support #844
Comments
Either way we will be adding a new object, because the way the objects are currently designed doesn't lend itself to evolving their type while keeping backwards compatibility. So I see this as a choice between adding a new object type specific to holding data (distinct from a blob+payload) or adding a new incarnation of a platform object ("v2") that will get used instead of the original one. (See also: #674) I'll be happy if we create a new platform object type because we can also fix the digest collision problem at the same time. Here's what the structure might look like when associating some data with a layer:
Finding all the possible data associated with a runtime means walking the layers of the runtime either way. @dcookspi made a good observation that we'd get better data de-duplication from using a new data object type. If we're using a new data object type, the act of adding data to a runtime means we only need to append another "layer" to the runtime config. If we use a new platform type, then it means wrapping the existing layers in one of these new objects. I think I'm talking myself into going the new data object direction as I write this out. We need to spec out what would be in that object (or inside platform_v2). Is it a single pair of (key, value), or is it a Map of an arbitrary number of keys and values? Would each object have a concept of a name/id associated with it, e.g., "spk solve data"? What happens if two different data objects contain the same keys? How do the entries in these objects get mapped to a filename? We should definitely give ourselves room to evolve the contents of this object over time. I'm always reaching for protobuf as the hammer for this nail.
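To make the de-duplication and "append another layer" points concrete, here is a minimal Python sketch. It is not spfs code: `DataObject`, `ObjectStore`, and `add_data` are hypothetical names, the JSON-plus-SHA-256 digest is an assumption made for illustration, and the stack is modelled as a plain list of digests.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class DataObject:
    """Hypothetical data object holding a single key/value pair."""

    key: str
    value: str

    def digest(self) -> str:
        # Hash a canonical encoding so equal content always gives the same digest.
        # (Assumption: the real object encoding and hash would differ.)
        encoded = json.dumps({"key": self.key, "value": self.value}, sort_keys=True)
        return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


class ObjectStore:
    """Toy content-addressed store: objects are indexed by their digest."""

    def __init__(self):
        self.objects = {}

    def write(self, obj):
        digest = obj.digest()
        # Writing identical content again lands on the same entry.
        self.objects[digest] = obj
        return digest


def add_data(stack, store, obj):
    """Return a new stack with the data object's digest appended as another 'layer'."""
    return stack + [store.write(obj)]


store = ObjectStore()
# Two unrelated runtimes attach the same data; the layer digests are placeholders.
runtime_a = add_data(["layer-digest-1"], store, DataObject("spk solve data", "example value"))
runtime_b = add_data(["layer-digest-2"], store, DataObject("spk solve data", "example value"))
```

In this sketch both runtimes end up referencing the same stored object, and neither runtime's existing layers had to be wrapped or rewritten.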
I was thinking about a key and value per new data object, and using the layering of the data objects (digests) in the runtime stack to control which keys override which, much like existing layers do. The idea being each But the two use cases I have are two external-to-spfs tools that store a single key-value pair each, and the keys don't clash (the values have lots of structure internally, but spfs doesn't need to know). That's all fine if there's only a single key per tool, but might not be suitable for other use cases. Any thoughts about how many keys and values other tooling or use cases might want to add?
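The override behaviour described here could be sketched as follows in Python. This is an illustration only: the tuple-based stack entries and the `resolve_data` name are hypothetical, and it assumes that entries nearer the top of the stack (later in the list) win, by analogy with how layers are described as overriding each other.

```python
def resolve_data(stack):
    """Collect key/value data from a runtime stack, bottom to top.

    Each entry is either ("layer", digest) or ("data", key, value).
    Later data entries override earlier ones that use the same key.
    """
    resolved = {}
    for entry in stack:
        if entry[0] != "data":
            continue  # ordinary filesystem layers carry no key/value data
        _, key, value = entry
        resolved[key] = value
    return resolved


stack = [
    ("layer", "digest-of-base-layer"),
    ("data", "tool-a", "first value"),
    ("layer", "digest-of-top-layer"),
    ("data", "tool-b", "other value"),
    ("data", "tool-a", "second value"),  # overrides the earlier tool-a entry
]
data = resolve_data(stack)
```

With one key per object, two tools that use distinct keys never interfere, and a clash is settled purely by stack order.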
I like the simplicity of a single value per object. I don't have any other use cases in mind, maybe @rydrman has some input here. I'm not expecting a lot of these in any one environment, but it is hard to predict how this might get used. The overhead could get bad if someone decides they need to start adding thousands of extra bits of data.
I've started looking at this.
I've updated the #815 PR with changes that cover the first task in the description. I've spun the other tasks out into separate tickets. I'm going to update the other PR next.
#995 is also related to this.
Following on from the discussion in the 2023-08-16 meeting about spfs support for custom external/extra/meta data, here's a summary and breakdown of the changes:
Have spfs natively support storing and retrieving extra data/external data/metadata (arbitrary key:value pairs), but not as a direct part of the runtime data structure. Have it saved as an object that can be added to the stack, or in the existing platform objects (whichever is easier based on the current codebase). Have API functions (and command line flags) for storing and retrieving the data.
Whenever one of these data objects, or platform objects with data, is part of the runtime, the data also appears as a file in /spfs, e.g. /spfs/.externaldata/key with the contents being the value. spfs would detect the data and render these files on the filesystem during setup.
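As a rough illustration of the rendering step, the Python sketch below writes one file per key under a `.externaldata` directory. It is not how spfs implements this: `render_external_data` is a hypothetical helper, a temporary directory stands in for /spfs, and using the key verbatim as the filename (rejecting keys that contain a path separator) is an assumption.

```python
import tempfile
from pathlib import Path


def render_external_data(root, data):
    """Write each key/value pair to <root>/.externaldata/<key>."""
    data_dir = Path(root) / ".externaldata"
    data_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for key, value in data.items():
        # Assumption: keys map directly to filenames, so they must be a
        # single, non-empty path component.
        if not key or "/" in key or key in (".", ".."):
            raise ValueError(f"key cannot be used as a filename: {key!r}")
        path = data_dir / key
        path.write_text(value, encoding="utf-8")
        written.append(path)
    return written


# A temporary directory stands in for the /spfs mount point.
with tempfile.TemporaryDirectory() as root:
    render_external_data(root, {"solve": "resolved packages"})
    content = (Path(root) / ".externaldata" / "solve").read_text(encoding="utf-8")
```

A tool that cannot call the API could then read the value straight from the file at that path.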
Have a new spfs command to access the data by key or keys, e.g.
spfs externaldata key
that returns the value for the key. This is for external tools that want to get data out but don't use Rust, or don't want to access the data files directly.
Todo (as separate tickets):
spfs run --extra-data or --extra-data-file ... is used (changed to --external-data)
Existing related PRs (which will change)