spfs annotation data support #844

Closed
1 of 3 tasks
dcookspi opened this issue Aug 17, 2023 · 6 comments · Fixed by #815
Labels: SPI AOI (Area of interest for SPI), SPI-0.39

Comments

@dcookspi
Collaborator

dcookspi commented Aug 17, 2023

Following on from the discussion in the 2023-08-16 meeting about spfs support for custom external/extra/meta data, here's a summary and breakdown of the changes:

Have spfs natively support storing and retrieving extra data/external data/metadata (arbitrary key:value pairs) that is not directly part of the runtime data structure. Save it either as a new object that can be added to the stack or inside the existing platform objects (whichever is easier given the current codebase). Add API functions (and command-line flags) for storing and retrieving the data.

Whenever one of these data objects, or a platform object with data, is part of the runtime, the data also appears as a file in /spfs, e.g. /spfs/.externaldata/key with the value as its contents. spfs would detect the data and render these files on the filesystem during setup.
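As a rough illustration of the rendering step, here is a minimal Rust sketch that writes key/value pairs out as files. It assumes the data has already been collected into a map; the function name `render_annotation_files` is hypothetical, and the `.externaldata` directory is only the example path from the description, not a settled spfs layout.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: write each key/value pair as `<root>/.externaldata/<key>`,
/// with the value as the file contents. The directory name is an assumption.
pub fn render_annotation_files(root: &Path, data: &BTreeMap<String, String>) -> io::Result<()> {
    let dir = root.join(".externaldata");
    fs::create_dir_all(&dir)?;
    for (key, value) in data {
        // Keys become filenames, so reject anything that could escape the
        // data directory or create subdirectories.
        if key.is_empty() || key.contains('/') || key == "." || key == ".." {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                format!("invalid key: {key:?}"),
            ));
        }
        fs::write(dir.join(key), value)?;
    }
    Ok(())
}
```

Using the key directly as the filename keeps lookups trivial for non-Rust tools (just read the file), at the cost of restricting which characters a key may contain.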

Add a new spfs command to access the data by key or keys, e.g. spfs externaldata key, which returns the value for that key. This is for external tools that want to get the data out but don't use Rust, or don't want to read the data files directly.

Todo (as separate tickets):

Existing related PRs (which will change)

@jrray
Collaborator

jrray commented Aug 18, 2023

Have a new object or modify the Platform object

Either way we will be adding a new object, because the way the current objects are designed doesn't lend itself to evolving their type while keeping backwards compatibility.

So I see this as a choice between adding a new object type specific to holding data (distinct from a blob+payload) or adding a new incarnation of a platform object ("v2") that will get used instead of the original one. (See also: #674)

I'll be happy if we create a new platform object type because we can also fix the digest collision problem at the same time.

Here's what the structure might look like when associating some data with a layer:

# new platform type
platform_v2 { data, stack = [ layer ] }

# new data object type
platform { stack = [ layer, data_obj ] }

Finding all the possible data associated with a runtime means walking the layers of the runtime either way.

@dcookspi made a good observation that we'd get better data de-duplication from using a new data object type.

If we're using a new data object type, the act of adding data to a runtime means we only need to append another "layer" to the runtime config. If we use a new platform type, then it means wrapping the existing layers in one of these new objects.
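A minimal Rust sketch of the two shapes above, assuming a simplified `Digest` stand-in type. All names here are hypothetical and are not the real spfs types; the point is only that the data-object route appends to the existing stack, while the platform_v2 route wraps the layers in a new object.

```rust
/// Hypothetical stand-in for an spfs object digest.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Digest(pub String);

/// Option A (hypothetical): a new platform incarnation carrying data beside its stack.
#[derive(Clone, Debug, PartialEq)]
pub struct PlatformV2 {
    pub data: Digest,
    pub stack: Vec<Digest>,
}

/// Option B (hypothetical): the existing platform shape, where a data
/// object simply sits in the stack alongside the layers.
#[derive(Clone, Debug, PartialEq)]
pub struct Platform {
    pub stack: Vec<Digest>,
}

/// Option B: adding data is one more entry appended to the stack.
pub fn add_data_as_layer(mut platform: Platform, data_obj: Digest) -> Platform {
    platform.stack.push(data_obj);
    platform
}

/// Option A: adding data means wrapping the existing layers in a new object.
pub fn wrap_in_platform_v2(layers: Vec<Digest>, data: Digest) -> PlatformV2 {
    PlatformV2 { data, stack: layers }
}
```

With option B, two runtimes that attach the same data share one data object by digest, which is the de-duplication benefit mentioned above.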

I think I'm talking myself into going the new data object direction as I write this out.

We need to spec out what would be in that object (or inside platform_v2). Is it a single pair of (key, value), or is it a Map of an arbitrary number of keys and values? Would each object have a concept of a name/id associated with it, e.g., "spk solve data". What happens if two different data objects contain the same keys? How do the entries in these objects get mapped to a filename?

We should definitely give ourselves room to evolve the contents of this object over time. I'm always reaching for protobuf as the hammer for this nail.

@dcookspi
Collaborator Author

dcookspi commented Aug 18, 2023

I was thinking of a single key and value per data object, using the layering of the data objects (digests) in the runtime stack to control which keys override which, much like existing layers do. The idea is that each --extra-data key:value arg or API store_data call makes a new data object. I was also thinking the key could be the filename, under some fixed path under /spfs.
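A minimal sketch of this override rule, assuming the stack is ordered bottom-to-top (first entry lowest) and each data object holds exactly one key/value pair. `StackEntry`, `resolve_annotations`, and `lookup` are hypothetical names for illustration; `lookup` mirrors what the proposed spfs command might do when asked for one or more keys.

```rust
use std::collections::BTreeMap;

/// Hypothetical stack entry: a filesystem layer or a single key/value data object.
#[derive(Clone, Debug)]
pub enum StackEntry {
    Layer(String),
    Data { key: String, value: String },
}

/// Walk the stack bottom-to-top; a later data object with the same key
/// replaces an earlier one, the same way upper layers override lower ones.
pub fn resolve_annotations(stack: &[StackEntry]) -> BTreeMap<String, String> {
    let mut resolved = BTreeMap::new();
    for entry in stack {
        if let StackEntry::Data { key, value } = entry {
            resolved.insert(key.clone(), value.clone());
        }
    }
    resolved
}

/// Look up one or more keys; missing keys come back as `None`.
pub fn lookup<'a>(resolved: &'a BTreeMap<String, String>, keys: &[&str]) -> Vec<Option<&'a str>> {
    keys.iter()
        .map(|k| resolved.get(*k).map(String::as_str))
        .collect()
}
```

Because resolution walks every entry in the stack, its cost grows with the number of data objects, which ties into the overhead concern raised later in the thread.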

But the two use cases I have are two tools external to spfs that each store a single key-value pair, and their keys don't clash (the values have lots of internal structure, but spfs doesn't need to know about it). That's fine when there's only a single key per tool, but it might not suit other use cases. Any thoughts on how many keys and values other tooling or use cases might want to add?

@jrray
Copy link
Collaborator

jrray commented Aug 18, 2023

I like the simplicity of a single value per object. I don't have any other use cases in mind, maybe @rydrman has some input here.

I'm not expecting a lot of these in any one environment, but it's hard to predict how this might get used. The overhead could get bad if someone decides they need to add thousands of extra bits of data.

@dcookspi dcookspi self-assigned this Sep 8, 2023
@dcookspi dcookspi added the SPI AOI Area of interest for SPI label Sep 8, 2023
@dcookspi
Collaborator Author

dcookspi commented Sep 8, 2023

I've started looking at this.

@dcookspi
Collaborator Author

I've updated the #815 PR with changes that cover the first task in the description. I've spun the other tasks out into separate tickets. I'm going to update the other PR next.

@dcookspi dcookspi linked a pull request Sep 22, 2023 that will close this issue
@dcookspi dcookspi changed the title spfs external data support spfs annotation data support Mar 1, 2024
@dcookspi
Collaborator Author

dcookspi commented Mar 1, 2024

#995 is also related to this.
