spfs annotation data support #844

dcookspi · 2023-08-17T18:44:50Z

Following on from the discussion in the 2023-08-16 meeting about spfs support for custom external/extra/meta data, here's a summary and breakdown of the changes:

Have spfs natively support storing and retrieving extra data/external data/metadata (arbitrary key:value pairs), but not directly part of the runtime data structure. Have it saved as an object that can be added to the stack, or in the existing platform objects (whichever is easier based on the current codebase). Have api functions (and command line flags) for storing and retrieving the data.

Whenever one of these data objects, or platform objects with data, is part of the runtime, the data also appears as a file in /spfs, e.g. /spfs/.externaldata/key with the contents being the value. spfs would detect the data and render these files on the filesystem during setup.
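As a rough illustration of the rendering step described above, here is a minimal sketch that writes one file per key under a root directory. It is not spfs code: the function name `render_external_data`, the `DATA_DIR` constant, and the key validation rule are all assumptions made for the example, and only the `.externaldata/key` layout comes from this issue.

```rust
use std::collections::BTreeMap;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Directory name taken from the example path in this issue.
const DATA_DIR: &str = ".externaldata";

/// Hypothetical helper: write one file per key under `<root>/.externaldata/`,
/// using the value as the file contents. Returns the paths written.
fn render_external_data(
    root: &Path,
    data: &BTreeMap<String, String>,
) -> io::Result<Vec<PathBuf>> {
    let dir = root.join(DATA_DIR);
    fs::create_dir_all(&dir)?;
    let mut written = Vec::new();
    for (key, value) in data {
        // Assumption: keys are plain file names, so anything that could
        // escape the data directory is rejected.
        if key.is_empty() || key == "." || key == ".." || key.contains('/') {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                format!("invalid key: {:?}", key),
            ));
        }
        let path = dir.join(key);
        fs::write(&path, value)?;
        written.push(path);
    }
    Ok(written)
}
```

In a real runtime the root would be /spfs; taking it as a parameter keeps the sketch testable against a scratch directory.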

Have a new spfs command to access the data by key or keys, e.g. spfs externaldata key that returns the value for the key. This is for external tools that want to get data out but don't use rust, or don't want to access the data files directly.
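The lookup behind such a command could be as small as reading the rendered file back. The sketch below assumes the `.externaldata/key` layout from this issue; `read_external_data` is a hypothetical name, not an existing spfs function, and treating a missing file as "no value" is also an assumption.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper: return the value stored for `key`, or `None` when
/// no file for that key exists under `<root>/.externaldata/`.
fn read_external_data(root: &Path, key: &str) -> io::Result<Option<String>> {
    let path = root.join(".externaldata").join(key);
    match fs::read_to_string(&path) {
        Ok(value) => Ok(Some(value)),
        // A missing file (or missing data directory) means the key is unset.
        Err(err) if err.kind() == io::ErrorKind::NotFound => Ok(None),
        // Any other I/O failure is passed through to the caller.
        Err(err) => Err(err),
    }
}
```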

Todo (as separate tickets):

Have a new object or modify the Platform object to store this data, add these to the runtime's stack when the api call or spfs run --extra-data or --extra-data-file ... is used (changed to --external-data)
Have the data appear in the /spfs filesystem as a file or files, e.g. in /spfs/.externaldata/key with value as the contents #878
Have a spfs command to access the external data layers in the current environment #879

Existing related PRs (which will change)

Add extra/external annotation data to spfs runtimes #815 (--external-data interface in the runtime data structure)
Store the solution's 'requested by' and 'repo name' data in an spfs annotation for spk to use #837 (spk storing solve data using that interface)


jrray · 2023-08-18T21:14:32Z

Have a new object or modify the Platform object

Either way we will be adding a new object, because the way the current objects are designed doesn't lend itself to evolving their type while keeping backwards compatibility.

So I see this as a choice between adding a new object type specific to holding data (distinct from a blob+payload) or adding a new incarnation of a platform object ("v2") that will get used instead of the original one. (See also: #674)

I'll be happy if we create a new platform object type because we can also fix the digest collision problem at the same time.

Here's what the structure might look like when associating some data with a layer:

# new platform type
platform_v2 { data, stack = [ layer ] }

# new data object type
platform { stack = [ layer, data_obj ] }

Finding all the possible data associated with a runtime means walking the layers of the runtime either way.

@dcookspi made a good observation that we'd get better data de-duplication from using a new data object type.

If we're using a new data object type, the act of adding data to a runtime means we only need to append another "layer" to the runtime config. If we use a new platform type, then it means wrapping the existing layers in one of these new objects.
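To make the difference between the two options concrete, here is a small sketch of both shapes. The type and function names (`PlatformV2`, `Platform`, `wrap_with_data`, `append_data`) and the use of plain strings as digests are illustrative assumptions, not the spfs object model.

```rust
/// Stand-in for an spfs object digest (assumption: real digests are hashes).
type Digest = String;

/// Option 1: a new platform type that carries the data itself.
#[derive(Debug, Clone, PartialEq)]
struct PlatformV2 {
    data: Vec<(String, String)>,
    stack: Vec<Digest>,
}

/// Option 2: the existing platform shape, where data objects sit in the
/// stack next to ordinary layers.
#[derive(Debug, Clone, PartialEq)]
struct Platform {
    stack: Vec<Digest>,
}

/// Option 1: adding data means wrapping the existing layers in a new object.
fn wrap_with_data(existing: Vec<Digest>, key: &str, value: &str) -> PlatformV2 {
    PlatformV2 {
        data: vec![(key.to_string(), value.to_string())],
        stack: existing,
    }
}

/// Option 2: adding data means appending one more digest to the stack.
fn append_data(platform: &mut Platform, data_digest: Digest) {
    platform.stack.push(data_digest);
}
```

With the second shape, the existing stack is left untouched and only grows by one entry per data object.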

I think I'm talking myself into going the new data object direction as I write this out.

We need to spec out what would be in that object (or inside platform_v2). Is it a single pair of (key, value), or is it a Map of an arbitrary number of keys and values? Would each object have a concept of a name/id associated with it, e.g., "spk solve data"? What happens if two different data objects contain the same keys? How do the entries in these objects get mapped to a filename?

We should definitely give ourselves room to evolve the contents of this object over time. I'm always reaching for protobuf as the hammer for this nail.

dcookspi · 2023-08-18T21:40:38Z

I was thinking about a key and value per new data object, and using the layering of the data objects (digests) in the runtime stack to control which keys override which, much like existing layers do. The idea being each --extra-data key:value arg or api store_data call makes a new data object. I was also thinking the key could be the filename, under some fixed path under /spfs.
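A minimal sketch of that override rule, assuming one key/value pair per data object and that entries later in the stack win over earlier ones (the ordering direction is an assumption here, as are the `StackItem` and `effective_data` names):

```rust
use std::collections::BTreeMap;

/// Hypothetical stack entry: either an ordinary layer (identified by a
/// digest string) or a data object holding a single key/value pair.
#[derive(Debug, Clone, PartialEq)]
enum StackItem {
    Layer(String),
    Data { key: String, value: String },
}

/// Walk the stack in order and collect the effective key/value pairs.
/// A later data object with the same key replaces the earlier value.
fn effective_data(stack: &[StackItem]) -> BTreeMap<String, String> {
    let mut resolved = BTreeMap::new();
    for item in stack {
        if let StackItem::Data { key, value } = item {
            resolved.insert(key.clone(), value.clone());
        }
    }
    resolved
}
```

Each key in the resulting map would then correspond to one file name under the fixed path.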

But the two use cases I have are 2 external-to-spfs tools that store a single key-value pair each, and the keys don't clash (the values have lots of structure internally, but spfs doesn't need to know). That's all fine if there's only a single key per tool, but might not be suitable for other use cases. Any thoughts about how many keys and values other tooling or use cases might want to add?

jrray · 2023-08-18T22:18:08Z

I like the simplicity of a single value per object. I don't have any other use cases in mind, maybe @rydrman has some input here.

I'm not expecting a lot of these in any one environment, but it is hard to predict how this might get used. The overhead could get bad if someone decides they need to start adding thousands of extra bits of data.

dcookspi · 2023-09-08T00:38:05Z

I've started looking at this.

dcookspi · 2023-09-22T00:57:44Z

I've updated the #815 PR with changes that cover the first task in the description. I've spun the other tasks out into separate tickets. I'm going to update the other PR next.

dcookspi · 2024-03-01T18:17:11Z

#995 is also related to this.

dcookspi added this to the Spfs External Data milestone Aug 17, 2023

dcookspi added the SPI-0.39 label Aug 17, 2023

This was referenced Aug 17, 2023

Store the solution's 'requested by' and 'repo name' data in an spfs annotation for spk to use #837

Merged

Add extra/external annotation data to spfs runtimes #815

Merged

dcookspi self-assigned this Sep 8, 2023

dcookspi added the SPI AOI Area of interest for SPI label Sep 8, 2023

dcookspi linked a pull request Sep 22, 2023 that will close this issue

Add extra/external annotation data to spfs runtimes #815

Merged

1 task

dcookspi changed the title ~~spfs external data support~~ spfs annotation data support Mar 1, 2024

dcookspi closed this as completed in #815 Mar 14, 2024
