From be420d67d3dc65ecb3390860dbd112a877c3294d Mon Sep 17 00:00:00 2001
From: James Sandford
Date: Fri, 5 Jul 2024 14:13:38 +0100
Subject: [PATCH] Add ADR and appnote specifying get_urls label format

---
 docs/adr/0021-storage-label-format.md      | 51 +++++++++++++++++
 docs/appnotes/0009-storage-label-format.md | 66 ++++++++++++++++++++++
 2 files changed, 117 insertions(+)
 create mode 100644 docs/adr/0021-storage-label-format.md
 create mode 100644 docs/appnotes/0009-storage-label-format.md

diff --git a/docs/adr/0021-storage-label-format.md b/docs/adr/0021-storage-label-format.md
new file mode 100644
index 0000000..b1c4ccc
--- /dev/null
+++ b/docs/adr/0021-storage-label-format.md
@@ -0,0 +1,51 @@
+---
+status: "proposed"
+---
+# Label Conventions for get_urls
+
+## Context and Problem Statement
+
+At our previous in-person event, we had a number of requests to put together some guidance on the use of `get_urls` labels on flow segments.
+People wanted these to provide information to aid the selection of URLs where multiple are available.
+This ADR explores options for where to specify a means to signal this information.
+
+## Considered Options
+
+* Specify a schema against the `get_urls` label parameter of Segments within the API specification
+* Specify a schema in an application note
+* Make no specifications/recommendations
+
+## Decision Outcome
+
+Chosen option: "Specify a schema in an application note", because while specifying a consistent approach will provide significant benefit, any approach will need proving in the real world before making a core part of the specification.
+
+### Implementation
+
+Application note included in this PR.
+
+## Pros and Cons of the Options
+
+### Specify a schema against the `get_urls` label parameter of Segments within the API specification
+
+* Good, because implementers can rely on the format of the information
+* Good, because implementers are provided enough information to make informed decisions on which URL to use
+* Bad, because it will be a breaking change to the API
+* Bad, because the approach hasn't been validated in real-world implementations and may need to be modified in future
+
+### Specify a schema in an application note
+
+* Good, because implementers are provided enough information to make informed decisions on which URL to use
+* Good, because it is not a change to the API itself
+* Good, because it allows the approach to be validated in real-world implementations
+* Bad, because implementers cannot initially assume others will be using the specified format
+
+### Make no specifications/recommendations
+
+* Good, because it is not a change to the API or its use
+* Bad, because implementers cannot make informed decisions on which URLs to use beyond matching specific labels
+
+## More Information
+
+The need for this ADR was identified at the first CNAP in-person event on the 9th-10th May 2024.
+
+This approach should be re-visited once it has been validated in real-world implementations to consider moving this functionality into the core specification, and whether that should be done as separate JSON parameters as opposed to a formatted string.
diff --git a/docs/appnotes/0009-storage-label-format.md b/docs/appnotes/0009-storage-label-format.md
new file mode 100644
index 0000000..04ee3ce
--- /dev/null
+++ b/docs/appnotes/0009-storage-label-format.md
@@ -0,0 +1,66 @@
+# Storage label format specification
+
+## Abstract
+
+Users of TAMS have requested guidance on the use of `get_urls` labels on flow segments.
+People want these to provide information to aid the selection of URLs where multiple are available.
+This Application Note provides a specification for the format of `get_urls` labels to address these needs.
+
+In terms of requirements, we primarily want to enable the choice of get_urls based on properties such as bandwidth, latency, and cost.
+Unfortunately, many of these cannot be meaningfully signalled in a direct and universal way.
+If for no other reason, they are largely dependent on the intermediate infrastructure between the server and client.
+This will vary based on the physical and logical location of the client as much as the server.
+We must, therefore, provide enough information that the client can infer these properties itself.
+This may be done by providing information about the location, and the type of storage.
+
+Unfortunately, there is no single shared naming convention for resource locations across cloud providers.
+There are shared properties (regions, and availability zones), but these aren't universally available.
+Some cloud providers don't provide availability zones.
+And a local installation, or one behind a CDN, might not even provide regions.
+So the signalling of these properties must be optional.
+Availability zones may also be semi-randomised in their naming to aid in evenly distributing load across cloud infrastructure.
+Zone naming is consistent within cloud accounts, though.
+Co-locating within a zone can be beneficial to performance.
+But this is only possible if a client knows it is within the same account as the server.
+It must, then, be possible for a client to either use or ignore availability zone based on whether it is within the same cloud account.
+This gives us provider, region, and availability zone as the important information that can be used to signal the location of storage.
+
+In addition to location, storage type is important for inferring the properties of the storage.
+Unfortunately, it is not possible or practical to directly compare similar types of products.
+For example, cloud providers may provide Object Storage.
+But they may provide multiple types of Object Storage.
+These may have vastly different properties.
+And there are often no universal naming conventions to describe these products in a way that is comparable between cloud providers.
+So while we need to signal this, we are not able to provide universal naming conventions.
+
+Finally, a more generic "store name" parameter is useful for human identification of stores, and for distinguishing stores which are otherwise identical.
+
+In summary:
+We must signal storage provider, region, availability zone, storage type, and store name.
+We cannot, unfortunately, signal any of these in consistently named and universally comparable ways.
+But we can specify a schema which allows a client to consistently decide which pieces of information are important to it.
+
+## Content
+
+Given all of the above, this Application Note recommends the following naming convention for the `get_urls` `label` parameter on flow segments:
+`<provider>.<region>.<availabilityZone>:<storeType>:<storeName>`
+
+This can be represented more formally with the following Python-compatible regex:
+`^(?P<provider>[A-Za-z0-9\-\_]+)(\.(?P<region>[A-Za-z0-9\-\_]+)(\.(?P<availabilityZone>[A-Za-z0-9\-\_]+))?)?:(?P<storeType>[A-Za-z0-9\-\_]+):(?P<storeName>[A-Za-z0-9\-\_]+)$`
+
+An example use of this would be:
+`example-cloud-provider.eu-west-1.a:example-storage-product:example-store-name`
+
+An example use of this without an availability zone would be:
+`example-cloud-provider.eu-west-1:example-storage-product:example-store-name`
+
+An example use of this without a region would be:
+`example-cloud-provider:example-storage-product:example-store-name`
+
+The parameters `provider`, `region`, `availabilityZone`, and `storeType` should use the machine-readable values as provided by the cloud/storage vendor.
+The intention of this approach is to allow consistent values to be used without enumerating common/possible values in TAMS.
+
+As this is an application note, this usage is a recommendation.
+It is not a requirement.
+This is intentional while the proposal is validated in the real world.
+Once the approach has been validated, we will re-visit the possibility of moving this specification into the core API specification.
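
The label convention in the application note above can be exercised with a short client-side sketch. The Python below is illustrative only and is not part of the patch. It assumes the regex group names follow the parameter names used in the note (`provider`, `region`, `availabilityZone`, `storeType`, plus an assumed `storeName`), that the location separators are literal dots, and that each `get_urls` entry is a mapping with `label` and `url` keys. The helper names `parse_label` and `choose_url` are hypothetical.

```python
import re
from typing import Optional

# Assumption: group names mirror the parameter names in the note, and the
# separators between provider, region and availability zone are literal dots.
LABEL_RE = re.compile(
    r"^(?P<provider>[A-Za-z0-9_-]+)"
    r"(\.(?P<region>[A-Za-z0-9_-]+)"
    r"(\.(?P<availabilityZone>[A-Za-z0-9_-]+))?)?"
    r":(?P<storeType>[A-Za-z0-9_-]+)"
    r":(?P<storeName>[A-Za-z0-9_-]+)$"
)


def parse_label(label: str) -> Optional[dict]:
    """Return the label's parts, or None if it does not follow the convention."""
    match = LABEL_RE.fullmatch(label)
    return match.groupdict() if match else None


def choose_url(get_urls, provider, region, availability_zone=None):
    """Prefer the entry whose location is closest to the client.

    Pass availability_zone only when the client knows it shares a cloud
    account with the store, since zone names are only comparable in-account.
    """
    def score(entry):
        parts = parse_label(entry.get("label") or "")
        if parts is None or parts["provider"] != provider:
            return 0  # unparseable label or different provider
        if parts["region"] != region:
            return 1  # same provider only
        if availability_zone is None or parts["availabilityZone"] != availability_zone:
            return 2  # same provider and region
        return 3      # same provider, region and availability zone

    return max(get_urls, key=score, default=None)


# Hypothetical segment data; the URLs and store names are made up.
urls = [
    {"url": "https://a.example/seg", "label": "example-cloud-provider.eu-west-1.a:example-storage-product:store-1"},
    {"url": "https://b.example/seg", "label": "example-cloud-provider.us-east-1:example-storage-product:store-2"},
    {"url": "https://c.example/seg", "label": "free text label"},
]
best = choose_url(urls, "example-cloud-provider", "us-east-1")
```

Because the note is a recommendation rather than a requirement, labels that do not parse are scored lowest instead of being rejected, so a client using this sketch still falls back to some URL when no label follows the convention.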