Add ADR and appnote specifying get_urls label format
j616 committed Jul 9, 2024
1 parent 9c29fc4 commit be420d6
Showing 2 changed files with 117 additions and 0 deletions.
51 changes: 51 additions & 0 deletions docs/adr/0021-storage-label-format.md
@@ -0,0 +1,51 @@
---
status: "proposed"
---
# Label Conventions for get_urls

## Context and Problem Statement

At our previous in-person event, we had a number of requests to put together some guidance on the use of `get_urls` labels on flow segments.
People wanted these to provide information to aid the selection of URLs where multiple are available.
This ADR explores options for where to specify a means to signal this information.

## Considered Options

* Specify a schema against the `get_urls` label parameter of Segments within the API specification
* Specify a schema in an application note
* Make no specifications/recommendations

## Decision Outcome

Chosen option: "Specify a schema in an application note", because while specifying a consistent approach will provide significant benefit, any approach will need proving in the real world before it is made a core part of the specification.

### Implementation

Application note included in this PR.

## Pros and Cons of the Options

### Specify a schema against the `get_urls` label parameter of Segments within the API specification

* Good, because implementers can rely on the format of the information
* Good, because implementers are provided enough information to make informed decisions on which URL to use
* Bad, because it will be a breaking change to the API
* Bad, because the approach hasn't been validated in real world implementations and may need to be modified in future

### Specify a schema in an application note

* Good, because implementers are provided enough information to make informed decisions on which URL to use
* Good, because it is not a change to the API itself
* Good, because it allows the approach to be validated in real world implementations
* Bad, because implementers cannot initially assume others will be using the specified format

### Make no specifications/recommendations

* Good, because it is not a change to the API or its use
* Bad, because implementers cannot make informed decisions on which URLs to use beyond matching specific labels

## More Information

The need for this ADR was identified at the first CNAP in-person event on the 9th-10th May 2024.

This approach should be re-visited once it has been validated in real-world implementations to consider moving this functionality into the core specification, and whether that should be done as separate JSON parameters as opposed to a formatted string.
66 changes: 66 additions & 0 deletions docs/appnotes/0009-storage-label-format.md
@@ -0,0 +1,66 @@
# Storage label format specification

## Abstract

Users of TAMS have requested guidance on the use of `get_urls` labels on flow segments.
People want these to provide information to aid the selection of URLs where multiple are available.
This Application Note provides a specification for the format of `get_urls` labels to address these needs.

In terms of requirements, we primarily want to enable the choice of get_urls based on properties such as bandwidth, latency, and cost.
Unfortunately, many of these cannot be meaningfully signalled in a direct and universal way.
If for no other reason, they are largely dependent on the intermediate infrastructure between the server and client.
This will vary based on the physical and logical location of the client as much as the server.
We must, therefore, provide enough information that the client can infer these properties themselves.
This may be done by providing information about the location, and the type of storage.

Unfortunately there is no one shared naming convention for resource locations across cloud providers.
There are shared properties (regions, and availability zones), but these aren't universally available.
Some cloud providers don't provide availability zones.
And a local installation, or one behind a CDN, might not even provide regions.
So the signalling of these properties must be optional.
Availability zones may also be semi-randomised in their naming to aid in evenly distributing load across cloud infrastructure.
Zone naming is consistent within cloud accounts, though.
Co-locating within a zone can be beneficial to performance.
But this is only possible if a client knows it is within the same account as the server.
It must, then, be possible for a client to either use or ignore availability zone based on whether it is within the same cloud account.
This gives us provider, region, and availability zone as the important information that can be used to signal the location of storage.

In addition to location, storage type is important for inferring the properties of the storage.
Unfortunately, it is not possible or practical to directly compare similar types of products.
For example, cloud providers may provide Object Storage.
But they may provide multiple types of Object Storage.
These may have vastly different properties.
And there are often no universal naming conventions to describe these products in a way that is comparable between cloud providers.
So while we need to signal this, we are not able to provide universal naming conventions.

Finally, a more generic "store name" parameter is useful for human identification of stores, and for distinguishing stores which are otherwise identical.

In summary, we must signal storage provider, region, availability zone, storage type, and store name.
We cannot, unfortunately, signal any of these in consistently named and universally comparable ways.
But we can specify a schema which allows a client to consistently decide which pieces of information are important to it.

## Content

Given all of the above, this Application Note recommends the following naming convention for the `get_urls` `label` parameter on flow segments:
`<provider>.<region[optional]>.<availabilityZone[optional]>:<storeType>:<storeName>`

This can be represented more formally with the following Python-compatible regex:
`^(?P<provider>[A-Za-z0-9\-\_]+)(\.(?P<region>[A-Za-z0-9\-\_]+)(\.(?P<availabilityZone>[A-Za-z0-9\-\_]+))?)?:(?P<storeType>[A-Za-z0-9\-\_]+):(?P<storeName>[A-Za-z0-9\-\_]+)$`

An example use of this would be:
`example-cloud-provider.eu-west-1.a:example-storage-product:example-store-name`

An example use of this without an availability zone would be:
`example-cloud-provider.eu-west-1:example-storage-product:example-store-name`

An example use of this without a region would be:
`example-cloud-provider:example-storage-product:example-store-name`
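
As an illustration only, the following Python sketch shows one way a client might parse labels in this format.
It uses a variant of the regex above in which the `.` separators are escaped so that they match only a literal dot.
The names `LABEL_RE` and `parse_label` are hypothetical and are not part of TAMS.

```python
import re

# Label pattern from this note. The "." separators are escaped here so that
# they match a literal dot only (an assumption about the intended behaviour).
LABEL_RE = re.compile(
    r"^(?P<provider>[A-Za-z0-9_-]+)"
    r"(\.(?P<region>[A-Za-z0-9_-]+)"
    r"(\.(?P<availabilityZone>[A-Za-z0-9_-]+))?)?"
    r":(?P<storeType>[A-Za-z0-9_-]+)"
    r":(?P<storeName>[A-Za-z0-9_-]+)$"
)


def parse_label(label):
    """Return the label's fields as a dict, or None if it does not match.

    Optional fields that are absent (region, availabilityZone) map to None.
    """
    match = LABEL_RE.match(label)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for example in (
        "example-cloud-provider.eu-west-1.a:example-storage-product:example-store-name",
        "example-cloud-provider.eu-west-1:example-storage-product:example-store-name",
        "example-cloud-provider:example-storage-product:example-store-name",
    ):
        print(parse_label(example))
```

Returning `None` for labels that do not match lets a client fall back to treating the label as an opaque string, which matters while use of this format remains optional.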

The parameters `provider`, `region`, `availabilityZone`, and `storeType` should use the machine readable values as provided by the cloud/storage vendor.
The intention of this approach is to allow consistent values to be used without enumerating common/possible values in TAMS.
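
The sketch below illustrates, under stated assumptions, how a client could use these fields to prefer nearby storage.
It assumes each `get_urls` entry is a dict with `url` and `label` keys, and that the client knows its own provider and region.
The availability zone is compared only when the caller supplies one, which a client would do only if it knows it shares a cloud account with the server.
The scoring scheme and the names `location_score` and `pick_url` are hypothetical, not part of TAMS.

```python
import re

# Label pattern from this note, with the "." separators escaped (assumption).
LABEL_RE = re.compile(
    r"^(?P<provider>[A-Za-z0-9_-]+)"
    r"(\.(?P<region>[A-Za-z0-9_-]+)"
    r"(\.(?P<availabilityZone>[A-Za-z0-9_-]+))?)?"
    r":(?P<storeType>[A-Za-z0-9_-]+)"
    r":(?P<storeName>[A-Za-z0-9_-]+)$"
)


def location_score(label, provider, region, availability_zone=None):
    """Score how close a labelled store is to the client (higher is closer).

    0: label missing, unparseable, or a different provider
    1: same provider, region absent or different
    2: same provider and region
    3: same provider, region, and availability zone
    """
    match = LABEL_RE.match(label or "")
    if match is None or match["provider"] != provider:
        return 0
    if match["region"] != region:
        return 1
    # Zone names are only comparable within one cloud account, so the zone is
    # ignored unless the caller explicitly passes one.
    if availability_zone is not None and match["availabilityZone"] == availability_zone:
        return 3
    return 2


def pick_url(get_urls, provider, region, availability_zone=None):
    """Return the entry of a non-empty get_urls list that scores highest.

    Ties are resolved in favour of the earliest entry.
    """
    return max(
        get_urls,
        key=lambda entry: location_score(
            entry.get("label"), provider, region, availability_zone
        ),
    )
```

A real client would likely also weigh `storeType`, since two stores in the same location can differ greatly in latency and cost.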

As this is an application note, this usage is a recommendation, not a requirement.
This is intentional while the proposal is validated in the real world.
Once the approach has been validated, we will re-visit the possibility of moving this specification into the core API specification.
