Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add tag namespace concept #715

Merged
merged 16 commits into from
Mar 6, 2024
Merged

Add tag namespace concept #715

merged 16 commits into from
Mar 6, 2024

Conversation

jrray
Copy link
Collaborator

@jrray jrray commented May 9, 2023

From #390, the desire is to have different users (or CI jobs on the same
runner) have a certain amount of isolation from each other. In particular,
spk ls is expected to not contain any "local" packages that don't
belong to the current user/CI job.

This commit experiments with this idea. Only the local repo is expected to
have an optional tag namespace set, so tar and rpc repos have not been
changed. Setting the tag namespace is possible via the spfs config file,
but it is expected to typically be set via the environment variable
$SPFS__STORAGE__TAG_NAMESPACE instead.

It may be reasonable to add a [spk] config option to make the tag
namespace default to the current user's username. A user sharing a host
with other users are unlikely to want to see any local packages made by
other users.

There is an open question on if spfs runtimes should be subject to the tag
namespace. This code currently does make them subject to the tag namespace.
Therefore, if a user has a tag namespace specified, spfs runtime ls would
only show the runtimes created in that same tag namespace, and those
runtimes would be hidden from other users/other tag namespaces. This is in
the same spirit of users being isolated from each other, however an
argument could be made that it is important from an admin perspective to
be able to list all the runtimes on a host.

spfs clean is a special case that always ignores any tag namespace. It
should be able to find all tags in the repo and avoid coming to any false
conclusions about what objects are garbage on the basis of only seeing
tags inside a tag namespace.

Signed-off-by: J Robert Ray jrray@jrray.org

@jrray jrray self-assigned this May 9, 2023
@jrray
Copy link
Collaborator Author

jrray commented May 9, 2023

Something else I haven't tried to tackle here yet is to avoid collisions between namespaced and non-namespaced tags. Since using a namespace simply adds a path prefix to the tag's location on disk, it is easy enough to have collisions, but it would be a good idea that tag "foo/bar" and tag "bar" in namespace "foo" are distinct entities that can coexist.

This could be accomplished by putting a fixed prefix like ".namespaces/" at the beginning of all tag namespaces on disk, and then disallowing that as a legal name of a non-namespaced name. So like:

  • spfs/tags/foo/bar.tag
  • spfs/tags/.namespaces/foo/bar.tag

There's still the possibility of collisions between namespaces that are prefixes of other namespaces, though. We can require namespaces be a single flat name (no /'s).

@codecov
Copy link

codecov bot commented May 9, 2023

Codecov Report

Attention: Patch coverage is 48.61111% with 37 lines in your changes are missing coverage. Please review.

Project coverage is 55.97%. Comparing base (e8cf549) to head (17cf840).
Report is 344 commits behind head on main.

❗ Current head 17cf840 differs from pull request most recent head b9a33a1. Consider uploading reports for the commit b9a33a1 to get more accurate results

Files Patch % Lines
crates/spk-storage/src/storage/spfs.rs 0.00% 5 Missing ⚠️
crates/spfs/src/storage/fallback/repository.rs 0.00% 4 Missing ⚠️
crates/spfs/src/storage/mod.rs 42.85% 4 Missing ⚠️
crates/spfs/src/storage/proxy/repository.rs 0.00% 4 Missing ⚠️
crates/spfs/src/storage/rpc/tag.rs 0.00% 4 Missing ⚠️
crates/spfs/src/storage/tag.rs 0.00% 4 Missing ⚠️
crates/spfs/src/storage/tar/repository.rs 0.00% 4 Missing ⚠️
crates/spfs-cli/cmd-clean/src/cmd_clean.rs 0.00% 2 Missing ⚠️
crates/spfs/src/proto/conversions.rs 0.00% 2 Missing ⚠️
crates/spfs/src/storage/fs/repository.rs 86.66% 2 Missing ⚠️
... and 1 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #715      +/-   ##
==========================================
+ Coverage   53.55%   55.97%   +2.42%     
==========================================
  Files         258      234      -24     
  Lines       20466    18318    -2148     
==========================================
- Hits        10960    10254     -706     
+ Misses       9506     8064    -1442     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@jrray
Copy link
Collaborator Author

jrray commented May 9, 2023

If we go in the direction of not putting runtimes into the tag namespace, where I ran into a small snag was here:

https://github.com/imageworks/spk/blob/d365525aa21deb347ab638b8c81882f19879c5db/crates/spfs/src/runtime/storage.rs#L470-L474

The problem being that this code will want to clear the tag namespace on whatever repo it gets called with, but the signature here means this can only get an Arc<Repository::Handle> and RepositoryHandle doesn't implement Clone. But from what I can tell none of the callers of this call it with an Arc so this particular signature isn't entrenched.

@rydrman
Copy link
Collaborator

rydrman commented May 10, 2023

From the meeting today:

@jrray jrray force-pushed the spfs-tag-namespace branch 3 times, most recently from 4aba1aa to ab41fe4 Compare May 16, 2023 00:11
@jrray jrray requested a review from rydrman May 16, 2023 01:09
@jrray
Copy link
Collaborator Author

jrray commented May 25, 2023

@rydrman I removed the depth part from the namespace.

Copy link
Collaborator

@rydrman rydrman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think it would be a lot of work for other repository types to support tag namespaces? Most have an FS repo at the backend, anyway or something like RPC can just send it in the request and default to None.

I foresee us trying to use this feature in the future and immediately hitting that issues (since we use almost no direct FS repos)

crates/spfs/src/config.rs Outdated Show resolved Hide resolved
Comment on lines 473 to 475
// Keep runtime tags out of the tag namespace. Allow `spfs runtime`
// to view and operate on all runtimes on the host.
inner.as_tag_mut().set_tag_namespace(None);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a good argument to be made for putting all runtime tags into a specific namespace so that they don't busy-up the main one for users (though it wouldn't be backwards-compatible)

crates/spfs/src/storage/fs/repository.rs Outdated Show resolved Hide resolved
crates/spfs/src/storage/fs/tag.rs Outdated Show resolved Hide resolved
crates/spfs/src/storage/tag.rs Outdated Show resolved Hide resolved
@jrray
Copy link
Collaborator Author

jrray commented Dec 8, 2023

I rebased this but haven't started addressing the notes yet. But I want to point out how this feature didn't really integrate with the pinned repository feature, since a pinned repository is a wrapper for another repo (via Arc) it wasn't possible to implement the new TagStorageMut trait for it. Instead I made some methods fallible and fail on that kind of repo. This led to having to make Storage::new fallible. But conceptually it should be possible to set the tag namespace on a pinned repository.

The first obvious thing I tried was to use Arc::make_mut but RepositoryHandle is not Clone.

Edit: Also to clarify, it is possible to set a tag namespace on a repo and then pin it, but currently not the other way around.

@jrray jrray force-pushed the spfs-tag-namespace branch 2 times, most recently from ba0705f to 76b0e56 Compare December 11, 2023 22:59
@jrray
Copy link
Collaborator Author

jrray commented Dec 12, 2023

Do you think it would be a lot of work for other repository types to support tag namespaces? Most have an FS repo at the backend, anyway or something like RPC can just send it in the request and default to None.

I foresee us trying to use this feature in the future and immediately hitting that issues (since we use almost no direct FS repos)

I implemented RPC in my latest commit. Have a look at the commit message for more details.

@rydrman
Copy link
Collaborator

rydrman commented Dec 13, 2023

Could we make the tag storage modification non require &mut and need to be internally mutable (via an atomic of some kind)? Or was there a strong reason why you wanted it that way?

This layers on my previous thoughts about replacing the handle enum with Box<dyn Repository> which, at first thought, would resolve this pinning relationship...

For now, I think it's probably okay as a trade-off

Copy link
Collaborator

@rydrman rydrman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me, I think the only part that feels like it's missing is the ability to provide a namespace in url form for remote repos. I'm picturing a url query param like I added for the pinning stuff in open_repository...

@rydrman
Copy link
Collaborator

rydrman commented Feb 14, 2024

From the meeting today:

  • see the last note above, we could split this out and follow up if it feels like more than a simple thing

@jrray
Copy link
Collaborator Author

jrray commented Feb 16, 2024

This looks good to me, I think the only part that feels like it's missing is the ability to provide a namespace in url form for remote repos. I'm picturing a url query param like I added for the pinning stuff in open_repository...

I added the RemoteConfig stuff.

From #390, the desire is to have different users (or CI jobs on the same
runner) have a certain amount of isolation from each other. In particular,
`spk ls` is expected to not contain any "local" packages that don't
belong to the current user/CI job.

This commit experiments with this idea. Only the local repo is expected to
have an optional tag namespace set, so tar and rpc repos have not been
changed. Setting the tag namespace is possible via the spfs config file,
but it is expected to typically be set via the environment variable
$SPFS_STORAGE_TAGNAMESPACE instead.

It may be reasonable to add a [spk] config option to make the tag
namespace default to the current user's username. A user sharing a host
with other users are unlikely to want to see any local packages made by
other users.

There is an open question on if spfs runtimes should be subject to the tag
namespace. This code currently does make them subject to the tag namespace.
Therefore, if a user has a tag namespace specified, `spfs runtime ls` would
only show the runtimes created in that same tag namespace, and those
runtimes would be hidden from other users/other tag namespaces. This is in
the same spirit of users being isolated from each other, however an
argument could be made that it is important from an admin perspective to
be able to list _all_ the runtimes on a host.

`spfs clean` is a special case that always ignores any tag namespace. It
should be able to find _all_ tags in the repo and avoid coming to any false
conclusions about what objects are garbage on the basis of only seeing
tags inside a tag namespace.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Storage::new will start needing a mutable reference to the repo in order
to make it ignore the current tag namespace.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Signed-off-by: J Robert Ray <jrray@jrray.org>
Use a character that is an illegal character for users, '#', so we can
prevent users from creating tag names that could be misinterpreted as
namespace names.

To prevent different namespaces from getting mixed up, include the depth
of the namespace component in the directory name. For example the tag "hi"
in namespace "foo/bar" would produce this on disk:

    /spfs-repos/local/tags/foo#ns.0
    /spfs-repos/local/tags/foo#ns.0/bar#ns.1
    /spfs-repos/local/tags/foo#ns.0/bar#ns.1/hi.tag

This prevents tag "bar/hi" in namespace "foo" from colliding with tag "hi"
in namespace "foo/bar" since these would have different filenames:

- tags/foo#ns.0/bar/hi.tag
- tags/foo#ns.0/bar#ns.1/hi.tag

Signed-off-by: J Robert Ray <jrray@jrray.org>
jrray added 12 commits March 6, 2024 10:21
Give ls-tags some awareness of the existence of namespace entries so it can
hide these from its output.

Note that iter_tag_streams doesn't make use of `EntryType` and will walk
into any namespace directories it find, which for the time being is good
because it still allows `spfs clean` to find all tags in all namespaces.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Verify that TagSpec will reject a name in the new namespace naming
convention (it already does).

Signed-off-by: J Robert Ray <jrray@jrray.org>
The test code requires a way to modify the tag namespace of a TempRepo,
but since it can be changed through an Arc, add a `with_tag_namespace`
method to make a duplicate handle to an existing repo but with a different
namespace set. This required putting the TempDir inside an Arc.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Confirm that tags with names that could possibly collide with tag namespace
names are really distinct and don't collide.

Signed-off-by: J Robert Ray <jrray@jrray.org>
This was deemed unnecessary for disambiguation purposes and would only be
useful for supporting nesting namespaces, which we do not expect to support
at this time.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Signed-off-by: J Robert Ray <jrray@jrray.org>
This requires adding "*_in_namespace" flavors of the tag methods,
because the rpc server needed a way to process requests for tags in a
namespace without introducing race conditions by setting the tag
namespace on its repo handle. The server is "stateless" in the sense
that setting the tag namespace on the client side only changes state on
the client side, and then any requests sent to the server include the
name of the prevailing tag namespace as part of the request.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Signed-off-by: J Robert Ray <jrray@jrray.org>
Leverage the fact that Config contains the tag address.

Signed-off-by: J Robert Ray <jrray@jrray.org>
As part of transitioning to use RelativePath instead of Path, create
wrapper types to partially hide the underlying type from the public API.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Make it easier to add more query options.

Signed-off-by: J Robert Ray <jrray@jrray.org>
Signed-off-by: J Robert Ray <jrray@jrray.org>
@jrray jrray merged commit f271c2b into main Mar 6, 2024
7 checks passed
@jrray jrray deleted the spfs-tag-namespace branch March 6, 2024 20:50
@jrray jrray mentioned this pull request Mar 7, 2024
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants