@saehejkang
Contributor

@saehejkang saehejkang commented Nov 22, 2025

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation update

Motivation and Context

Closes #893

Testing

  • Tested locally
  • Added/updated tests
  • Added/updated docs

@saehejkang saehejkang force-pushed the add-network-prune-command branch from aa9d622 to fd0e2af Compare November 22, 2025 18:44
suhasramanand added a commit to suhasramanand/container that referenced this pull request Nov 24, 2025
…check

- Implement server-side prune() in NetworksService with atomic operations
- Only prune networks in .running state (preserves non-running networks)
- Add comprehensive test coverage (5 tests covering all edge cases)
- Follow existing patterns (similar to VolumesService.prune())
- Add networkPrune XPC route and client method

This merges the architectural improvements from PR apple#914 with the
correct logic from PR apple#906, addressing jglogan's request to combine
the best of both implementations.
@jglogan
Contributor

jglogan commented Dec 2, 2025

@saehejkang @suhasramanand

Is doing prune in the server needless complexity for network prune?

I'm asking this as a design question, not as an explicit criticism of the approach.

Our principal concern for these prune operations is consistency – if we run some arbitrary collection of operations on a set of related resources, we don't want to leave the system in an inconsistent state. Principally I think that's dangling references; as an example, after a prune we should never have a case where a container refers to a non-existent network.

The network delete operation is already consistent on its own – a naive but functional implementation of prune would just try deleting each network in turn regardless of what refers to it. Ideally there'd be a distinct EBUSY-like error so one could distinguish those failures (which could be silently ignored) from other conditions.

I think the first, client-side-only, PR precomputed the set of unreferenced networks up front which would reduce the number of trips to the server that would return EBUSY – a simple and nice optimization.
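As a rough illustration of the naive approach described above, here is a minimal sketch. It assumes a hypothetical `NetworkDeleteError` with a distinct in-use case and a `delete` closure standing in for the real single-network delete call; neither is taken from the project's actual API.

```swift
// Hypothetical error type: the real delete API may signal "in use" differently.
enum NetworkDeleteError: Error {
    case inUse            // EBUSY-like: something still refers to the network
    case other(String)    // any other failure
}

/// Tries to delete every network in turn, regardless of what refers to it.
/// `delete` is a stand-in for the single-resource delete operation, which is
/// assumed to be consistent on its own (it refuses to delete a network in use).
func naivePrune(
    networkIDs: [String],
    delete: (String) throws -> Void
) -> (pruned: [String], failures: [String: Error]) {
    var pruned: [String] = []
    var failures: [String: Error] = [:]
    for id in networkIDs {
        do {
            try delete(id)
            pruned.append(id)
        } catch NetworkDeleteError.inUse {
            // Expected for referenced networks: ignore silently.
            continue
        } catch {
            // Anything else is a real failure worth reporting.
            failures[id] = error
        }
    }
    return (pruned, failures)
}
```

Precomputing the set of unreferenced networks first would only reduce how often the in-use branch is hit; the loop would still need it, since a container could start using a network between the listing and the delete.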

Is there something that a server-side implementation of prune can do that can't be done by combining existing operations in ContainerClient? The guiding principle here is to keep the server side as slender and simple as possible, so that it's easier to reliably maintain.

It's probably worthwhile looking at what we need/want to do for other resources. For the image prune stuff there was a prune() operation for imagerefs all the way down in containerization, but we extracted that part out to container based on the same principle – the ImageStore only needs to ensure consistent pruning of content blobs, while pruning imagerefs can be dealt with in container.

@suhasramanand

suhasramanand commented Dec 2, 2025

@jglogan I agree that a client-side approach would align better with the existing design pattern.

Looking at ImagePrune (in Sources/ContainerCommands/Image/ImagePrune.swift), it:

  1. Lists images and containers on the client
  2. Filters to determine which images to delete
  3. Calls ClientImage.delete() for each one
  4. Only uses server-side for cleanupOrphanedBlobs() (blob cleanup, not imageref pruning)

A client-side network prune would follow the same pattern:

  • ClientNetwork.list() to get all networks
  • ClientContainer.list() to determine which networks are in use
  • Filter on the client to identify unreferenced networks
  • Call ClientNetwork.delete() for each one, handling EBUSY-like errors gracefully

This keeps the server minimal and consistent with how image prune works. The network delete operation already ensures consistency, so the client-side approach should be sufficient.
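The filtering step in the list above could look roughly like the following sketch. `NetworkInfo` and `ContainerInfo` are simplified stand-in types invented for this example, not the project's real types, and the results of `ClientNetwork.list()` and `ClientContainer.list()` are assumed to have already been mapped into them.

```swift
// Stand-in types for illustration only; the project's real network and
// container types have different shapes.
struct NetworkInfo {
    let id: String
}

struct ContainerInfo {
    let id: String
    let networkIDs: [String]   // networks this container is attached to
}

/// Returns the IDs of networks that no container refers to.
/// The default network is never returned, so it is never pruned.
func unreferencedNetworks(
    networks: [NetworkInfo],
    containers: [ContainerInfo],
    defaultNetworkID: String
) -> [String] {
    // Collect every network ID referenced by at least one container.
    let inUse = Set(containers.flatMap { $0.networkIDs })
    return networks
        .map(\.id)
        .filter { $0 != defaultNetworkID && !inUse.contains($0) }
}
```

Each returned ID would then be passed to the delete call, which remains the source of truth for whether a network is still in use.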

The main benefit of server-side would be atomicity via withContainerList, but if network delete already handles consistency correctly, that may not be necessary.

@saehejkang
Contributor Author

saehejkang commented Dec 3, 2025

Our principal concern for these prune operations is consistency

This may have been where the confusion first occurred, as a single design pattern does not seem to be used consistently. I used the volume prune command as a reference, and it takes more of a server-side approach. The image prune command, by contrast, is mainly client-side.

Is there something that a server-side implementation of prune can do that can't be done by combining existing operations in ContainerClient?

There is nothing on the server-side implementation that can't be done by combining operations in the ContainerClient.

It's probably worthwhile looking at what we need/want to do for other resources.

Below is a list of the prune commands and the current/future implementation approach.

  • image prune - client-side approach
  • volume prune - server-side approach
  • container prune - server-side approach (in progress Implement container prune #904)
  • network prune - approach dependent on this discussion

Is there any reason why any of these resources would EVER need to use a server-side approach (besides image prune with the content blobs)?


Regarding the network prune command, we have two PRs, each proposing a valid approach. A decision on which one to merge is needed, and I will defer that to the maintainers. Finally, if we decide on the client-side approach, it wouldn’t be fair for me to simply make the changes here and then merge my PR.

@jglogan
Contributor

jglogan commented Dec 3, 2025

single design pattern does not seem to be used consistently

This. I've blocked out some time today to look at this (and the error handling bit I mentioned) from a broader perspective, and then will follow up here and we can discuss how to move forward.

@jglogan
Contributor

jglogan commented Dec 4, 2025

OK, here are my thoughts after a little reflection:

  • Let's keep the server APIs minimal, where we do the basic resource management operations reliably and consistently (focusing on getting contending operations on a single resource correct).
  • The client is where we'll do composite operations, and operations on collections. The client shall be responsible for dealing with partial success - we should consistently report errors where operations could be performed on some resources and not others (making exceptions for things like prune where an operation may not be able to be performed because of a state change race).
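One way the client could report partial success consistently is sketched below. All names here (`OperationOutcome`, `PartialFailure`, `summarize`) are hypothetical and exist only for this example; the sketch assumes each per-resource operation has already been classified as succeeded, skipped because of a state change race, or failed.

```swift
// Hypothetical per-resource outcome for a composite client operation.
enum OperationOutcome {
    case succeeded
    case skippedDueToRace      // e.g. the resource became referenced after listing
    case failed(String)        // error description
}

// Hypothetical error carrying both what worked and what did not.
struct PartialFailure: Error {
    let succeeded: [String]
    let failed: [String: String]   // resource ID -> error description
}

/// Returns the IDs that succeeded, or throws `PartialFailure` if any
/// resource failed for a reason other than a state change race.
func summarize(_ outcomes: [(id: String, outcome: OperationOutcome)]) throws -> [String] {
    var succeeded: [String] = []
    var failed: [String: String] = [:]
    for (id, outcome) in outcomes {
        switch outcome {
        case .succeeded:
            succeeded.append(id)
        case .skippedDueToRace:
            break   // tolerated for prune-like operations; not reported as an error
        case .failed(let message):
            failed[id] = message
        }
    }
    guard failed.isEmpty else {
        throw PartialFailure(succeeded: succeeded, failed: failed)
    }
    return succeeded
}
```

A command built on this could print the succeeded IDs in either case and exit non-zero only when `PartialFailure` is thrown.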

The principal disadvantage that I see right now is performance. Our APIs operate on a single resource at once today, each one locking the container collection. This isn't really a new issue though: container volume rm --all would incur the same penalties.

@saehejkang and @suhasramanand What do you think?

@saehejkang
Contributor Author

Let's keep the server APIs minimal, where we do the basic resource management operations

This makes sense to me. Just to be clear, this is for get, create, and delete operations on a single resource?

The client is where we'll do composite operations, and operations on collections

This also makes sense to me. Again, to be clear, this is for operations like network prune, as it would be going through the collection of containers/networks, and deleting unused networks?


  1. Are we revisiting any commands (refer to my note here) and making the proper updates?
  2. Question here

Our APIs operate on a single resource at once today, each one locking the container collection.

  1. Is this going to also be revisited in the future? Are we ever going to build APIs that operate on more than a single resource?

@jglogan
Contributor

jglogan commented Dec 4, 2025

  1. Are we revisiting any commands (refer to my note here) and making the proper updates?

Yes. Let's get network prune working this way, and then use it as a template for the container prune PR review, and we can rework volume prune so that all use the same pattern.

  1. Question here

Some reasons we might need a server-side approach:

  • Performance, as mentioned before - if we're operating on a lot of resources, it could be that coalescing all the work under a single call could yield a decent speedup.
  • Transactional operations - if we find that we need a compound operation that needs "all or nothing" semantics across some set of resources, perhaps that's better done in the server? I don't have a concrete example in mind for this though.

Our APIs operate on a single resource at once today, each one locking the container collection.

  1. Is this going to also be revisited in the future? Are we ever going to build APIs that operate on more than a single resource?

There's nothing stopping us from doing that, or from moving compound operations like prune into the API server. The principal motivation for moving in the current direction (keeping things simple, and as you pointed out, consistent) for now is to start cleaning up the client code and get it in a good state for developers. Once that's done we can let time and experience guide how we enhance the client API and the underlying API server.

The thing that gets under my skin at present is that our client (the APIs and the core types) is dispersed across ContainerClient and Services. The dependencies between targets within the project are messier than they need to be. I want to change this so that ContainerClient contains all of the SDK material, such that you or I can look at the docc just for ContainerClient and find what we need to code against container.

There's a lot more developer documentation that could be done to support this. We still lack docs for our extension mechanisms (the plugin system in particular), for example.

@saehejkang
Contributor Author

Yes. Let's get network prune working this way, and then use it as a template for the container prune PR review, and we can rework volume prune so that all use the same pattern.

I would like to spearhead this initiative and see it through to completion. Once network prune is wrapped up, I can go back and help review the container prune PR, and then work on updates to volume prune.

cleaning up the client code and get it in a good state for developers

I want to change this so that ContainerClient contains all of the SDK material, such that you or I can look at the docc just for ContainerClient and find what we need to code against container.

There's a lot more developer documentation that could be done to support this.

I completely agree that cleanup is important and keeping things simple/consistent is important for quality. It is all coming full circle, as I remember asking a question in the discussions about how the SDK/ContainerClient works. I am sure that working on these updates will help me wrap my head around everything and I can work on adding some docs, hopefully in the near future.


Awesome points above, and thank you for all the explanations about the design and your thought process!

If we decide on the client-side approach, it wouldn’t be fair for me to simply make the changes here and then merge my PR.

How do we want to proceed with network prune?

@jglogan
Contributor

jglogan commented Dec 5, 2025

@saehejkang Consider yourself signed up for these...I'll watch for an update on this PR, we can merge it and yep, go ahead and move to the next! Thank you.

@saehejkang saehejkang marked this pull request as draft December 6, 2025 19:50
@saehejkang saehejkang force-pushed the add-network-prune-command branch from f9961dc to 3dbc333 Compare December 6, 2025 20:53
@saehejkang saehejkang marked this pull request as ready for review December 6, 2025 20:56
}

let networksToPrune = allNetworks.filter { network in
network.id != ClientNetwork.defaultNetworkName && !networksInUse.contains(network.id)
Contributor Author

I purposefully left out the check for whether a network is running because the issue did not call for it. Furthermore, if the network is not being used by any containers, shouldn't it be pruned, no matter the state?

Development

Successfully merging this pull request may close these issues.

[Request]: Add container network prune command to remove unused networks

3 participants