Skip to content

Conversation

@ms-henglu
Copy link
Member

No description provided.

Copy link
Member

@matt-FFFFFF matt-FFFFFF left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @ms-henglu

I have been thinking about this for a while and whilst I like the goal of making things easy for the caller, I think this approach may cause problems.

Fundamentally, the issues are around the asymmetry of creating a resource with one API, then deleting it with another. All whilst being invisible to the caller.

We encourage customers to use private endpoints (PEs) for PaaS resources. In the instance where PEs were configured, the implementation here would mean that a caller would be able to create the key resource, but would not be able to delete it unless the client had access to the private endpoint. This would be unexpected as the caller would assume that they were interfacing with the Azure management API, not the data plane.

If we are going to use the data plane then we should make the caller be explicit about doing so in my opinion. I still prefer the generic data plane approach in #918. I think it's mroe useful and explicit.

Wondering if @jaredfholgate has any thoughts?

Comment on lines +38 to +45
path := id.AzureResourceId
path = strings.TrimPrefix(path, "/")
path = strings.TrimSuffix(path, "/")
components := strings.Split(path, "/")
parts := make(map[string]string)
for i := 0; i < len(components)-1; i += 2 {
parts[components[i]] = components[i+1]
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps this code should be part of the parse package, it seems pretty generic?

return fmt.Errorf("key vault name is missing in the resource ID: %s", id.AzureResourceId)
}

resourceID := fmt.Sprintf("%s.vault.azure.net/keys/%s", parts["vaults"], id.Name)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think hard coding this will be problematic for sovereign cloud:

For China cloud: https://vault.azure.cn

For US Gov cloud: https://vault.usgovcloudapi.net

For Germany: https://vault.microsoftazure.de


resourceID := fmt.Sprintf("%s.vault.azure.net/keys/%s", parts["vaults"], id.Name)

_, err := dataPlaneClient.Action(ctx, resourceID, "", "7.4", http.MethodDelete, nil, options)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two comments here:

  1. I think this will be problematic for callers using private endpoint, and will be confusing as they will assume they are talking to the management plane.
  2. Hard coding the Api-Version could make this harder to maintain over time.


}

func GetCustomization(resourceType string) *Resource {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that the interface will store a pointer to the concrete type, consider making this return a Resource rather than a *Resource.

Also consider returning a bool value, making it similar to the idiomatic go: if val, ok := GetCustomization(s); ok {}

@jaredfholgate
Copy link
Member

Hi @ms-henglu

I have been thinking about this for a while and whilst I like the goal of making things easy for the caller, I think this approach may cause problems.

Fundamentally, the issues are around the asymmetry of creating a resource with one API, then deleting it with another. All whilst being invisible to the caller.

We encourage customers to use private endpoints (PEs) for PaaS resources. In the instance where PEs were configured, the implementation here would mean that a caller would be able to create the key resource, but would not be able to delete it unless the client had access to the private endpoint. This would be unexpected as the caller would assume that they were interfacing with the Azure management API, not the data plane.

If we are going to use the data plane then we should make the caller be explicit about doing so in my opinion. I still prefer the generic data plane approach in #918. I think it's mroe useful and explicit.

Wondering if @jaredfholgate has any thoughts?

I'm thinking that for this particular use case, given it seems the delete operation is not supported via the management plane, then perhaps we should be setting the enabled flag instead?

I agree that mixing management plane and data plane operations is not a good solution and is one of the fundamental problems with azurerm. We should not introduce that problem to azapi. Mixing it in a single resource would be even worse and very confusing for customers using our recommended private networking options.

I appreciate that given the API does not support this we are stuck here. Perhaps we can handle this with good docs instead? Or worst case, it should be a specific opt in provider level feature for Key Vault.

We should ideally provide an alternative mechanism for them to manage it all via the data plane, which is explicitly chosen by the user. That may already exist?

I'm wondering if we can also ask the PG to implement the Delete request support? Why doesn't it allow that?

@ms-henglu
Copy link
Member Author

Hi @matt-FFFFFF , @jaredfholgate ,

Thank you so much for the code review!

Yes, I agree that mixing mgmt plane and data plane is not a good idea. But it solves the some critial issues. For this key vault key's case, it could not be deleted by azapi_resource anyway, users will hit the 405 method not supported error. With this customization, if it allows to access data-plane, the resource can be deleted, otherwise user still sees an error, it doesn't make it worse.

To make it less confusing for customers, as Jared suggested, we can

  1. well document the customization.
  2. add options to enable/disable customization. When disable it, we can make the key vault key deletion as no-op to avoid error, or return an error with good explanation instead of the original 405 error.

I want to use the customization layer to:

  1. solve the some of the critial issues causd by bad API design, for example, key vault access policy which uses /{operationKind} instead of http method to manage a resource.
    https://github.com/Azure/azure-rest-api-specs/blob/main/specification/keyvault/resource-manager/Microsoft.KeyVault/stable/2023-07-01/keyvault.json#L267
  2. (maybe) provide similar features like azurerm to purge resources in soft deleted states.

I also don't encourge to mix the mgmt-plane and data-plane in the customization layer, each case should be carefully evaluated.

As for the alternative mentioned in #918, to provide a azapi_resource_action in data-plane, I think it's not a good idea to use action resource to manage a resource's full lifecycle. It's okay to use it to perform some actions on a resource, but using it to create or delete a resource is not a good practice. Becase the state could not be tracked properly and delete scenario could be complex. I'd recommend to add customization layer to the azapi_data_plane_resource instead, so that it can be used to manage the full lifecycle of a resource. Though it requires manual work to implement the customization, we can start with the P0 cases and then expand it to other resources as needed.

@matt-FFFFFF
Copy link
Member

Hi @ms-henglu

I appreciate your response. Unfortunately I must raise strong objections to the approach you've suggested here. Some of them go against the fundamental principles of IaC.

In terms of the KV resource, your mitigation in this PR only provides a solution in the case where customers are not using our security best practices (private endpoints). For enterprise customers this approach will not work.

Furthermore, making the delete a no-op is a fundamental breaking of the promise of stateful IaC and we should absolutely not do this in my opinion.

Instead, I would like to see us explore the use of data plane resource for managing keys/secrets and perhaps disabling/warning the ability of azapi_resource to manage these resources. This seems to me a better solution than the proposal here because we are being explicit.

If customers do not like the management plane API behavior of KV secrets/keys then they should complain through their account teams. I don't think it's the job of IaC to provide a mitigation. I agree the API design is not very good, I would like to see it change back again.

@stemaMSFT
Copy link
Member

Thanks for raising these concerns and for the conversation @ms-henglu and @matt-FFFFFF. I think we need to think hard about this before we merge a PR in any particular direction. Let me try to bring in the rest of the PM team with context on this issue and get back to you on where we lean. Does that sound ok to both of you?

@ms-henglu
Copy link
Member Author

ms-henglu commented Jul 17, 2025

In terms of the KV resource, your mitigation in this PR only provides a solution in the case where customers are not using our security best practices (private endpoints). For enterprise customers this approach will not work.

You're right. For the private endpoints users, they're using the data-plane to manage the keys/secrets from the beginning, so this solution is not working.

Furthermore, making the delete a no-op is a fundamental breaking of the promise of stateful IaC and we should absolutely not do this in my opinion.

Yes, no-op is not a good idea.

Instead, I would like to see us explore the use of data plane resource for managing keys/secrets and perhaps disabling/warning the ability of azapi_resource to manage these resources. This seems to me a better solution than the proposal here because we are being explicit.

Yes, I'll open another PR to enable data-plane resource to support the keys/secrets.

Thank you @matt-FFFFFF for these good suggestions!

@ms-henglu ms-henglu closed this Jul 17, 2025
@ms-henglu
Copy link
Member Author

Thanks for raising these concerns and for the conversation @ms-henglu and @matt-FFFFFF. I think we need to think hard about this before we merge a PR in any particular direction. Let me try to bring in the rest of the PM team with context on this issue and get back to you on where we lean. Does that sound ok to both of you?

Sounds good! Thanks. @stemaMSFT

@matt-FFFFFF
Copy link
Member

Thank you @matt-FFFFFF for these good suggestions!

@ms-henglu apologies for my tone earlier, I hope I did not offend you. I read my post again and it sounded a little strong. Thanks for listening to my feedback 👍🏻

@ms-henglu
Copy link
Member Author

Hi @matt-FFFFFF , thank you for your message - I really appreciate your thoughtfulness. No offense taken at all, and I value your feedback. Thanks again!

@ms-henglu ms-henglu deleted the feature-customization branch August 11, 2025 06:31
@ms-henglu ms-henglu restored the feature-customization branch August 11, 2025 06:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants