
Support for deployment in Azure Government Cloud (Leonardo) #4813

Open: wants to merge 66 commits into develop

bennettn4 (Contributor) commented Dec 11, 2024

Part of a larger effort to add support for running services in Azure Government cloud.

Two parts:

  1. Support for curl calls against Azure Government cloud endpoints
  • Added an AZURE_ENVIRONMENT environment variable
  • Updated the azure-vm-init script to support the Government cloud domain suffix
  2. Fixes for running Leo in Azure. The user pet token is unavailable in Azure, so Leo->WSM calls that used it now use the Leo service account token instead, and the Leo service account was granted the appropriate permissions on the workspace.
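A minimal sketch of how an AZURE_ENVIRONMENT value might select cloud-specific endpoints, assuming the variable holds Azure CLI-style cloud names (AzureCloud, AzureUSGovernment); the values this PR actually accepts are not shown here. AzureEndpoints, fromEnv, and blobEndpoint are hypothetical names. The host names and suffixes are the public endpoints Azure uses for the commercial and US Government clouds.

```scala
// Hypothetical sketch: map an AZURE_ENVIRONMENT value to Azure endpoint hosts/suffixes.
// Assumes Azure CLI-style cloud names; the real values used by this PR may differ.
object AzureEndpoints {
  sealed trait AzureCloud {
    def managementHost: String // Azure Resource Manager host
    def loginHost: String      // Microsoft Entra ID (Azure AD) authority host
    def storageSuffix: String  // suffix appended to storage account endpoints
  }

  case object Commercial extends AzureCloud {
    val managementHost = "management.azure.com"
    val loginHost = "login.microsoftonline.com"
    val storageSuffix = "core.windows.net"
  }

  case object UsGovernment extends AzureCloud {
    val managementHost = "management.usgovcloudapi.net"
    val loginHost = "login.microsoftonline.us"
    val storageSuffix = "core.usgovcloudapi.net"
  }

  /** Unset means commercial (existing deployments unchanged); unknown values are rejected. */
  def fromEnv(value: Option[String]): Either[String, AzureCloud] =
    value.map(_.trim.toLowerCase) match {
      case None | Some("azurecloud") => Right(Commercial)
      case Some("azureusgovernment") => Right(UsGovernment)
      case Some(other)               => Left(s"Unsupported AZURE_ENVIRONMENT: $other")
    }

  /** Builds the blob endpoint for a storage account in the selected cloud. */
  def blobEndpoint(cloud: AzureCloud, account: String): String =
    s"https://$account.blob.${cloud.storageSuffix}"
}
```

Usage would look like AzureEndpoints.fromEnv(sys.env.get("AZURE_ENVIRONMENT")). Rejecting unrecognized values, rather than silently defaulting, avoids a Government deployment quietly calling commercial endpoints because of a typo.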

https://broadworkbench.atlassian.net/browse/TOAZ-372
See related pull requests: workbench-libs, sam, bpm, wsm, cromwell, terra-helmfile


codecov bot commented Dec 30, 2024

Codecov Report

Attention: Patch coverage is 78.43137% with 11 lines in your changes missing coverage. Please review.

Project coverage is 74.77%. Comparing base (109a9a7) to head (9f228c7).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...bench/leonardo/config/AzureHostingModeConfig.scala | 60.00% | 6 Missing ⚠️ |
| ...sde/workbench/leonardo/monitor/MonitorAtBoot.scala | 54.54% | 5 Missing ⚠️ |
Additional details and impacted files


@@             Coverage Diff             @@
##           develop    #4813      +/-   ##
===========================================
- Coverage    74.77%   74.77%   -0.01%     
===========================================
  Files          165      165              
  Lines        14954    14955       +1     
  Branches      1187     1234      +47     
===========================================
  Hits         11182    11182              
- Misses        3772     3773       +1     
| Files with missing lines | Coverage Δ |
|---|---|
| ...de/workbench/leonardo/app/CromwellAppInstall.scala | 71.42% <100.00%> (+2.46%) ⬆️ |
| ...kbench/leonardo/app/CromwellRunnerAppInstall.scala | 87.80% <100.00%> (+2.09%) ⬆️ |
| ...e/workbench/leonardo/app/HailBatchAppInstall.scala | 62.50% <ø> (ø) |
| ...te/dsde/workbench/leonardo/app/WdsAppInstall.scala | 87.50% <100.00%> (+5.14%) ⬆️ |
| ...e/workbench/leonardo/app/WorkflowsAppInstall.scala | 70.83% <100.00%> (+2.83%) ⬆️ |
| ...rkbench/leonardo/http/AppDependenciesBuilder.scala | 97.91% <100.00%> (+0.02%) ⬆️ |
| ...ch/leonardo/http/service/LeoAppServiceInterp.scala | 87.44% <100.00%> (+0.07%) ⬆️ |
| .../dsde/workbench/leonardo/util/AKSInterpreter.scala | 88.66% <100.00%> (-0.13%) ⬇️ |
| ...e/workbench/leonardo/util/AzurePubsubHandler.scala | 84.56% <100.00%> (-0.11%) ⬇️ |
| ...workbench/leonardo/util/BuildHelmChartValues.scala | 97.18% <ø> (ø) |

... and 2 more

Continue to review full report in Codecov by Sentry.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 109a9a7...9f228c7. Read the comment docs.

bennettn4 marked this pull request as ready for review December 30, 2024 17:40
bennettn4 requested a review from a team as a code owner December 30, 2024 17:40
LizBaldo (Collaborator) left a comment

Were you able to test this? The Azure integration tests were removed a few months ago, unfortunately 🤔

This looks good, but I have a few comments around preserving the pre-existing behavior for GCP as much as possible.

tokenOpt,
AppCreationException(s"Pet not found for user ${params.app.auditInfo.creator}", Some(ctx.traceId))
)
userToken <- F.pure(tokenOpt.getOrElse("")) // Empty token when running on Azure.

It would be good to differentiate between the Azure case, which does not require the token, and a true failure in the GCP case, where the token is not found. Without the error message, it will be harder for us to debug potential user issues.
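One way to address this, sketched under assumptions: keep the empty-token fallback only when Leo is hosted in Azure, and keep the original AppCreationException otherwise. HostingMode, resolveUserToken, and the simplified AppCreationException below are hypothetical stand-ins (the PR's AzureHostingModeConfig is not shown here), and Either replaces the F[_] effect so the example runs without dependencies.

```scala
// Hypothetical sketch: separate "no pet token by design" from "pet token missing".
object PetTokenResolution {
  sealed trait HostingMode
  object HostingMode {
    case object Gcp extends HostingMode   // Leo hosted on GCP: pet tokens are expected
    case object Azure extends HostingMode // Leo hosted on Azure: pet tokens are unavailable
  }

  // Simplified stand-in for Leo's AppCreationException.
  final case class AppCreationException(message: String, traceId: Option[String])
      extends Exception(message)

  def resolveUserToken(
      mode: HostingMode,
      tokenOpt: Option[String],
      creator: String,
      traceId: String
  ): Either[AppCreationException, String] =
    mode match {
      // Azure hosting: an empty token is intentional, not an error.
      case HostingMode.Azure => Right(tokenOpt.getOrElse(""))
      // GCP hosting: a missing pet token is a real failure, so keep the original message.
      case HostingMode.Gcp =>
        tokenOpt.toRight(AppCreationException(s"Pet not found for user $creator", Some(traceId)))
    }
}
```

In the interpreters themselves, the Left branch would be raised in F as the pre-change code did, so GCP users keep the same error and trace ID.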

@@ -99,6 +94,10 @@ class CromwellRunnerAppInstall[F[_]](config: CromwellRunnerAppConfig,
.map(v => raw"config.concurrentJobLimit=${v}")
}

// Get the pet userToken
tokenOpt <- samDao.getCachedArbitraryPetAccessToken(params.app.auditInfo.creator)
userToken <- F.pure(tokenOpt.getOrElse("")) // Empty token when running on Azure.

Same comment as above: it would be good to preserve the error message in case of a true failure on the GCP side.

raw"bard.enabled=${config.bardEnabled}"
raw"bard.enabled=${config.bardEnabled}",

// TEMPORARY HELM OVERRIDE VALUES WHILE WAITING FOR PR

Could you link the PR? If it is the terra-helmfile one, I think you should be good to merge.


// Get Vpa enabled tag
vpaEnabled <- F.pure(params.landingZoneResources.aksCluster.tags.getOrElse("aks-cost-vpa-enabled", false))

// Get the pet userToken
tokenOpt <- samDao.getCachedArbitraryPetAccessToken(params.app.auditInfo.creator)
userToken <- F.pure(tokenOpt.getOrElse("")) // Empty token when running on Azure.

Same comment regarding preserving the error message.

tokenOpt,
AppCreationException(s"Pet not found for user ${params.app.auditInfo.creator}", Some(ctx.traceId))
)
userToken <- F.pure(tokenOpt.getOrElse("")) // Empty token when running on Azure.

Same comment regarding preserving the error message.

@@ -163,12 +163,14 @@ final class LeoAppServiceInterp[F[_]: Parallel](config: AppServiceConfig,
// Retrieve parent workspaceId for the google project
parentWorkspaceId <- samService.lookupWorkspaceParentForGoogleProject(userInfo.accessToken.token, googleProject)

leoToken <- authProvider.getLeoAuthToken
leoEmail <- samService.getUserEmail(leoToken)

It would be good to leave a comment here explaining what leoEmail will be used for. I assume the answer is in the Sam client, but it would be good to distinguish the GCP case from the Azure case, even in a short comment.

workspaceDescOpt <- tokenOpt.flatTraverse { token =>
wsmClientProvider.getWorkspace(token, workspaceId)
}
leoAuth <- samDAO.getLeoAuthToken

Why the change here for both the GCP and Azure cases? I would like to keep using the cached token wherever possible.
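A sketch of one way to keep the cached token on GCP while using the Leo service account token only where pet tokens are unavailable. WsmTokenSelection, HostingMode, TokenSource, and selectToken are hypothetical names, not Leo's API. The two token lookups are passed by name, so the Leo token is only fetched when it is actually needed.

```scala
// Hypothetical sketch: choose the token for a Leo -> WSM call by hosting mode.
object WsmTokenSelection {
  sealed trait HostingMode
  object HostingMode {
    case object Gcp extends HostingMode
    case object Azure extends HostingMode
  }

  sealed trait TokenSource
  object TokenSource {
    final case class CachedUserToken(token: String) extends TokenSource
    final case class LeoServiceAccount(token: String) extends TokenSource
  }

  /** By-name parameters: only the lookup for the selected branch is evaluated. */
  def selectToken(
      mode: HostingMode,
      cachedToken: => Option[String],
      leoToken: => String
  ): Option[TokenSource] =
    mode match {
      // GCP: keep the pre-change behavior; None if no cached token is available.
      case HostingMode.Gcp => cachedToken.map(t => TokenSource.CachedUserToken(t))
      // Azure: pet tokens do not exist, so fall back to the Leo service account.
      case HostingMode.Azure => Some(TokenSource.LeoServiceAccount(leoToken))
    }
}
```

Keeping the branch inside one helper leaves call sites uniform, which may address the concern about scattering branching logic while still preferring the cached token on GCP.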

jsaun (Contributor) commented Jan 9, 2025

> Were you able to test this? The Azure integration tests have been removed a few months ago unfortunately 🤔

Yeah, we've been running this branch in our dev environment for a while. It will need some final validation, though, as I don't think we've tested the most recent commits.

> This looks good, but I have a few comments around preserving the pre-existing behavior for GCP as much as possible.

Yep, makes sense. I was trying to avoid branching logic where possible to keep things consistent and simpler, but I will update the cases you pointed out shortly.
