Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

PFB Handover fails in LungMap catalog #5469

Closed
14 tasks
nadove-ucsc opened this issue Aug 15, 2023 · 10 comments
Closed
14 tasks

PFB Handover fails in LungMap catalog #5469

nadove-ucsc opened this issue Aug 15, 2023 · 10 comments
Assignees
Labels
bug [type] A defect preventing use of the system as specified demo [process] To be demonstrated at the end of the sprint demoed [process] Successfully demonstrated to team manifests [subject] Generation and contents of manifests orange [process] Done by the Azul team spike:5 [process] Spike estimate of five points

Comments

@nadove-ucsc
Copy link
Contributor

nadove-ucsc commented Aug 15, 2023

image

  • Security design review completed; the Resolution of this issue does not
    • … affect authentication; for example:
      • OAuth 2.0 with the application (API or Swagger UI)
      • Authentication of developers with Google Cloud APIs
      • Authentication of developers with AWS APIs
      • Authentication with a GitLab instance in the system
      • Password and 2FA authentication with GitHub
      • API access token authentication with GitHub
      • Authentication with
    • … affect the permissions of internal users like access to
      • Cloud resources on AWS and GCP
      • GitLab repositories, projects and groups, administration
      • an EC2 instance via SSH
      • GitHub issues, pull requests, commits, commit statuses, wikis, repositories, organizations
    • … affect the permissions of external users like access to
      • TDR snapshots
    • … affect permissions of service or bot accounts
      • Cloud resources on AWS and GCP
    • … affect audit logging in the system, like
      • adding, removing or changing a log message that represents an auditable event
      • changing the routing of log messages through the system
    • … affect monitoring of the system
    • … introduce a new software dependency like
      • Python packages on PYPI
      • Command-line utilities
      • Docker images
      • Terraform providers
    • … add an interface that exposes sensitive or confidential data at the security boundary
    • … affect the encryption of data at rest
    • … require persistence of sensitive or confidential data that might require encryption at rest
    • … require unencrypted transmission of data within the security boundary
    • … affect the network security layer; for example by
      • modifying, adding or removing firewall rules
      • modifying, adding or removing security groups
      • changing or adding a port a service, proxy or load balancer listens on
  • Documentation on any unchecked boxes is provided in comments below
@nadove-ucsc nadove-ucsc added the orange [process] Done by the Azul team label Aug 15, 2023
@achave11-ucsc
Copy link
Member

Assignee to acquire a reproduction.

@achave11-ucsc achave11-ucsc added the needs info [process] Resolution requires more information label Aug 16, 2023
@achave11-ucsc achave11-ucsc added spike:3 [process] Spike estimate of three points and removed needs info [process] Resolution requires more information labels Aug 17, 2023
@nadove-ucsc
Copy link
Contributor Author

The manifest from the above link contains a donor aggregate where the age_range field contains a mix of value/unit object and empty strings.

$ pfb show --input hca-manifest-7bc50d88-3765-5909-b31c-824554e66938.1a5242f3-aea4-56c8-83ec-c912a74fe757.avro | jq 'select(.name=="donors")' | jq -s '.[2].object | {document_id, organism_age}'
{
  "document_id": [
    "07a620ff-1ac8-45bb-85c1-2b7af107d68d",
    "17c9743a-bad9-417d-a56a-93a7b3e9ea71",
    "3886b8ef-fe67-480c-8496-7799ceabe6de",
    "42194a89-2b58-4bb0-9d70-e403db53bbf9",
    "90d58445-99d0-46e8-a882-5beac347a7a2",
    "913f8cad-2080-4ec7-a51f-aba0d4f09b66",
    "b122c921-e432-42c7-a5c6-f245f106970b"
  ],
  "organism_age": [
    {
      "value": "15",
      "unit": "day"
    },
    {
      "value": "3",
      "unit": "day"
    },
    {
      "value": "42",
      "unit": "day"
    },
    {
      "value": "7",
      "unit": "day"
    },
    ""
  ]
}

Based on the error message, it is the type mismatch between the string and the objects that is being rejected by Terra.

@nadove-ucsc
Copy link
Contributor Author

This file is linked to the same set of donors shown above. The organism_age field matches that observed in the manifest except that the last element is null instead of "".

$ http 'https://service.azul.data.humancellatlas.org/index/files/0aec111a-fbb0-4063-bba2-b73c0666f17d?catalog=lm3' | jq '.donorOrganisms[] | {id, organismAge}'
{
  "id": [
    "07a620ff-1ac8-45bb-85c1-2b7af107d68d",
    "17c9743a-bad9-417d-a56a-93a7b3e9ea71",
    "3886b8ef-fe67-480c-8496-7799ceabe6de",
    "42194a89-2b58-4bb0-9d70-e403db53bbf9",
    "90d58445-99d0-46e8-a882-5beac347a7a2",
    "913f8cad-2080-4ec7-a51f-aba0d4f09b66",
    "b122c921-e432-42c7-a5c6-f245f106970b"
  ],
  "organismAge": [
    {
      "value": "15",
      "unit": "day"
    },
    {
      "value": "3",
      "unit": "day"
    },
    {
      "value": "42",
      "unit": "day"
    },
    {
      "value": "7",
      "unit": "day"
    },
    null
  ]
}

On prod, we replace nulls with the empty string when creating the Avro PFB manifest source. This is no longer the case on dev (#5444). I hypothesize that if this change were promoted to prod, and the null value was restored, the error would be resolved, because it seems logical that Terra should accept null where it expects an object. I am struggling to think of a way to test this hypothesis.

@nadove-ucsc
Copy link
Contributor Author

As luck would have it, the exact same file is also present on dev in the lm2 catalog. I created a single-file manifest on Azul dev and Azul prod containing this file, and confirmed that the only difference between their contents was that the manifest created on prod has the nulls replaced with empty strings.

I imported the manifest created on prod into a new workspace and reproduced the original error.

image

I then imported the manifest created on dev into a new workspace. The import was successful.

image

I believe that this proves my hypothesis correct, meaning that this is a freebie or duplicate of #2462.

@nadove-ucsc nadove-ucsc added spike:5 [process] Spike estimate of five points and removed spike:3 [process] Spike estimate of three points labels Aug 18, 2023
@achave11-ucsc
Copy link
Member

Assignee to estimate prevalence of the problem per @bvizzier-ucsc.

@achave11-ucsc achave11-ucsc added bug [type] A defect preventing use of the system as specified manifests [subject] Generation and contents of manifests labels Aug 18, 2023
@achave11-ucsc
Copy link
Member

Fixed in #5444.

@bvizzier-ucsc
Copy link

The goal is to be able to loosely quantify the problem so we can notify LungMAP users and ask anyone who encounters it to let us know so we are aware of the impact. Knowing the breadth of impact will inform the relative priority of this issue.

@achave11-ucsc achave11-ucsc added the demo [process] To be demonstrated at the end of the sprint label Aug 18, 2023
@nadove-ucsc
Copy link
Contributor Author

nadove-ucsc commented Aug 18, 2023

For demo, attempt to reproduce on prod.

@nadove-ucsc
Copy link
Contributor Author

nadove-ucsc commented Aug 18, 2023

This problem affects:

  • 9/600 files in the entirety of catalog lm3 (1.5%)
  • 4/999 and 7/999 files in two random samples of catalog dcp29 (0.55%)

@nadove-ucsc nadove-ucsc added the demoed [process] Successfully demonstrated to team label Oct 3, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug [type] A defect preventing use of the system as specified demo [process] To be demonstrated at the end of the sprint demoed [process] Successfully demonstrated to team manifests [subject] Generation and contents of manifests orange [process] Done by the Azul team spike:5 [process] Spike estimate of five points
Projects
None yet
Development

No branches or pull requests

3 participants