Covid19Data Portal missing Lineages handling #133

Open
johausmann opened this issue Sep 14, 2023 · 1 comment
Labels
bug Something isn't working

Comments

@johausmann
Member

I noticed another problem after the Covid19DataPortal update. We get the lineage information directly from the portal and store it in the database. After the Data Portal update I looked at the lineages plot and saw that it does not go beyond October of last year. I then checked the samples from the update and found that, of the 462,957 new assemblies, 348,256 have no lineage in the repository. Of the remaining assemblies, the newest have a collection_date < 2022-02-28, so the plot shows no 2023 lineages.
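For anyone who wants to reproduce this kind of check, here is a minimal sketch of how the counts could be computed. It assumes the samples are available as `(lineage, collection_date)` pairs; the function name `lineage_coverage` and that input shape are hypothetical and not taken from the Covigator code base.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

# Hypothetical input shape: one (lineage, collection_date) pair per assembly.
Sample = Tuple[Optional[str], Optional[date]]


def lineage_coverage(samples: Iterable[Sample]) -> Tuple[int, int, Optional[date]]:
    """Return (total, missing, newest), where `missing` counts samples without
    a lineage and `newest` is the latest collection date among samples that
    do have a lineage."""
    total = 0
    missing = 0
    newest: Optional[date] = None
    for lineage, collection_date in samples:
        total += 1
        if not lineage:  # None or empty string both count as missing
            missing += 1
        elif collection_date is not None and (newest is None or collection_date > newest):
            newest = collection_date
    return total, missing, newest
```

If `newest` is well before the newest collection date overall, the lineages plot will stop early even though more recent assemblies exist.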

@johausmann added the bug label Sep 14, 2023
@johausmann
Member Author

Okay, we need to address the following items.

  • Update Covid19DataPortal accessor to be aware of missing lineage.
  • Update Processor and Pipeline module to optionally call Pangolin in case of missing lineage.
    • Pipeline call without '--skip_pangolin'
    • Add a mode that checks whether mutations are present in the database and the VCF file exists; if so, run Pangolin only.
  • Update Covigator NGS pipeline to have a mode for Pangolin only.
  • Enable loading of Pangolin results from assemblies into the database (code is present, needs to be called depending on the call).
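
The decision logic implied by the items above could look roughly like the following. This is only a sketch under assumptions: the `Action` enum, the `choose_action` function and its parameters are hypothetical names for illustration, and the fallback to a full pipeline run (without `--skip_pangolin`) when no mutations or VCF are available is one possible reading of the list, not a confirmed design.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    NONE = "none"                    # lineage already provided by the portal
    PANGOLIN_ONLY = "pangolin_only"  # reuse existing results, only assign lineage
    FULL_PIPELINE = "full_pipeline"  # run the pipeline without '--skip_pangolin'


def choose_action(lineage: Optional[str], mutations_in_db: bool, vcf_exists: bool) -> Action:
    """Decide what to do for one assembly (hypothetical helper)."""
    if lineage:
        # The accessor found a lineage, nothing to do.
        return Action.NONE
    if mutations_in_db and vcf_exists:
        # Mutations are already loaded and the VCF is on disk: Pangolin only.
        return Action.PANGOLIN_ONLY
    # Assumption: otherwise process the sample fully, including Pangolin.
    return Action.FULL_PIPELINE
```

Keeping the decision in one small function would let the accessor, processor and pipeline share the same rule for handling a missing lineage.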
