Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: remove seurat [do not merge yet] #7380

Closed
wants to merge 12 commits into from
Closed

feat: remove seurat [do not merge yet] #7380

wants to merge 12 commits into from

Conversation

joyceyan
Copy link
Contributor

@joyceyan joyceyan commented Nov 5, 2024

Reason for Change

https://app.zenhub.com/workspaces/single-cell-5e2a191dad828d52cc78b028/issues/gh/chanzuckerberg/single-cell-data-portal/7366

Changes

removes all code relevant to seurat. for dataset metadata updates, i'm just setting the dataset RDS version to be skipped always.

Testing steps

i updated the unit tests, and uploaded a dataset to the rdev to verify it went through the step function okay. it's available here: https://pr-7380-frontend.rdev.single-cell.czi.technology/collections/06b40cf0-f576-4e91-aeba-4f06925ae36d

Checklist 🛎️

  • Add product, design, and eng as reviewers for rdev review
  • For UI changes, add screenshots/videos, so the reviewers know what you expect them to see
  • For UI changes, add e2e tests to prevent regressions
  • For UI changes, verify impacted analytics events still work

Notes for Reviewer

🚨 we don't want to deploy this until we're ready to start the schema 5.3.0 migration

Copy link
Contributor

github-actions bot commented Nov 5, 2024

Deployment Summary

Copy link

codecov bot commented Nov 7, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.94%. Comparing base (46431eb) to head (be81eea).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7380      +/-   ##
==========================================
+ Coverage   92.90%   92.94%   +0.03%     
==========================================
  Files         194      192       -2     
  Lines       16831    16671     -160     
==========================================
- Hits        15637    15495     -142     
+ Misses       1194     1176      -18     
Flag Coverage Δ
unittests 92.94% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

f"Cannot find RDS artifact uri for {current_dataset_version_id}, and Conversion Status is not SKIPPED."
)
self.update_processing_status(new_dataset_version_id, DatasetStatusKey.RDS, DatasetConversionStatus.FAILED)
# Mark all RDS conversions as skipped
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seemed like the simplest backwards compatible solution

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this approach is consistent since we treat metadata updates as new versions, but we should double-check with Lattice/Brian that they're aware + cool with it

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@nayib-jose-gloria
Copy link
Contributor

i updated the unit tests, but i'm not sure what the best way is to test the terraform changes and verify it's correct.

easiest would be to test an upload in the rdev and make sure it skips seurat and all other actions are behaving normally (can publish, FE looks normal, can update a collection DOI, etc.). Migration is basically just an orchestrator that runs the 'migrate' script then kicks off the regular ingestion pipeline so if regular ingestion works we can expect migration to work as well.

i assume i should also be making a corresponding change to single-cell-infra based on this comment: https://github.com/chanzuckerberg/single-cell-data-portal/blob/main/.happy/terraform/modules/sfn/main.tf#L1 ?

I think this might be outdated? Link is dead and I don't recall this code being duplicated in infra

if we merged this change and deployed it, this would mean we would no longer generate seurat conversions for any new uploads in addition to any schema version migrations we run, right? so we could theoretically launch this change without it being directly tied to a schema release?

Correct. We'll want to remove the seurat-specific code in the validator in the next release, which will require an update here as well since currently the 'validate' command returns a tuple of (is_valid, errors, is_seurat_convertible) and we'll want to change that to just (is_valid, errors). But it won't block launch if we just ignore the third return value for now

@@ -34,7 +33,6 @@
from backend.layers.thirdparty.uri_provider import UriProvider

base = importr("base")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r-specific, can remove this!

@@ -9,7 +9,6 @@

import scanpy
import tiledb
from rpy2.robjects import StrVector
from rpy2.robjects.packages import importr
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

r-specific, can remove this!

Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/chanzuckerberg/single-cell-data-portal/blob/main/backend/layers/processing/schema_migration.py#L235

we have some error handling specific to seurat failures that we can remove for simplicity in the migration code

Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense to set RDS status as skipped for everything for now, since the FE changes are pending and this lets us decouple without blocking. But I think it'd be useful to write a follow-up maintenance issue to deprecate RDS status (while still maintaining the assets in s3/surfacing them in the API)

@joyceyan
Copy link
Contributor Author

joyceyan commented Nov 8, 2024

I think it makes sense to set RDS status as skipped for everything for now, since the FE changes are pending and this lets us decouple without blocking. But I think it'd be useful to write a follow-up maintenance issue to deprecate RDS status (while still maintaining the assets in s3/surfacing them in the API)

i'm actually inclined to say that we should just keep rds_status indefinitely. there isn't any meaningful benefit to removing it since there's really only one place we're reading it from, and removing it seems like it would just cause more headache than it's worth, especially if it's a required field on the db level.

@nayib-jose-gloria
Copy link
Contributor

i'm actually inclined to say that we should just keep rds_status indefinitely. there isn't any meaningful benefit to removing it since there's really only one place we're reading it from, and removing it seems like it would just cause more headache than it's worth, especially if it's a required field on the db level.

the DB stores a dataset status JSON, but doesn't codify what its composed of. So we could get rid of rds status without having to update the DB. Agreed it opens us up to more regression risk to remove it, but it seems like for long-term code readability it would be confusing to keep a status indicator for an artifact type we don't plan on processing anymore. I think its not p0 but worth at least tracking as something to clean-up at some point

@joyceyan joyceyan changed the title feat: remove seurat [wip] feat: remove seurat [do not merge yet] Nov 13, 2024
Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not exactly sure what the syntax error is, but there seems to be one still in the ingestion SFN. Successful jobs are being marked as failed in AWS due to one.

Copy link
Contributor

@nayib-jose-gloria nayib-jose-gloria left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: there's some additional seurat related downloads in the processing base image that we should remove as well. but I think that can safely be follow-on work and we can move ahead with this PR for the functional change. thank you!

@joyceyan
Copy link
Contributor Author

Copy link
Contributor

This PR has not seen any activity in the past 2 weeks; if nobody comments or reviews it in the next 3 days, this PR will be closed.

@github-actions github-actions bot added the stale label Nov 30, 2024
Copy link
Contributor

github-actions bot commented Dec 3, 2024

This PR was closed because it has been inactive for 17 days, 3 days since being marked as stale. Please re-open if you still need this to be addressed.

Copy link
Contributor

This PR has not seen any activity in the past 2 weeks; if nobody comments or reviews it in the next 3 days, this PR will be closed.

@github-actions github-actions bot added the stale label Dec 19, 2024
@github-actions github-actions bot removed the stale label Dec 20, 2024
Copy link
Contributor

github-actions bot commented Jan 4, 2025

This PR has not seen any activity in the past 2 weeks; if nobody comments or reviews it in the next 3 days, this PR will be closed.

@github-actions github-actions bot added the stale label Jan 4, 2025
Copy link
Contributor

github-actions bot commented Jan 7, 2025

This PR was closed because it has been inactive for 17 days, 3 days since being marked as stale. Please re-open if you still need this to be addressed.

Copy link
Contributor

This PR has not seen any activity in the past 2 weeks; if nobody comments or reviews it in the next 3 days, this PR will be closed.

@github-actions github-actions bot added the stale label Jan 26, 2025
@Bento007 Bento007 removed the stale label Jan 28, 2025
@nayib-jose-gloria nayib-jose-gloria changed the base branch from main to feature/schema-5-3 January 28, 2025 19:10
@nayib-jose-gloria nayib-jose-gloria changed the base branch from feature/schema-5-3 to main January 28, 2025 19:10
@joyceyan joyceyan closed this Jan 28, 2025
@joyceyan joyceyan reopened this Jan 28, 2025
@joyceyan
Copy link
Contributor Author

Closing this PR since the work for this has been folded into the 5.3 feature branch: main...feature/schema-5-3

@joyceyan joyceyan closed this Jan 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants