-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: remove seurat [do not merge yet] #7380
Conversation
Deployment Summary
|
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #7380 +/- ##
==========================================
+ Coverage 92.90% 92.94% +0.03%
==========================================
Files 194 192 -2
Lines 16831 16671 -160
==========================================
- Hits 15637 15495 -142
+ Misses 1194 1176 -18
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
2058266
to
a017251
Compare
f"Cannot find RDS artifact uri for {current_dataset_version_id}, and Conversion Status is not SKIPPED." | ||
) | ||
self.update_processing_status(new_dataset_version_id, DatasetStatusKey.RDS, DatasetConversionStatus.FAILED) | ||
# Mark all RDS conversions as skipped |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seemed like the simplest backwards compatible solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this approach is consistent since we treat metadata updates as new versions, but we should double-check with Lattice/Brian that they're aware + cool with it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
easiest would be to test an upload in the rdev and make sure it skips seurat and all other actions are behaving normally (can publish, FE looks normal, can update a collection DOI, etc.). Migration is basically just an orchestrator that runs the 'migrate' script then kicks off the regular ingestion pipeline so if regular ingestion works we can expect migration to work as well.
I think this might be outdated? Link is dead and I don't recall this code being duplicated in infra
Correct. We'll want to remove the seurat-specific code in the validator in the next release, which will require an update here as well since currently the 'validate' command returns a tuple of (is_valid, errors, is_seurat_convertible) and we'll want to change that to just (is_valid, errors). But it won't block launch if we just ignore the third return value for now |
@@ -34,7 +33,6 @@ | |||
from backend.layers.thirdparty.uri_provider import UriProvider | |||
|
|||
base = importr("base") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r-specific, can remove this!
@@ -9,7 +9,6 @@ | |||
|
|||
import scanpy | |||
import tiledb | |||
from rpy2.robjects import StrVector | |||
from rpy2.robjects.packages import importr |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
r-specific, can remove this!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we have some error handling specific to seurat failures that we can remove for simplicity in the migration code
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it makes sense to set RDS status as skipped for everything for now, since the FE changes are pending and this lets us decouple without blocking. But I think it'd be useful to write a follow-up maintenance issue to deprecate RDS status (while still maintaining the assets in s3/surfacing them in the API)
i'm actually inclined to say that we should just keep rds_status indefinitely. there isn't any meaningful benefit to removing it since there's really only one place we're reading it from, and removing it seems like it would just cause more headache than it's worth, especially if it's a required field on the db level. |
the DB stores a dataset status JSON, but doesn't codify what its composed of. So we could get rid of rds status without having to update the DB. Agreed it opens us up to more regression risk to remove it, but it seems like for long-term code readability it would be confusing to keep a status indicator for an artifact type we don't plan on processing anymore. I think its not p0 but worth at least tracking as something to clean-up at some point |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not exactly sure what the syntax error is, but there seems to be one still in the ingestion SFN. Successful jobs are being marked as failed in AWS due to one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: there's some additional seurat related downloads in the processing base image that we should remove as well. but I think that can safely be follow-on work and we can move ahead with this PR for the functional change. thank you!
@nayib-jose-gloria Good catch, I created a ticket for that here: https://app.zenhub.com/workspaces/single-cell-5e2a191dad828d52cc78b028/issues/gh/chanzuckerberg/single-cell-data-portal/7390 |
This PR has not seen any activity in the past 2 weeks; if nobody comments or reviews it in the next 3 days, this PR will be closed. |
This PR was closed because it has been inactive for 17 days, 3 days since being marked as stale. Please re-open if you still need this to be addressed. |
This PR has not seen any activity in the past 2 weeks; if nobody comments or reviews it in the next 3 days, this PR will be closed. |
This PR has not seen any activity in the past 2 weeks; if nobody comments or reviews it in the next 3 days, this PR will be closed. |
This PR was closed because it has been inactive for 17 days, 3 days since being marked as stale. Please re-open if you still need this to be addressed. |
This PR has not seen any activity in the past 2 weeks; if nobody comments or reviews it in the next 3 days, this PR will be closed. |
Closing this PR since the work for this has been folded into the 5.3 feature branch: main...feature/schema-5-3 |
Reason for Change
https://app.zenhub.com/workspaces/single-cell-5e2a191dad828d52cc78b028/issues/gh/chanzuckerberg/single-cell-data-portal/7366
Changes
removes all code relevant to seurat. for dataset metadata updates, i'm just setting the dataset RDS version to be skipped always.
Testing steps
i updated the unit tests, and uploaded a dataset to the rdev to verify it went through the step function okay. it's available here: https://pr-7380-frontend.rdev.single-cell.czi.technology/collections/06b40cf0-f576-4e91-aeba-4f06925ae36d
Checklist 🛎️
Notes for Reviewer
🚨 we don't want to deploy this until we're ready to start the schema 5.3.0 migration