RFC79: Incremental Upload of Data Entries #48

Merged
merged 139 commits into from
Jul 16, 2024

@forus forus commented Jun 19, 2024

See also RFC79

The solution extends the current metaImport.py script and the Java data loader commands with additional flags that support incremental upload of entries. This lets users add patients, samples, and molecular data without re-uploading the entire study.
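
To illustrate the idea only, here is a minimal Python sketch of what "incremental" means in contrast to a full re-upload. It is not the code in this PR: `upsert_entries` and the dictionary layout are hypothetical, and the rule that an incoming entry with a known id replaces the stored one is an assumption made for the example.

```python
# Hypothetical in-memory stand-in for study data; not the cBioPortal schema.
def upsert_entries(existing, incoming):
    """Return a copy of `existing` with `incoming` merged in by entry id.

    Assumption: an incoming entry with a known id replaces the stored one,
    and entries that are not mentioned stay exactly as they were.
    """
    merged = dict(existing)
    merged.update(incoming)
    return merged


samples = {"S1": {"patient": "P1"}, "S2": {"patient": "P2"}}
update = {
    "S2": {"patient": "P2", "tumor_type": "primary"},  # existing sample, new attribute
    "S3": {"patient": "P3"},                           # brand-new sample
}
result = upsert_entries(samples, update)
```

Only the entries present in `update` are touched; `S1` is carried over unchanged, which is the property that makes a full study re-upload unnecessary.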

Read more in the docs: cBioPortal/cbioportal#10816

To review in logical parts and see previous discussions about the implementation, you might want to check the chain of closed PRs this PR consists of:

forus added 30 commits April 14, 2024 13:02
To make the dataset look like real data in the database
Apparently, the flag does not change anything.
But we add it anyway as the tests for "incremental" data upload.
Adding to the _all case list and to case lists specified with command arguments is supported
From case lists that are not the _all case list and are not specified with the --add-to-case-lists option
We changed them to work for the demo.
Mutation numbers did not change on demo.
Not it was easy to be confused where sample and clinical_sample (attributes),
patient and clinical_patient (attributes) related code
This flag for command to upload molecular profile data
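
The case-list commits above mention adding samples to the _all case list and to lists named with the --add-to-case-lists option. A rough Python sketch of that addition step follows. The function name, the dictionary layout, and the `<study_id>_all` naming of the _all list are assumptions for illustration; the commit titles do not fully spell out what happens to the remaining case lists, so the sketch leaves them untouched.

```python
def add_sample_to_case_lists(case_lists, study_id, sample_id, add_to_case_lists=()):
    """Add `sample_id` to the study's _all case list and to the named lists.

    `case_lists` maps a case list stable id to a list of sample ids.
    Hypothetical helper: other case lists are not modified here.
    """
    targets = [f"{study_id}_all", *add_to_case_lists]
    for stable_id in targets:
        members = case_lists.setdefault(stable_id, [])
        if sample_id not in members:  # keep the list free of duplicates
            members.append(sample_id)
    return case_lists


case_lists = {"study_all": ["A"], "study_mrna": ["A"], "study_cna": ["A"]}
add_sample_to_case_lists(case_lists, "study", "B", ["study_mrna"])
```

Calling the helper twice with the same sample is harmless, since membership is checked before appending.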
pieterlukasse previously approved these changes Jun 27, 2024
@sheridancbio

I will try to review this.

One quick comment - when it comes time to merge this PR I hope we use "squash and merge" rather than merging in the individual changesets.

@sheridancbio sheridancbio left a comment

Partial review (python components only)
Looking good so far.

Resolved review threads: pom.xml (1), scripts/importer/cbioportalImporter.py (3), scripts/importer/validateData.py (3)

@sheridancbio sheridancbio left a comment

partial review (covering the DAO classes only)

forus added 5 commits July 11, 2024 16:32
Make it explicit that the function will delete any matching records "if they exist"
…tribute

Specify that sampleIds is optional and can be set to null
@forus forus requested a review from sheridancbio July 11, 2024 15:41

@sheridancbio sheridancbio left a comment

partial review (Part 3) java classes

@sheridancbio sheridancbio left a comment

partial review (part 4) still working through scripts package - up to ImportGeneData

@forus forus mentioned this pull request Jul 12, 2024

@sheridancbio sheridancbio left a comment

This looks good. Legacy functionality is supported and new functionality is enabled through command line arguments. Changes are described in the approved RFC-79 document and code follows that closely.

The remainder of the Java code (after my part 4 partial review) was reviewed interactively by @forus and me yesterday. My review did not cover the test case data, but testing approaches (including Java methods "called" by the Python script) were discussed as well.

The MSK importer has not yet been tested with these changes, but I believe that any problems which are observed can be addressed in the MSK codebase itself (which mainly wraps/depends on the DAO classes here)

@forus forus merged commit e7cfb7b into main Jul 16, 2024
4 checks passed
@forus forus deleted the rfc79 branch July 16, 2024 20:02
3 participants