Model AD: Disease Correlation Data Transformation #179
…ons and updated some transform functions to use them
…single function check_required_datasets_and_columns()
…DT-Disease-Correlation-ETL
Replacing not == with !=. Co-authored-by: Brad Macdonald <52762200+BWMac@users.noreply.github.com>
…tps://github.com/Sage-Bionetworks/agora-data-tools into beatrizsaldana/MG-44/ADT-Disease-Correlation-ETL
…r functions as recommended by Brad
Just a few more comments but I'll pre-approve. Great work!
I'll wait to merge after @jaclynbeck-sage approves :)
tests/test_assets/disease_correlation/output/disease_correlation.json
Looks good! Sorry about the delay in reviewing :)
Problem
The disease correlation ETL needed a robust implementation that transforms disease correlation results, model information, and allele information into a standardized format. It has to handle edge cases, such as duplicate results and duplicate allele information, and keep the data consistent across multiple mouse models and their associated genetic information. Jira ticket: MG-44
Solution
Implemented a data transformation pipeline that:
Updated model_details.py and immunohisto_transform.py to use the validation function
New Utility Functions
Added several utility functions to support data transformation and validation:
check_required_datasets_and_columns
flatten_list
remove_duplicates_keep_order
create_lookup
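For readers without the diff open, here is a minimal sketch of what three of these helpers might look like. Only the function names come from this PR. The signatures, the recursive flattening, and the "first record wins" rule in create_lookup are assumptions made for illustration and may differ from the merged code.

```python
from typing import Any, Dict, Hashable, Iterable, List


def flatten_list(nested: Iterable[Any]) -> List[Any]:
    """Flatten nested lists/tuples into one flat list (assumed behavior)."""
    flat: List[Any] = []
    for item in nested:
        if isinstance(item, (list, tuple)):
            # Recurse so that any nesting depth is handled.
            flat.extend(flatten_list(item))
        else:
            flat.append(item)
    return flat


def remove_duplicates_keep_order(items: Iterable[Hashable]) -> List[Hashable]:
    """Drop repeated items, keeping the first occurrence of each."""
    # dict keys are unique and preserve insertion order (Python 3.7+).
    return list(dict.fromkeys(items))


def create_lookup(
    records: Iterable[Dict[str, Any]], key: str
) -> Dict[Any, Dict[str, Any]]:
    """Index records by record[key]; the first record per key wins (assumption)."""
    lookup: Dict[Any, Dict[str, Any]] = {}
    for record in records:
        lookup.setdefault(record[key], record)
    return lookup


# Illustrative inputs only; these values are not taken from the PR's test assets.
print(flatten_list([1, [2, [3, 4]], 5]))                  # [1, 2, 3, 4, 5]
print(remove_duplicates_keep_order(["b", "a", "b", "c"]))  # ['b', 'a', 'c']
```

Building remove_duplicates_keep_order on dict.fromkeys keeps the helper to one line but requires hashable items. A list of dicts would need a different approach, such as tracking a seen-set of a chosen key.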
Test
Disease Correlation Tests
Basic Valid Input
Duplicate Results Handling
Duplicate Allele Information
Error Test Cases
Dataset Validation
Column Validation
Utility Function Tests
check_required_datasets_and_columns Tests
flatten_list Tests
remove_duplicates_keep_order Tests
create_lookup Tests
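The Dataset Validation and Column Validation error cases listed above could be exercised roughly as follows. This is a sketch under stated assumptions, not the PR's actual code: the signature of check_required_datasets_and_columns, the choice of ValueError, the FakeTable stand-in, and the dataset and column names are all hypothetical. The only requirement placed on a dataset is that it exposes a `.columns` attribute, as a pandas DataFrame does.

```python
from typing import Any, Dict, List


class FakeTable:
    """Hypothetical stand-in for a table-like dataset exposing `.columns`."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns


def check_required_datasets_and_columns(
    datasets: Dict[str, Any], required: Dict[str, List[str]]
) -> None:
    """Raise ValueError if a required dataset or column is missing (assumed behavior)."""
    missing_datasets = [name for name in required if name not in datasets]
    if missing_datasets:
        raise ValueError(f"Missing required datasets: {missing_datasets}")
    for name, columns in required.items():
        missing_columns = [c for c in columns if c not in datasets[name].columns]
        if missing_columns:
            raise ValueError(f"Dataset '{name}' is missing columns: {missing_columns}")


def raises_value_error(datasets: Dict[str, Any], required: Dict[str, List[str]]) -> bool:
    """Small helper so the checks below run without a test framework."""
    try:
        check_required_datasets_and_columns(datasets, required)
    except ValueError:
        return True
    return False


# Hypothetical dataset and column names, for illustration only.
required = {"results": ["model", "gene"]}

# Dataset validation: the required dataset is absent.
assert raises_value_error({}, required)
# Column validation: the dataset exists but lacks the "gene" column.
assert raises_value_error({"results": FakeTable(["model"])}, required)
# Valid input passes without raising.
assert not raises_value_error({"results": FakeTable(["model", "gene"])}, required)
```

Checking for missing datasets before missing columns means the error message names the most fundamental problem first, which tends to make failed pipeline runs easier to diagnose.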