Commit
Extract sample info from PANCAN_clinicalMatrix (#20)
* Extract sample info from PANCAN_clinicalMatrix

  Keeps only samples with type equal to "Primary Tumor". This filters out multiple samples from the same patient, which could cause an issue for machine learning due to dependent observations (discussed in #10). This filter reduced the number of samples with expression and mutation from 7,705 to 7,306.

  Closes #10: all variables in PANCAN_clinicalMatrix that could help with sample selection or serve as covariates are extracted to `data/samples.tsv`. Relies on the documentation of PANCAN_clinicalMatrix variables provided by the Xena Browser team in #14.

  Closes #17: only sample_ids with expression, mutation, and clinical data are output to `data/`.

* Retain primary blood cancers

  Retain cancers whose type is "Primary Blood Derived Cancer - Peripheral Blood". See #20 (comment)
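The selection rule described in the commit message can be sketched as follows. This is a minimal illustration rather than the code in this commit: the column names `sample_id` and `sample_type`, the function `select_samples`, and the toy sample IDs are all assumptions made for the example. Only the two retained sample-type strings come from the message itself.

```python
# Hypothetical sketch: column names, function name, and sample IDs are
# illustrative assumptions, not taken from the repository.

# The two sample types the commit message says are retained.
RETAINED_SAMPLE_TYPES = {
    "Primary Tumor",
    "Primary Blood Derived Cancer - Peripheral Blood",
}


def select_samples(clinical_rows, expression_ids, mutation_ids):
    """Keep clinical rows of a retained sample type that also have
    both expression and mutation data."""
    shared_ids = set(expression_ids) & set(mutation_ids)
    return [
        row
        for row in clinical_rows
        if row["sample_type"] in RETAINED_SAMPLE_TYPES
        and row["sample_id"] in shared_ids
    ]


# Toy clinical table with made-up identifiers.
clinical_rows = [
    {"sample_id": "S1", "sample_type": "Primary Tumor"},
    {"sample_id": "S2", "sample_type": "Metastatic"},
    {"sample_id": "S3", "sample_type": "Primary Blood Derived Cancer - Peripheral Blood"},
    {"sample_id": "S4", "sample_type": "Primary Tumor"},
]

selected = select_samples(
    clinical_rows,
    expression_ids={"S1", "S2", "S3"},
    mutation_ids={"S1", "S3", "S4"},
)
print([row["sample_id"] for row in selected])
```

In this toy input, S2 is dropped because of its sample type and S4 because it lacks expression data, which mirrors the two filters the message describes (sample type, then the intersection of available data types).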
1 parent 11514bb, commit aa66efc
Showing 9 changed files with 7,730 additions and 1,043 deletions.