Appropriate subsetting of clinical data regarding a TCGA MAF object #664
Dear Anand, my question mainly concerns the robust handling of clinical data inside a MAF object downloaded with the TCGAmutations R package, and how to subset it appropriately on specific phenotype attributes. In a current project, I am trying to analyze both mutational and gene expression data for the same subset of patients from a TCGA cohort:
My main goals are: 1) to keep only the primary solid tumor samples, and 2) to keep only the first 12 characters of Tumor_Sample_Barcode, so that I can intersect them with the patient IDs from the gene expression data and identify the common samples. My key questions are therefore the following: A) For updating the clinical data, which of the following steps would be appropriate:
B) Afterwards, to subset the MAF object to only the selected common patient IDs before any downstream analysis:
Thank you in advance, Efstathios
Hello Efstathios,
A. I would not recommend the first approach of replacing the existing clinical data with the altered version, since the sample IDs won't match. The second approach is probably the best, but remember that `coad.maf` is already a MAF object, so you cannot pass it to `read.maf`. You could write it out with `write.mafSummary()` and import it again with `read.maf`.
B. Yes, it looks OK.
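
For reference, here is a minimal base R sketch of the barcode handling discussed in this thread: keeping primary solid tumor samples and matching on the first 12 characters. The barcodes and the `expr_patients` vector are made up for illustration and are not taken from the original post; the sketch also assumes standard TCGA barcodes, where characters 14-15 hold the sample type code (`01` for primary solid tumor).

```r
# Hypothetical barcodes for illustration only; in practice they would come
# from the MAF object (for example via maftools::getSampleSummary()).
maf_barcodes <- c(
  "TCGA-AA-3510-01A-01D-1408-10",  # sample type 01: primary solid tumor
  "TCGA-AA-3511-06A-01D-1408-10",  # sample type 06: metastatic
  "TCGA-AZ-4315-01A-01D-1408-10"   # sample type 01: primary solid tumor
)

# Hypothetical patient IDs from the gene expression data set.
expr_patients <- c("TCGA-AA-3510", "TCGA-A6-2671")

# Characters 14-15 of a TCGA barcode encode the sample type.
sample_type <- substr(maf_barcodes, 14, 15)
primary_barcodes <- maf_barcodes[sample_type == "01"]

# Characters 1-12 identify the patient.
patient_ids <- substr(primary_barcodes, 1, 12)

# Full barcodes of primary tumors whose patient also has expression data.
common_barcodes <- primary_barcodes[patient_ids %in% expr_patients]
common_patients <- substr(common_barcodes, 1, 12)
```

Assuming the MAF object still carries the full-length barcodes, `common_barcodes` could then be passed to `maftools::subsetMaf()` through its `tsb` argument, which leaves the sample IDs inside the object unchanged and so avoids the mismatch mentioned in A.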