Appropriate subsetting of clinical data regarding a TCGA MAF object #664

Jasonmbg · 2021-02-11T16:45:00Z

Dear Anand,

My question mainly concerns the robust handling of clinical data inside a MAF object downloaded with the TCGAmutations R package, and how to subset it appropriately on specific phenotype attributes. In detail, for a current project I am trying to analyze both mutational and gene expression data on the same subset of patients from a TCGA cohort:

coad.maf <- TCGAmutations::tcga_load(study = "COAD")
 head(getClinicalData(coad.maf)[1:6,76:77])
           Tumor_Sample_Barcode sample_type_description
1: TCGA-3L-AA1B-01A-11D-A36X-10     Primary Solid Tumor
2: TCGA-4N-A93T-01A-11D-A36X-10     Primary Solid Tumor
3: TCGA-4T-AA8H-01A-11D-A40P-10     Primary Solid Tumor
4: TCGA-5M-AAT4-01A-11D-A40P-10     Primary Solid Tumor
5: TCGA-5M-AAT6-01A-11D-A40P-10     Primary Solid Tumor
6: TCGA-5M-AATE-01A-11D-A40P-10     Primary Solid Tumor

table(coad.maf@clinical.data$sample_type_description)

           Metastatic   Primary Solid Tumor Recurrent Solid Tumor 
                    1                   404                     1

My main goals are: 1) to keep only the primary solid tumor samples, and 2) to keep only the first 12 characters of the Tumor_Sample_Barcode, so that I can intersect them with the gene expression patient IDs and identify the common samples.
Then I would subset the MAF object to the common IDs only and perform any downstream analysis.
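
For illustration, the filtering and barcode truncation described above could be sketched in base R as follows. This is only a minimal sketch on a toy table: the barcodes, the `patient_id` column, and the `expr.ids` vector are made up for the example and do not come from the COAD cohort. The filter is applied before the truncation so that a non-primary sample cannot be mistaken for the primary sample of the same patient once only the 12-character patient ID remains.

```r
# Toy clinical table; barcodes and sample types are invented for illustration.
clin.dat <- data.frame(
  Tumor_Sample_Barcode = c(
    "TCGA-AA-0001-01A-11D-A36X-10",
    "TCGA-AA-0002-01A-11D-A36X-10",
    "TCGA-AA-0003-06A-11D-A36X-10",
    "TCGA-AA-0004-01A-11D-A40P-10"
  ),
  sample_type_description = c(
    "Primary Solid Tumor", "Primary Solid Tumor",
    "Metastatic", "Primary Solid Tumor"
  ),
  stringsAsFactors = FALSE
)

# 1) Keep only the primary solid tumor samples.
is.primary <- clin.dat$sample_type_description == "Primary Solid Tumor"
clin.primary <- clin.dat[is.primary, ]

# 2) Derive the 12-character patient ID from the full sample barcode.
clin.primary$patient_id <- substr(clin.primary$Tumor_Sample_Barcode, 1, 12)

# Hypothetical patient IDs taken from the gene expression data.
expr.ids <- c("TCGA-AA-0002", "TCGA-AA-0003", "TCGA-AA-0004", "TCGA-AA-0009")

# Patients present in both data types.
common.ids <- intersect(clin.primary$patient_id, expr.ids)
print(common.ids)
```

With a real cohort it may also be worth checking `anyDuplicated(clin.primary$patient_id)`, since several samples of one patient collapse to the same 12-character ID.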

Thus, my crucial questions are the following:

A) For updating the clinical data, I am considering the following steps:

clin.dat <- getClinicalData(coad.maf)
Truncate the Tumor_Sample_Barcode column of the above data frame to its first 12 characters, and remove the 2 samples that are not primary, to obtain a data frame called clinical.updated.dat.
Then, which of the following would be the more robust way to proceed?

coad.maf@clinical.data <- clinical.updated.dat ?
or maf.updated <- maftools::read.maf(maf = coad.maf, clinicalData = clinical.updated.dat, isTCGA = TRUE) ? The latter in order to have a more uniform update across all the slots?

B) Afterwards, to subset the MAF object to only the selected common patient IDs before any downstream analysis, would the following be correct?

maf.subset <- subsetMaf(maf = maf.updated, tsb = dt, mafObj = TRUE) #where dt is a character vector of the selected common patient ids ?

Thank you in advance,

Efstathios

Answered by PoisonAlien

PoisonAlien (Maintainer) · 2021-02-12T09:35:57Z

Hello Efstathios,

A. I would not recommend the first way, replacing the existing clinical data with the altered ones, since the sample IDs won't match. The second way is probably the best, but remember that coad.maf is already an MAF object, so you cannot pass it to read.maf. You could use write.mafSummary() and import it again with read.maf.

B. Yes, it looks ok.

2 replies

Jasonmbg Feb 12, 2021
Author

Dear Anand,

Thank you very much for your reply and suggestions. One follow-up comment regarding part A, to make sure I fully understand your point:

Would you agree that if I use write.mafSummary() and then re-read the text file with read.maf(), it might cause an error if the updated clinical data contain fewer samples? Would it perhaps be better to change only the Tumor_Sample_Barcode to its first 12 characters, and then, as a second step, subset the MAF object to the common IDs with the subsetMaf() function?

Best,

Efstathios

PoisonAlien Feb 12, 2021
Maintainer

You can use the same command to re-read the MAF with the updated clinical data; it will not complain about the missing sample names.

maf.updated <- maftools::read.maf(maf = "path_to.maf", clinicalData = clinical.updated.dat, isTCGA = TRUE)
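
Putting the steps of this thread together, the round trip could look roughly like the sketch below. It is an illustration under assumptions rather than a tested recipe: the helper name `rebuild_and_subset` and its arguments are hypothetical, and the `_maftools.maf` suffix is what write.mafSummary() is assumed to append to `basename`, so check the files it actually writes in your maftools version. The clinical table is expected to already carry the 12-character IDs in its Tumor_Sample_Barcode column.

```r
# Hypothetical helper: write the MAF object to disk, re-import it with the
# updated clinical table, then keep only the common patient IDs.
rebuild_and_subset <- function(maf.obj, clinical.updated.dat, common.ids,
                               basename = file.path(tempdir(), "coad")) {
  # Export the MAF object; this writes several text files next to `basename`.
  maftools::write.mafSummary(maf = maf.obj, basename = basename)

  # Assumption: the variant table is written as "<basename>_maftools.maf".
  maf.path <- paste0(basename, "_maftools.maf")

  # Re-import with the updated clinical data. isTCGA = TRUE is documented to
  # use only the first 12 characters of Tumor_Sample_Barcode.
  maf.updated <- maftools::read.maf(
    maf = maf.path,
    clinicalData = clinical.updated.dat,
    isTCGA = TRUE
  )

  # Restrict the object to the samples shared with the expression data.
  maftools::subsetMaf(maf = maf.updated, tsb = common.ids, mafObj = TRUE)
}

# Example call (requires maftools and TCGAmutations to be installed):
# coad.maf <- TCGAmutations::tcga_load(study = "COAD")
# maf.subset <- rebuild_and_subset(coad.maf, clinical.updated.dat, common.ids)
```

Samples that are missing from the updated clinical table are not necessarily dropped by read.maf, which is why the final subsetMaf() call on the common IDs is kept as a separate step.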
