
Data Loading: What You Need To Change

Manuel Holtgrewe edited this page Apr 22, 2016 · 9 revisions

This page contains an overview to help you transition to the new file formats. As this should be a one-time operation for all users, this page is only temporary.

  1. (recommended) Update your genes table; see issues #799 and #805 for how to do this.

    • Reason: in the past, cBioPortal accidentally imported the wrong column as the HUGO symbol. Without the update, this causes many warnings about invalid genes during the validation process.
  2. Be aware: column header names are now strictly validated for all data files that have Entrez ID and Hugo gene symbol columns. The column names must be Entrez_Gene_Id and Hugo_Symbol. This is a change if you relied on the position of a column rather than its name. These columns should still be placed before any sample columns (i.e. only the columns after the Entrez_Gene_Id and Hugo_Symbol columns are treated as sample columns). The new validator warns you if your file lacks the Entrez_Gene_Id column, which is the recommended column for gene identifiers.

  3. Be aware: the datatype in the meta files is now strictly validated. The allowed values are documented in the updated File formats page (and in the table below).

  4. Other changes: check the following table for your data types:

DataType What you have to do
Cancer Study (optionally) Add add_global_case_list
Cancer Type Create the meta file
Discrete Copy Number Data Update meta file:
  • change stable_id to gistic, cna, cna_rae or cna_consensus
  • add data_filename
Remark: copy number profiles used by the cross-cancer histogram are no longer identified as GISTIC or RAE by their name; this is now determined by the stable_id.
Copy Number Data Update meta file:
  • if datatype is LOG-VALUE, change it to LOG2-VALUE
  • if datatype is CONTINUOUS, change stable_id to linear_CNA
  • add data_filename
Segmented Data Update meta file:
  • change genetic_alteration_type to COPY_NUMBER_ALTERATION
  • change datatype to SEG
  • remove: stable_id, show_profile_in_analysis_tab, profile_name, profile_description
  • add: description, data_filename
Expression Data Update meta file:
  • check your stable_id against the table
  • add data_filename
Mutation Data Update meta file:
  • change stable_id to mutations
  • add data_filename
Fusion Data (TODO) Update meta file:
  • add data_filename
Methylation Data Update meta file:
  • change stable_id to methylation_hm27 or methylation_hm450
  • add data_filename
RPPA Data Update meta file:
  • change genetic_alteration_type to PROTEIN_LEVEL
  • change datatype to LOG2-VALUE or Z-SCORE
  • change stable_id to rppa or rppa_Zscores
  • add data_filename
Clinical Data
  • Create two separate meta files, one for samples and one for patients
  • Create two separate data files, one for samples and one for patients
    • remove the row describing whether an attribute is a SAMPLE or a PATIENT attribute
For full instructions, check the File formats page.
Case Lists (no changes needed)
Timeline Data Update meta file(s):
  • remove: stable_id, show_profile_in_analysis_tab, profile_name, profile_description
  • add: data_filename
Gistic Data Create the meta file
MutSig Data Create the meta file
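
As a rough illustration of two of the rules above, here is a minimal Python sketch. It is not the cBioPortal validator or importer. The function names `check_header` and `migrate_seg_meta`, the warning texts, and the example values are all hypothetical. The sketch assumes data files are tab-separated and that a meta file has already been read into a dictionary of field names and values. The first function applies the header rule from item 2 (gene columns are found by name, and only columns after them count as sample columns). The second applies the Segmented Data row of the table.

```python
# Illustrative sketch only; names and warning texts are hypothetical.

GENE_COLUMNS = ("Hugo_Symbol", "Entrez_Gene_Id")

# Fields the table lists for removal from a segmented-data meta file.
SEG_FIELDS_TO_REMOVE = (
    "stable_id",
    "show_profile_in_analysis_tab",
    "profile_name",
    "profile_description",
)


def check_header(header_line):
    """Return (sample_columns, warnings) for a tab-separated header line."""
    columns = header_line.rstrip("\r\n").split("\t")
    warnings = []
    if "Entrez_Gene_Id" not in columns:
        warnings.append("Entrez_Gene_Id column is missing")
    # Gene columns are located by name, not by position.
    gene_positions = [i for i, name in enumerate(columns) if name in GENE_COLUMNS]
    if not gene_positions:
        # Assumption of this sketch: without any gene column,
        # no sample columns are reported.
        warnings.append("no Hugo_Symbol or Entrez_Gene_Id column found")
        return [], warnings
    # Only the columns after the last gene column count as sample columns.
    return columns[gene_positions[-1] + 1:], warnings


def migrate_seg_meta(old_meta, data_filename, description):
    """Return a new meta dictionary following the Segmented Data row."""
    new_meta = {
        key: value
        for key, value in old_meta.items()
        if key not in SEG_FIELDS_TO_REMOVE
    }
    new_meta["genetic_alteration_type"] = "COPY_NUMBER_ALTERATION"
    new_meta["datatype"] = "SEG"
    new_meta["description"] = description
    new_meta["data_filename"] = data_filename
    return new_meta


if __name__ == "__main__":
    samples, warnings = check_header("Hugo_Symbol\tEntrez_Gene_Id\tSAMPLE_A\tSAMPLE_B\n")
    print(samples, warnings)
```

The same pattern (drop the listed fields, then set or add the new ones) could be adapted to the other rows of the table. The allowed values for each data type should be taken from the File formats page, not from this sketch.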