This is the mapping file format:

STUDY_ID | SITE_ID | SUBJECT_ID | SAMPLE_CD | PLATFORM | SAMPLE_TYPE | TISSUE_TYPE | TIME_POINT | CATEGORY_CD | SOURCE_CD |
---|---|---|---|---|---|---|---|---|---|
GSE8581 | | GSE8581GSM210005 | GSM210005 | GPL570_BOGUS | Tumor | Lung | Week1 | Biomarker_Data+PLATFORM+TISSUETYPE | STD |

The first row (the header) is always skipped. It must be present; otherwise the first assay will be ignored.
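
The header rule can be illustrated with a small reader. This is only a sketch, not the loader's actual code: the name `read_mapping_rows` is hypothetical, and the tab delimiter is an assumption, since the format description here does not state the separator.

```python
import csv
import io

# Column order as listed in the mapping file format.
COLUMNS = [
    "STUDY_ID", "SITE_ID", "SUBJECT_ID", "SAMPLE_CD", "PLATFORM",
    "SAMPLE_TYPE", "TISSUE_TYPE", "TIME_POINT", "CATEGORY_CD", "SOURCE_CD",
]


def read_mapping_rows(text, delimiter="\t"):
    """Return one dict per assay row, always discarding the first row.

    The default delimiter is an assumption made for this sketch.
    """
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    # The first row is dropped unconditionally, whatever it contains.
    # This is why a file without a header row loses its first assay.
    return [dict(zip(COLUMNS, row)) for row in rows[1:]]
```

Given a file with a header and one assay row, the function returns one dict. Given the same file without the header, it returns an empty list, because the assay row is consumed in place of the header.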

- `STUDY_ID` is required and must match the `STUDY_ID` parameter.
- `SITE_ID` is ignored.
- `SUBJECT_ID` is the subject id. It must match the one provided in the clinical data set.
- `SAMPLE_CD` is the name of the assay (here synonymous with "sample"). Required.
- `PLATFORM` is the GPL id of the corresponding platform. It must be given, the platform must already have been loaded, and the value must be the same for all rows. The value will be uppercased. It will be used to replace the `PLATFORM` placeholder in `CATEGORY_CD`.
- `SAMPLE_TYPE` will be used to fill `sample_type` in `de_subject_sample_mapping`. It will also be used to replace the `SAMPLETYPE` placeholder in `CATEGORY_CD`. Optional.
- `TISSUE_TYPE` will be used to fill `tissue_type`. It will also be used to replace the `TISSUETYPE` and `ATTR1` (legacy) placeholders in `CATEGORY_CD`.
- `TIME_POINT` will be used to fill `timepoint`. It will also be used to replace the `TIMEPOINT` and `ATTR2` (legacy) placeholders in `CATEGORY_CD`. Optional.
- `CATEGORY_CD` will be used to form the concept path for the node to be created. Components of the path are separated with `+`. It can include several placeholders (see the descriptions of the other columns). In principle it can differ between assays, but that code path has never been tested.
- `SOURCE_CD` is ignored, but it must be present as the last column.
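
The column rules and the placeholder substitution can be sketched as follows. This is an illustration under stated assumptions, not the loader's implementation: the function names `expand_category_cd` and `check_rows` are hypothetical, placeholders are assumed to occupy a whole `+`-separated component, and the platform comparison is assumed to happen after uppercasing. How the real loader treats a placeholder whose column is empty is not covered here.

```python
def expand_category_cd(row):
    """Substitute placeholders in CATEGORY_CD and return the path components."""
    # Placeholder name -> value taken from the corresponding column.
    placeholders = {
        "PLATFORM": row.get("PLATFORM", "").upper(),  # platform is uppercased
        "SAMPLETYPE": row.get("SAMPLE_TYPE", ""),
        "TISSUETYPE": row.get("TISSUE_TYPE", ""),
        "ATTR1": row.get("TISSUE_TYPE", ""),          # legacy alias
        "TIMEPOINT": row.get("TIME_POINT", ""),
        "ATTR2": row.get("TIME_POINT", ""),           # legacy alias
    }
    # Assumption: a placeholder is a whole component between '+' separators.
    components = row["CATEGORY_CD"].split("+")
    return [placeholders.get(part, part) for part in components]


def check_rows(rows, study_id):
    """Collect violations of the column rules as human-readable strings."""
    errors = []
    for number, row in enumerate(rows, start=1):
        if row.get("STUDY_ID") != study_id:
            errors.append(f"row {number}: STUDY_ID does not match the parameter")
        if not row.get("SAMPLE_CD"):
            errors.append(f"row {number}: SAMPLE_CD is required")
        if not row.get("PLATFORM"):
            errors.append(f"row {number}: PLATFORM must be given")
    # Assumption: platforms are compared after uppercasing.
    platforms = {row["PLATFORM"].upper() for row in rows if row.get("PLATFORM")}
    if len(platforms) > 1:
        errors.append("PLATFORM must be the same for all rows")
    return errors
```

For the example row, `expand_category_cd` turns `Biomarker_Data+PLATFORM+TISSUETYPE` into the components `Biomarker_Data`, `GPL570_BOGUS` and `Lung`. Checks that need external state, such as whether the platform has already been loaded or whether `SUBJECT_ID` exists in the clinical data set, are left out of the sketch.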