Define clinical phenotypes
Health outcome information is encoded differently depending on the data source and data field that captures it. The mapping between data fields and their coding systems has been curated and recorded in the data settings file included in the ukbpheno package. To define a target phenotype, we therefore need to identify all relevant diagnosis and operation codes by surveying the various data sources and data fields on the UK Biobank Showcase.
For example, participants with any of the following codes are likely to have coronary artery disease (CAD):
Phenotype | ICD-9 | ICD-10 | OPCS-4 | Self-reported fields | READ2 | CTV3 |
---|---|---|---|---|---|---|
Coronary artery disease | 414, 410, 412 | I24, I25, Z955, I21, I22, I23 | K40, K41, K42, K43, K44, K45, K46, K49, K50, K75 | 20002(1075), 20004(1070, 1095, 1523), 6150(1) | G34y1, G34.., G3..., ZV45L, G34z0, ZV458, 793G., 79280, 79281, 79282, 7928y, 7928z, 79292, 7929y, 7929z, 792.., 7A547, 793Gy, 793Gz, 79283 | G34y1, XE0WG, XE2uV, XaC1g, XaG1Q, XaQiY, ZV458, G34.., X200b, Xa1dP, XaLgU, 79280, 79281, 79282, 7928y, 7928z, 79292, 7929y, 7929z, X00tT, X013N, XE0Em, XaLgZ, XaLga, XaMKE |
Fill these codes into a definition table (see prefilled_template), which is read by the ukbpheno package.
Enter one phenotype (such as Cad) per row. The “TRAIT” column holds the phenotype's unique identifier, which is case sensitive. Enter each code in the column for its coding system; for example, the ICD-10 codes for CAD listed above go in the “ICD10” column.
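As an illustration of the row layout, the Python sketch below writes the CAD codes from the table above into a one-row, tab-separated definition table and rejects duplicate TRAIT identifiers. It assumes tab-separated columns named TRAIT, ICD9, ICD10, OPCS4 and TS, with comma-separated codes within a cell; only TRAIT, ICD10 and TS are named on this page, so check prefilled_template for the actual headers and separators. The helper `write_definitions` is hypothetical and not part of ukbpheno, and the READ2, CTV3 and other self-reported codes are left out for brevity.

```python
import csv
import io

# ASSUMPTION: column names other than TRAIT, ICD10 and TS are illustrative;
# see prefilled_template for the headers ukbpheno actually expects.
COLUMNS = ["TRAIT", "ICD9", "ICD10", "OPCS4", "TS"]

# CAD codes copied from the table above; codes within a cell are
# comma-separated (an assumption about the template's format).
cad_row = {
    "TRAIT": "Cad",  # unique, case-sensitive identifier
    "ICD9": "414,410,412",
    "ICD10": "I24,I25,Z955,I21,I22,I23",
    "OPCS4": "K40,K41,K42,K43,K44,K45,K46,K49,K50,K75",
    "TS": "6150=1(Heart attack)",  # self-reported 6150(1) in TS syntax
}


def write_definitions(rows, columns):
    """Return the rows as tab-separated text, rejecting duplicate TRAIT ids.

    Hypothetical helper, not part of ukbpheno.
    """
    seen = set()
    for row in rows:
        trait = row["TRAIT"]
        # Exact string comparison: "Cad" and "CAD" count as different traits.
        if trait in seen:
            raise ValueError(f"duplicate TRAIT identifier: {trait}")
        seen.add(trait)
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # missing columns are written as empty cells
    return buf.getvalue()


tsv_text = write_definitions([cad_row], COLUMNS)
print(tsv_text)
```

Because the uniqueness check compares strings exactly, it mirrors the case sensitivity of “TRAIT”: a row named “CAD” would be accepted alongside “Cad” as a separate phenotype.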
For the “TS” (touchscreen) column:
- Enter the field number as shown on the Showcase, followed by the condition and a description in parentheses, e.g. “6177=3(insulin)”.
- The field that records the corresponding age at diagnosis can be given in square brackets after the condition, e.g. “4041=1[2976](Gestational diabetes)”.
- The following condition symbols are accepted:

Condition symbol | Meaning |
---|---|
= | Equal to |
!= | Not equal to |
< | Less than |
<= or ≤ | Less than or equal to |
> | Greater than |
>= or ≥ | Greater than or equal to |
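To make the touchscreen syntax concrete, here is a minimal Python sketch of a parser for condition strings such as “6177=3(insulin)” and “4041=1[2976](Gestational diabetes)”. It is not how ukbpheno parses the TS column; it assumes that the value is numeric, that the [age field] and (description) parts are both optional, and that they appear in that order. The names `TS_PATTERN`, `TsCondition` and `parse_ts` are hypothetical.

```python
import operator
import re
from dataclasses import dataclass
from typing import Optional

# Accepted condition symbols mapped to Python comparisons.
OPS = {
    "=": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    "≤": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "≥": operator.ge,
}

# field, operator (two-character symbols tried first), value,
# optional [age-at-diagnosis field], optional (description)
TS_PATTERN = re.compile(
    r"^(?P<field>\d+)"
    r"(?P<op>!=|<=|>=|≤|≥|=|<|>)"
    r"(?P<value>[^\[\(]+)"
    r"(?:\[(?P<age_field>\d+)\])?"
    r"(?:\((?P<label>[^)]*)\))?$"
)


@dataclass
class TsCondition:
    field: int
    op: str
    value: float  # ASSUMPTION: touchscreen values are numeric codes
    age_field: Optional[int]
    label: Optional[str]

    def matches(self, answer: float) -> bool:
        """Return True if a participant's answer satisfies the condition."""
        return OPS[self.op](answer, self.value)


def parse_ts(spec: str) -> TsCondition:
    """Parse one condition string, e.g. '6177=3(insulin)'."""
    m = TS_PATTERN.match(spec.strip())
    if m is None:
        raise ValueError(f"cannot parse TS condition: {spec!r}")
    age = m.group("age_field")
    return TsCondition(
        field=int(m.group("field")),
        op=m.group("op"),
        value=float(m.group("value")),
        age_field=int(age) if age else None,
        label=m.group("label"),
    )


cond = parse_ts("4041=1[2976](Gestational diabetes)")
print(cond.field, cond.op, cond.value, cond.age_field, cond.label)
# 4041 = 1.0 2976 Gestational diabetes
```

The two-character operators are listed before “=”, “<” and “>” in the alternation so that, for example, “<=5” is read as “less than or equal to 5” rather than “less than” followed by the value “=5”.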
The package includes a Shiny app for cross-referencing codes between coding systems, based on the all_lkps_maps_v*.xlsx lookup file provided by UK Biobank. All other required coding files are also included in the package (/inst/extdata/).

```bash
# Show all available options
Rscript shiny.lookup_codes.R --help

# Launch the app (replace path_to/ with the location of each file)
Rscript shiny.lookup_codes.R --fcoding_xls path_to/all_lkps_maps_v3.xlsx \
  --f_med_readSR path_to/dfCodesheetREAD_SR.Coding.RData \
  --fcoding_icd10 path_to/ICD10.coding19.tsv \
  --fcoding_icd9 path_to/ICD9.coding87.tsv \
  --fcoding_opcs4 path_to/OPCS4.coding240.tsv \
  --fcoding_20003 path_to/20003.coding4.tsv
```
A composite phenotype is a phenotype defined by including and/or excluding other phenotypes. For example, a composite phenotype “diabetes mellitus” may include two phenotypes, “type 1 diabetes” and “type 2 diabetes”. The following four columns (in addition to “TRAIT”) are used to construct a composite phenotype:
TRAIT | Exclude_from_cases | Study_population | Exclude_from_controls | Include_definitions |
---|---|---|---|---|
Dm | | | DmRx | DmT1, DmT2 |
“Study_population” restricts the definition to a subgroup of participants who have the specified phenotype(s). Participants with any of the phenotypes listed in “Include_definitions” are considered cases of the composite phenotype. “Exclude_from_cases” and “Exclude_from_controls” remove participants with the listed phenotype(s) from the cases and controls, respectively.
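The four columns can be read as set operations on participant IDs. The Python sketch below is one possible reading, not ukbpheno's implementation. It assumes that listing several phenotypes in a column means “any of them”, that an empty Study_population means the whole cohort, that controls are the members of the study population who do not meet the inclusion criteria, and that participants removed by Exclude_from_cases do not become controls. The function `composite` and the participant IDs are hypothetical, and DmRx is treated as a control exclusion purely for illustration.

```python
from typing import Dict, Iterable, Set, Tuple


def union_of(phenos: Dict[str, Set[str]], names: Iterable[str]) -> Set[str]:
    """Participants having at least one of the named phenotypes."""
    out: Set[str] = set()
    for name in names:
        out |= phenos[name]  # KeyError on an unknown (or wrongly cased) TRAIT
    return out


def composite(
    phenos: Dict[str, Set[str]],
    cohort: Set[str],
    include: Iterable[str],
    study_population: Iterable[str] = (),
    exclude_from_cases: Iterable[str] = (),
    exclude_from_controls: Iterable[str] = (),
) -> Tuple[Set[str], Set[str]]:
    """Return (cases, controls) for a composite phenotype (hypothetical)."""
    study_population = list(study_population)
    # Restrict to the subgroup, or use the whole cohort if none is given.
    population = (
        cohort & union_of(phenos, study_population) if study_population else set(cohort)
    )
    included = population & union_of(phenos, include)
    cases = included - union_of(phenos, exclude_from_cases)
    controls = population - included - union_of(phenos, exclude_from_controls)
    return cases, controls


phenos = {"DmT1": {"p1"}, "DmT2": {"p2", "p3"}, "DmRx": {"p3", "p4"}}
cohort = {f"p{i}" for i in range(1, 7)}
cases, controls = composite(
    phenos, cohort, include=["DmT1", "DmT2"], exclude_from_controls=["DmRx"]
)
print(sorted(cases), sorted(controls))  # ['p1', 'p2', 'p3'] ['p5', 'p6']
```

Looking phenotypes up with `phenos[name]` rather than `.get()` means a misspelled or wrongly cased identifier (e.g. “dmt1”) raises a `KeyError` instead of silently matching nobody, which matters because TRAIT identifiers are case sensitive.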