We have a manuscript available that describes the design of the Generalized Data Model (GDM).
Below is the current version of the schema for the Generalized Data Model. We gratefully acknowledge the influence of the OHDSI community and the open-source OMOP common data model specifications on our thinking. In addition, we acknowledge the influence of both Sentinel and i2b2 on our approach, although most of our data model was designed prior to fully reviewing other data models. At the moment, many references to the concept table refer to the OMOP version 5 vocabulary table maintained by OHDSI. However, any internally consistent set of vocabularies with unique concept ids would be sufficient (e.g., the National Library of Medicine Metathesaurus).
Note that in April 2023, we removed the patient_details table from the data model.
facility_type_concept_id should be used to describe the whole facility (e.g., Academic Medical Center or Community Medical Center). Specific departments in the facility should be entered in the contexts table using the care_site_type_concept_id field.
| column | type | description | foreign key (FK) | required |
| --- | --- | --- | --- | --- |
| id | serial | Surrogate key for record | | x |
| facility_name | text | Facility name, if available | | |
| primary_identifier | text | Primary facility identifier | | x |
| primary_identifier_type | text | Type of identifier specified in primary identifier field (UPIN, NPI, etc) | | x |
| secondary_identifier | text | Secondary facility identifier (Optional) | | |
| secondary_identifier_type | text | Type of identifier specified in secondary identifier field (UPIN, NPI, etc) | | |
| facility_type_concept_id | bigint | FK reference to concept table representing the facility type | | |
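As a minimal sketch, the facility columns listed above could be declared as follows. SQLite is used only to keep the example self-contained, so `INTEGER PRIMARY KEY` stands in for `serial`. The table name `facilities` is an assumption, and the foreign key to the concept table is left as a comment because that table's name is not given in this section.

```python
import sqlite3

# Assumed table name; column names and types follow the column list above.
FACILITIES_DDL = """
CREATE TABLE facilities (
    id                        INTEGER PRIMARY KEY,  -- stands in for serial
    facility_name             TEXT,
    primary_identifier        TEXT NOT NULL,        -- required
    primary_identifier_type   TEXT NOT NULL,        -- required
    secondary_identifier      TEXT,
    secondary_identifier_type TEXT,
    facility_type_concept_id  BIGINT                -- refers to the concept table
)
"""

def create_facilities(conn: sqlite3.Connection) -> None:
    """Create the sketch facilities table on the given connection."""
    conn.execute(FACILITIES_DDL)

conn = sqlite3.connect(":memory:")
create_facilities(conn)

# Only the required identifier fields are supplied; optional fields stay NULL.
conn.execute(
    "INSERT INTO facilities (primary_identifier, primary_identifier_type) VALUES (?, ?)",
    ("1234567890", "NPI"),
)
row = conn.execute(
    "SELECT id, facility_name, primary_identifier_type FROM facilities"
).fetchone()
```

Omitting either required identifier field raises `sqlite3.IntegrityError` in this sketch, which mirrors the `required` column of the table.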
For claims, records the claim level information (also referred to as "headers" in some databases)
Use the claim from and thru dates as the start and end dates, if available
Admit and discharge dates should go in the admission_details table, unless those are the only dates available for the records, in which case they should be entered into both the collections and admission_details tables
Used to group clinical_codes typically occurring on the same day or at the same time (e.g., a diagnosis and a procedure, or a systolic and diastolic blood pressure)
contexts records are always linked to a collection record
care_site_type_concept_id is used to describe the department in which the service was performed
FK reference to concept table representing the file name (e.g., MEDPAR). If the data represent a subset of a file, concatenate the name of the file used and the subset (e.g., MEDPAR_SNF)
Stores clinical codes from all types of records, including procedures, diagnoses, drugs, laboratory records, and other sources. Some common vocabularies include ICD-9, ICD-10, SNOMED, Read, HCPCS, CPT, NDC, and LOINC
Ignores semantic distinctions about the type of information represented within a vocabulary because most vocabularies contain information from more than one domain
One record generated for each individual code in the raw data
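To illustrate the one-record-per-code rule, the sketch below splits a raw claim line that carries several codes into individual records. The input layout and the field names (`context_id`, `vocabulary`, `code`) are hypothetical and are not taken from the schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ClinicalCode:
    context_id: int   # hypothetical link to the grouping contexts record
    vocabulary: str   # e.g., "ICD-10" or "CPT"
    code: str

def explode_codes(context_id, raw_line):
    """Return one ClinicalCode per individual code found in raw_line.

    raw_line maps a vocabulary name to the list of codes recorded on the line.
    Empty, whitespace-only, or missing codes are skipped.
    """
    records = []
    for vocabulary, codes in raw_line.items():
        for code in codes:
            cleaned = (code or "").strip()
            if cleaned:
                records.append(ClinicalCode(context_id, vocabulary, cleaned))
    return records

# Two diagnoses and one procedure on the same line become three records.
raw_line = {"ICD-10": ["I10", "E11.9", ""], "CPT": ["99213"]}
records = explode_codes(context_id=1, raw_line=raw_line)
```

No distinction is made here between diagnosis and procedure codes beyond the vocabulary label, in keeping with the point that the table ignores semantic distinctions within a vocabulary.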
Stores additional information related to measurements, observations, status, and specifications
Text-based vocabularies are sufficient, but could also be mapped to LOINC and stored in the mappings table (e.g., laboratory data indexed by text names for the lab results)
Other vocabularies should be included in their original system (e.g., oncology may comprise separate vocabularies for location, histology, grade, behavior, etc.)
This could be implemented by making variable names a vocabulary in themselves, depending on the use case
The purpose of this table is to capture all costs reported in the course of paying for services. It is designed from a US administrative claims data perspective.
All payer reimbursement records are linked to a record in the contexts table which identifies the type of reimbursement (generally a line-level or claim-level cost)
Note that claim-level reimbursements do not always sum to the individual line-level reimbursements, so caution should be used when querying records
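The caution about claim-level and line-level amounts can be checked before querying. The following is a minimal sketch that flags claims whose claim-level amount differs from the sum of their line-level amounts. The input shapes (a dict of claim-level amounts and a list of `(claim_id, amount)` pairs) and the tolerance are assumptions made for illustration, not part of the data model.

```python
from collections import defaultdict

def find_mismatched_claims(claim_level, line_level, tolerance=0.005):
    """Return {claim_id: (claim_amount, line_sum)} where the two disagree.

    claim_level: dict mapping claim_id to the claim-level amount.
    line_level:  iterable of (claim_id, amount) pairs for line-level records.
    tolerance:   absolute difference still treated as equal (rounding noise).
    """
    line_sums = defaultdict(float)
    for claim_id, amount in line_level:
        line_sums[claim_id] += amount

    mismatches = {}
    for claim_id, claim_amount in claim_level.items():
        line_sum = line_sums.get(claim_id, 0.0)
        if abs(claim_amount - line_sum) > tolerance:
            mismatches[claim_id] = (claim_amount, line_sum)
    return mismatches

claim_level = {"A": 100.0, "B": 80.0}
line_level = [("A", 60.0), ("A", 40.0), ("B", 50.0), ("B", 20.0)]
mismatches = find_mismatched_claims(claim_level, line_level)
```

In this example claim A reconciles and claim B does not, so an analysis would need to choose one level for claim B instead of adding both.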
The total amount charged by the provider of the good/service (e.g., hospital, physician, pharmacy, DME provider) and billed to a payer. This information is usually provided in claims data.
| column | type | description | foreign key (FK) | required |
| --- | --- | --- | --- | --- |
| total_paid | float | The total amount paid by all payers for the service/device/drug. This field is calculated using the following formula: paid_by_payer + paid_by_patient + paid_by_primary. In claims data, it represents the amount the payer expects the provider to be reimbursed by the payer and the patient combined, based on the payer's contractual obligations. | | |
| paid_by_payer | float | The amount paid by the Payer for the service/device/drug. In claims data, there is generally one field representing the total payment from the payer for the service/device/drug. However, this field could be a calculated field if the source data provide separate payment information for the ingredient cost and the dispensing fee. If the paid_ingredient_cost or paid_dispensing_fee fields are populated with nonzero values, the paid_by_payer field is calculated using the following formula: paid_ingredient_cost + paid_dispensing_fee. If the source data include more than one Payer, this is represented by multiple cost records. The Payer reporting this reimbursement should be indicated in the payer_plan_id field. | | |
| paid_by_patient | float | The total amount paid by the patient as a share of the expenses. This field is most often used in claims data to report the contracted amount the patient is responsible for reimbursing the provider for the service/device/drug. This is a calculated field using the following formula: paid_patient_copay + paid_patient_coinsurance + paid_patient_deductible. If the source data have actual patient payments (e.g., the patient payment is not a derivative of the payer claim and there is verification the patient paid an amount to the provider), then the patient payment should have its own cost record with payer_plan_id set to 0 to indicate that the payer is actually the patient, and the actual patient payment should be noted in the total_paid field. The paid_by_patient field is only used for reporting a patient's responsibility reported on an insurance claim. | | |
| paid_patient_copay | float | The amount paid by the patient as a fixed contribution to the expenses. paid_patient_copay contributes to the paid_by_patient variable. The paid_patient_copay field is only used for reporting a patient's copay amount reported on an insurance claim. | | |
| paid_patient_coinsurance | float | The amount paid by the patient as a joint assumption of risk. Typically, this is a percentage of the expenses defined by the Payer Plan after the patient's deductible is exceeded. paid_patient_coinsurance contributes to the paid_by_patient variable. The paid_patient_coinsurance field is only used for reporting a patient's coinsurance amount reported on an insurance claim. | | |
| paid_patient_deductible | float | The amount paid by the patient that is counted toward the deductible defined by the Payer Plan. paid_patient_deductible contributes to the paid_by_patient variable. The paid_patient_deductible field is only used for reporting a patient's deductible amount reported on an insurance claim. | | |
| paid_by_primary | float | The amount paid by a primary Payer through the coordination of benefits. paid_by_primary contributes to the total_paid variable. The paid_by_primary field is only used for reporting a patient's primary insurance payment amount reported on the secondary payer insurance claim. If the source data have actual primary insurance payments (e.g., the primary insurance payment is not a derivative of the payer claim and there is verification another insurance company paid an amount to the provider), then the primary insurance payment should have its own cost record with payer_plan_id set to the applicable payer, and the actual primary insurance payment should be noted in the paid_by_payer field. | | |
| paid_ingredient_cost | float | The amount paid by the Payer to a pharmacy for the drug, excluding the amount paid for dispensing the drug. paid_ingredient_cost contributes to the paid_by_payer field if this field is populated with a nonzero value. | | |
| paid_dispensing_fee | float | The amount paid by the Payer to a pharmacy for dispensing a drug, excluding the amount paid for the drug ingredient. paid_dispensing_fee contributes to the paid_by_payer field if this field is populated with a nonzero value. | | |
The contracted amount agreed between the payer and provider. This information is generally available in claims data. This is similar to the total_paid amount in that it shows what the payer expects the provider to be reimbursed after the payer and patient pay. It differs from the total_paid amount in that it is not a calculated field, but a field available directly in claims data. Use case: this will capture non-covered services. Non-covered services are indicated when the amount allowed and the patient responsibility variables (copay, coinsurance, deductible) all equal $0 in the source data. This means the patient is responsible for the total_charged value. The amount_allowed field is payer specific, and the payer should be indicated by the payer_plan_id field.
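The calculated fields described above can be summarized in a short sketch. The field names follow the descriptions in this section. The `reported_paid_by_payer` attribute is a hypothetical name for a payer payment supplied directly by the source data, and the class itself is illustrative only, not part of the data model.

```python
from dataclasses import dataclass

@dataclass
class Reimbursement:
    paid_patient_copay: float = 0.0
    paid_patient_coinsurance: float = 0.0
    paid_patient_deductible: float = 0.0
    paid_by_primary: float = 0.0
    paid_ingredient_cost: float = 0.0
    paid_dispensing_fee: float = 0.0
    # Hypothetical: payer payment as given in the source when it is not split
    # into ingredient cost and dispensing fee.
    reported_paid_by_payer: float = 0.0

    @property
    def paid_by_payer(self) -> float:
        # Calculated only when either component is populated with a nonzero value.
        if self.paid_ingredient_cost or self.paid_dispensing_fee:
            return self.paid_ingredient_cost + self.paid_dispensing_fee
        return self.reported_paid_by_payer

    @property
    def paid_by_patient(self) -> float:
        return (
            self.paid_patient_copay
            + self.paid_patient_coinsurance
            + self.paid_patient_deductible
        )

    @property
    def total_paid(self) -> float:
        return self.paid_by_payer + self.paid_by_patient + self.paid_by_primary

# A pharmacy claim with split payer amounts and two patient components.
claim = Reimbursement(
    paid_patient_copay=20.0,
    paid_patient_deductible=30.0,
    paid_ingredient_cost=45.0,
    paid_dispensing_fee=5.0,
)
```

Here `claim.paid_by_payer` and `claim.paid_by_patient` each come to 50.0, so `claim.total_paid` is 100.0. Deriving the totals as properties keeps them consistent with their components, at the cost of not preserving a total that the source reports differently.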
Examples of values captured in this table include cost-to-charge ratios, calculated costs (where the ETL process calculates a cost based on the available data), reported costs (where the ETL process imputes a cost from another source), and other values that may become apparent with more use cases.
Defines the basis for the cost in the table (e.g., 2013 for a specific cost-to-charge ratio, or a specific cost from an external cost
x
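| column | type | description | foreign key (FK) | required |
| --- | --- | --- | --- | --- |
| value | float | Cost value | | x |
| value_type_concept_id | bigint | FK reference to the concept in the concept table that defines the type of economic information in the value field (e.g., cost-to-charge ratio, calculated cost, reported cost) | | |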
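As one possible illustration of a calculated cost, the sketch below multiplies a charge by a cost-to-charge ratio and returns the two fields described above. The function name, the rounding to cents, and the concept id passed in are all assumptions; actual concept ids would come from whichever vocabulary the implementation uses.

```python
def calculated_cost_record(total_charged, cost_to_charge_ratio, value_type_concept_id):
    """Build a cost record whose value is charge multiplied by the ratio.

    value_type_concept_id should identify the "calculated cost" concept in the
    vocabulary in use; the id passed below is a placeholder, not a real concept.
    """
    if total_charged < 0:
        raise ValueError("total_charged must be non-negative")
    if cost_to_charge_ratio < 0:
        raise ValueError("cost_to_charge_ratio must be non-negative")
    return {
        "value": round(total_charged * cost_to_charge_ratio, 2),
        "value_type_concept_id": value_type_concept_id,
    }

# A 1000.00 charge with a ratio of 0.25 gives a calculated cost of 250.00.
record = calculated_cost_record(1000.0, 0.25, value_type_concept_id=0)
```

The ratio itself could be stored as a separate record with its own value type, so that the basis for the calculated cost remains traceable.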
Stores mortality information including date of death and cause(s) of death
Commonly populated from beneficiary or similar administrative data associated with the medical record
Deaths identified from diagnosis codes or discharge status are not necessary since such records are in the clinical_codes and admission_details tables and can be queried separately