Generalized Data Model (GDM)

We have a manuscript available that describes the design of the Generalized Data Model (GDM).

Below is the current version of the schema for the Generalized Data Model. We gratefully acknowledge the influence of the OHDSI community and the open-source OMOP common data model specifications on our thinking. In addition, we acknowledge the influence of both Sentinel and i2b2 on our approach, although most of our data model was designed prior to fully reviewing other data models. At the moment, many references to the concept table refer to the OMOP version 5 vocabulary table maintained by OHDSI. However, any internally consistent set of vocabularies with unique concept ids would be sufficient (e.g., the National Library of Medicine Metathesaurus).

Note that in April 2023, we removed the patient_details table from the data model.

GDM Tables

patients

Demographic information about the patients in the data
The column for practitioner_id is intended for situations where there is a defined primary care practitioner (e.g., HMO or CPRD data)

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
gender_concept_id	bigint	FK reference to the concept table for the unique gender of the patient	concept
birth_date	date	Date of birth (yyyy-mm-dd)
race_concept_id	bigint	FK reference to the concept table for the unique race of the patient	concept
ethnicity_concept_id	bigint	FK reference to the concept table for the ethnicity of the patient	concept
address_id	bigint	FK reference to the place of residency for the patient in the location table, where the detailed address information is stored	addresses
practitioner_id	bigint	FK reference to the primary care practitioner the patient is seeing in the practitioners table	practitioners
patient_id_source_value	text	Originial patient identifier defined in the source data		x

practitioners

All non-facility practitioners (i.e., physicians, etc.) are listed

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
practitioner_name	text	Practitioner's name, if available
primary_identifier	text	Primary practitioner identifier		x
primary_identifier_type	text	Type of identifier specified in primary identifier field (UPIN, NPI, etc)		x
secondary_identifier	text	Secondary practitioner identifier (Optional)
secondary_identifier_type	text	Type of identifier specified in secondary identifier field (UPIN, NPI, etc)
specialty_concept_id	bigint	FK reference to an identifier in the concept table for specialty	concept
address_id	bigint	FK reference to the address of the location where the practitioner is practicing	addresses
birth_date	date	Date of birth (yyyy-mm-dd)
gender_concept_id	bigint	FK reference to an identifier in the concept table for the unique gender of the practitioner	concept

facilities

Unique records for all the facilities in the data
facility_type_concept_id should be used to describe the whole facility (e.g., Academic Medical Center or Community Medical Center). Specific departments in the facility should be entered in the contexts table using the care_site_type_concept_id field.

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
facility_name	text	Facility name, if available
primary_identifier	text	Primary facility identifier		x
primary_identifier_type	text	Type of identifier specified in primary identifier field (UPIN, NPI, etc)		x
secondary_identifier	text	Secondary facility identifier (Optional)
secondary_identifier_type	text	Type of identifier specified in secondary identifier field (UPIN, NPI, etc)
facility_type_concept_id	bigint	FK reference to concept table representing the facility type	concept
specialty_concept_id	bigint	A foreign key to an identifier in the concept table for specialty	concept
address_id	bigint	A foreign key to the address of the location of the facility	addresses

collections

Used to group contexts records
For claims, records the claim level information (also referred to as "headers" in some databases)
- Use claim from and thru date for start and end date, if available
- Admit and discharge dates should go in the admission_details table unless those are the only dates for the records in which case they should be entered into both the collections and admission_details tables
For EHR, records the visit level information

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
patient_id	bigint	FK to reference to patients table	patients	x
start_date	date	Start date of record (yyyy-mm-dd)		x
end_date	date	End date of record (yyyy-mm-dd)		x
duration	float	Duration of collection. (e.g. hospitalization length of stay)
duration_unit_concept_id	bigint	FK reference to concept table representing the unit of duration (hours, days, weeks etc.)	concepts
facility_id	bigint	FK reference to facilities table	facilities
admission_detail_id	bigint	FK reference to admission_details table	admission_details
collection_type_concept_id	bigint	FK reference to concept table representing the type of collection this record represents	concept

contexts_practitioners

Links one or more practitioners with a contexts record
Each record represents an encounter between a patient and a practitioner on a specific context
Captures the role, if any, the practitioner played on the context (e.g., attending physician)

column	type	description	foreign key (FK)	required
context_id	bigint	FK reference to contexts table	contexts	x
practitioner_id	bigint	FK reference to practitioners table	practitioners	x
patient_id	bigint	FK reference to patients table	patients	x
role_type_concept_id	bigint	FK reference to the concept table representing roles practitioners can play in an encounter	concept
specialty_type_concept_id	bigint	FK reference to concept table representing the practitioner's specialty type for the services/diagnoses associated with this record	concept

contexts

Stores information about the context of the clinical_codes and payer_reimbursements
Used to group clinical_codes typically occurring on the same day or at the same time (e.g., a diagnosis and a procedure, or a systolic and diastolic blood pressure)
contexts records are always linked to a collection record
care_site_type_concept_id is used to describe the department in which the service was performed

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
collection_id	bigint	FK reference to collections table	collections	x
patient_id	bigint	FK to reference to patients table	patients	x
start_date	date	Start date of record (yyyy-mm-dd)		x
end_date	date	End date of record (yyyy-mm-dd)
facility_id	bigint	FK reference to facilities table	facilities
care_site_type_concept_id	bigint	FK reference to concept table representing the care site type within the facility	concept
pos_concept_id	bigint	FK reference to concept table representing the place of service associated with this record	concept
source_type_concept_id	bigint	FK reference to concept table representing the file name (e.g MEDPAR). If data represents a subset of a file, concatenate the name of the file used and subset (e.g MEDPAR_SNF)	concept	x
service_specialty_type_concept_id	bigint	FK reference to concept table representing the specialty type for the services/diagnoses associated with this record	concept
record_type_concept_id	bigint	FK reference to concept table representing the type of contexts the record represents (line, claim, etc.)	concept	x

clinical_codes

Stores clinical codes from all types of records including procedures, diagnoses, drugs, laboratory records and other sources.Some common vocabularies include ICD-9, ICD-10, SNOMED, Read, HCPCS, CPT, NDC, and LOINC
Ignores semantic distinctions about the type of information represented within a vocabulary because most vocabularies contain information from more than one domain
One record generated for each individual code in the raw data
Extra detail can be found about a code in the measurement_details and drug_exposure_details tables if that information exists

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
collection_id	bigint	FK reference to collections table	collections	x
context_id	bigint	FK reference to contexts table	contexts	x
patient_id	bigint	FK reference to patients table	patients	x
start_date	date	Start date of record (yyyy-mm-dd)		x
end_date	date	End date of record (yyyy-mm-dd)		x
clinical_code_concept_id	bigint	FK reference to concept table for the code assigned to the record	concept	x
quantity	bigint	Quantity, if available (e.g., procedures)
seq_num	int	The sequence number for the variable assigned (e.g. dx3 gets sequence number 3)
provenance_concept_id	bigint	Additional type information (ex: primary, admitting, problem list, etc)	concept
clinical_code_source_value	text	Source code from raw data		x
clinical_code_vocabulary_id	text	FK reference to the vocabulary the clinical code comes from	vocabulary	x
measurement_detail_id	bigint	FK reference to measurement_details table	measurement_details
drug_exposure_detail_id	bigint	FK reference to drug_exposure_details table	drug_exposure_details

measurement_details

Stores additional information related to measurements, observations, status, and specifications
Text-based vocabularies are sufficient, but could also be mapped to LOINC and stored in the mappings table(e.g., laboratory data indexed by text names for the lab results)
Other vocabularies should be included in their original system (e.g., oncology may be comprised of separate vocabularies for location, histology, grade, behavior, etc.)
- This could be implemented by making variable names a vocabulary in themselves, depending on the use case

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
patient_id	bigint	FK reference to patients table	patients	x
result_as_number	float	The observation result stored as a number, applicable to observations where the result is expressed as a numeric value
result_as_string	text	The observation result stored as a string, applicable to observations where the result is expressed as verbatim text
result_as_concept_id	bigint	FK reference to concept table for the result associated with the detail_concept_id (e.g., positive/negative, present/absent, low/high, etc.)	concept
result_modifier_concept_id	bigint	FK reference to concept table for result modifier (=, <, >, etc.)	concept
unit_concept_id	bigint	FK reference to concept table for the measurement units (e.g., mmol/L, mg/dL, etc.)	concept
normal_range_low	float	Lower bound of the normal reference range assigned by the laboratory
normal_range_high	float	Upper bound of the normal reference range assigned by the laboratory
normal_range_low_modifier_concept_id	bigint	FK reference to concept table for result modifier (=, <, >, etc.)	concept
normal_range_high_modifier_concept_id	bigint	FK reference to concept table for result modifier (=, <, >, etc.)	concept

drug_exposure_details

Designed to capture extra details about drug-specific clinical_codes
The quantity of a drug is stored in the clinical_codes quantity field

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
patient_id	bigint	FK to reference to patients table	patients	x
refills	int	The number of refills after the initial prescription; the initial prescription is not counted (i.e., values start with 0)
days_supply	int	The number of days of supply as recorded in the original prescription or dispensing record
number_per_day	float	The number of pills taken per day
dose_form_concept_id	bigint	FK reference to concept table for the form of the drug (capsule, injection, etc.)	concept
dose_unit_concept_id	bigint	FK reference to concept table for the units in which the dose_value is expressed	concept
route_concept_id	bigint	FK reference to concept table for route in which drug is given	concept
dose_value	float	Numeric value for the dose of the drug
strength_source_value	text	Drug strength as reported in the raw data. This can include both dose value and units
ingredient_source_value	text	Ingredient/Generic name of drug as reported in the raw data
drug_name_source_value	text	Product/Brand name of drug as reported in the raw data

payer_reimbursements

The purpose of this table is to capture all costs reported in the course of paying for services. It is designed from a US administrative claims data perspective.
All payer reimbursement records are linked to a record in the contexts table which identifies the type of reimbursement (generally a line-level or claim-level cost)
Note that claim-level reimbursements do not always sum to the individual line-level reimbursements, so caution should be used when querying records

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record
context_id	bigint	FK reference to context table	contexts	x
patient_id	bigint	FK to reference to patients table	patients	x
clinical_code_id	bigint	FK reference to clinical_codes table to be used if a specific code is the direct cause for the reimbursement	clinical_codes
currency_concept_id	bigint	FK reference to concept table for the 3-letter code used to delineate international currencies (e.g., USD = US Dollar)	concept	x
total_charged	float	The total amount charged by the provider of the good/service (e.g. hospital, physician pharmacy, dme provider) billed to a payer. This information is usually provided in claims data.
total_paid	float	The total amount paid from all payers for the expenses of the service/device/drug. This field is calculated using the following formula: paid_by_payer + paid_by_patient + paid_by_primary. In claims data, this field is considered the calculated field the payer expects the provider to get reimbursed for the service/device/drug from the payer and from the patient, based on the payer's contractual obligations.
paid_by_payer	float	The amount paid by the Payer for the service/device/drug. In claims data, generally there is one field representing the total payment from the payer for the service/device/drug. However, this field could be a calculated field if the source data provides separate payment information for the ingredient cost and the dispensing fee. If the paid_ingredient_cost or paid_dispensing_fee fields are populated with nonzero values, the paid_by_payer field is calculated using the following formula: paid_ingredient_cost + paid_dispensing_fee. If there is more than one Payer in the source data, several cost records indicate that fact. The Payer reporting this reimbursement should be indicated under the payer_plan_id field.
paid_by_patient	float	The total amount paid by the patient as a share of the expenses. This field is most often used in claims data to report the contracted amount the patient is responsible for reimbursing the provider for said service/device/drug. This is a calculated field using the following formula: paid_patient_copay + paid_patient_coinsurance + paid_patient_deductible. If the source data has actual patient payments (e.g. the patient payment is not a derivative of the payer claim and there is verification the patient paid an amount to the provider), then the patient payment should have it's own cost record with a payer_plan_id set to 0 to indicate the payer is actually the patient, and the actual patient payment should be noted under the total_paid field. The paid_by_patient field is only used for reporting a patient's responsibility reported on an insurance claim.
paid_patient_copay	float	The amount paid by the patient as a fixed contribution to the expenses. paid_patient_copay does contribute to the paid_by_patient variable. The paid_patient_copay field is only used for reporting a patient's copay amount reported on an insurance claim.
paid_patient_coinsurance	float	The amount paid by the patient as a joint assumption of risk. Typically, this is a percentage of the expenses defined by the Payer Plan after the patient's deductible is exceeded. paid_patient_coinsurance does contribute to the paid_by_patient variable. The paid_patient_coinsurance field is only used for reporting a patient's coinsurance amount reported on an insurance claim.
paid_patient_deductible	float	The amount paid by the patient that is counted toward the deductible defined by the Payer Plan. paid_patient_deductible does contribute to the paid_by_patient variable. The paid_patient_deductible field is only used for reporting a patient's deductible amount reported on an insurance claim.
paid_by_primary	float	The amount paid by a primary Payer through the coordination of benefits. paid_by_primary does contribute to the total_paid variable. The paid_by_primary field is only used for reporting a patient's primary insurance payment amount reported on the secondary payer insurance claim. If the source data has actual primary insurance payments (e.g. the primary insurance payment is not a derivative of the payer claim and there is verification another insurance company paid an amount to the provider), then the primary insurance payment should have it's own cost record with a payer_plan_id set to the applicable payer, and the actual primary insurance payment should be noted under the paid_by_payer field.
paid_ingredient_cost	float	The amount paid by the Payer to a pharmacy for the drug, excluding the amount paid for dispensing the drug. paid_ingredient_cost contributes to the paid_by_payer field if this field is populated with a nonzero value.
paid_dispensing_fee	float	The amount paid by the Payer to a pharmacy for dispensing a drug, excluding the amount paid for the drug ingredient. paid_dispensing_fee contributes to the paid_by_payer field if this field is populated with a nonzero value.
information_period_id	bigint	FK reference to the information_periods table
amount_allowed	float	The contracted amount agreed between the payer and provider. This information is generally available in claims data. This is similar to the total_paid amount in that it shows what the payer expects the provider to be reimbursed after the payer and patient pay. This differs from the total_paid amount in that it is not a calculated field, but a field available directly in claims data. Use case: This will capture non-covered services. Non-covered services are indicated by an amount allowed and patient responsibility variables (copay, coinsurance, deductible) will be equal $0 in the source data. This means the patient is responsible for the total_charged value. The amount_allowed field is payer specific and the payer should be indicated by the payer_plan_id field.

costs

Used to capture all non reimbursement costs
Examples of things captured in this table are things like cost-to-charge ratio, calculated cost (for situations where the ETL process calculates a cost based on the available data), reported cost (where the ETL process imputes a cost from another source), and some other things that may become apparent with more use cases.

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
context_id	bigint	FK reference to context table	contexts	x
patient_id	bigint	FK reference to patients table	patients	x
clinical_code_id	bigint	FK reference to clinical_codes table to be used if a specific code is the direct cause for the reimbursement	clinical_codes
currency_concept_id	bigint	FK reference to concept table for the 3-letter code used to delineate international currencies (e.g., USD = US Dollar)	concept	x
cost_base	text	Defines the basis for the cost in the table (e.g., 2013 for a specific cost-to-charge ratio, or a specific cost from an external cost		x
value	float	Cost value		x
value_type_concept_id	bigint	FK reference to concept table to concept that defines the type of economic information in the value field (e.g., cost-to-charge ratio, calculated cost, reported cost)	concept	x

addresses

Used to store location information for patients, practitioners, and facilities
One record for each geographic location in the data

column	type	description	required
id	serial	Surrogate key for record	x
address_1	text	Typically used for street address
address_2	text	Typically used for additional detail such as building, suite, floor, etc.
city	text	The city field as it appears in the source data
state	text	The state field as it appears in the source data
zip	text	The zip or postal code
county	text	The county, if available
census_tract	text	The census tract if available
hsa	text	The Health Service Area, if available (originally defined by the National Center for Health Statistics)
country	text	The country if necessary

deaths

Stores mortality information including date of death and cause(s) of death
Commonly populated from beneficiary or similar administrative data associated with the medical record
Deaths identified from diagnosis codes or discharge status are not necessary since such records are in the clinical_codes and admission_details tables and can be queried separately

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
patient_id	bigint	FK reference to patients table	patients	x
date	date	Date of death (yyyy-mm-dd)		x
cause_concept_id	bigint	FK reference to concept table for cause of death (typically ICD-9 or ICD-10 code)	concept
cause_type_concept_id	bigint	FK reference to concept table for the type of cause of death (e.g. primary, secondary, etc. )	concept
practitioner_id	bigint	FK reference to practitioners table	practitioners

information_periods

Captures periods for which information in each table is relevant for each person
Could include enrollment types (e.g., Part A, Part B, HMO) or just "observable" (as with up-to-standard data in CPRD)
One row per patient per non-overlapping enrollment/information period type

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
patient_id	bigint	FK reference to patients table	patients	x
start_date	date	Start date of record (yyyy-mm-dd)		x
end_date	date	End date of record (yyyy-mm-dd)		x
information_type_concept_id	bigint	FK reference to concept table representing the information type (e.g., insurance coverage, hospital data, up-to-standard date)	concept	x

admission_details

Captures details about admissions and emergency department encounters that cannot be stored in the clinical_codes, contexts, or collections tables
One row per admission
Each admission record in the collections table will link to this table

column	type	description	foreign key (FK)	required
id	serial	Surrogate key for record		x
patient_id	bigint	FK reference to patients table	patients	x
admission_date	date	Date of admission (yyyy-mm-dd)		x
discharge_date	date	Date of discharge (yyyy-mm-dd)		x
admit_source_concept_id	bigint	Database specific code indicating source of admission (e.g., ER visit, transfer, etc.)	concept
discharge_location_concept_id	bigint	Database specific code indicating discharge location (e.g., death, home, transfer, long-term care, etc.)	concept
admission_type_concept_id	bigint	FK reference to concept table representing the type of admission the record is (Emergency, Elective, etc.)	concept

etl_info

Basic attribute value table for storing information about the ETL
Currently used to store all arguments used during a setlr run

column	type	description	foreign key (FK)	required
key	text	Name for the value		x
value	text	Value

etl_information_types

Lists all the information types present in information_periods.information_type_concept_id
Gives a sense of what vocabularies are used in the dataset and how common they are

column	type	description	foreign key (FK)	required
information_type	text	Information type		x
n_rows	bigint	Number of occurrences in information_periods		x

etl_tables

Lists all tables ETL'd for the dataset along with row counts, patient counts, and min/max dates
Gives a sense of what tables are present in the dataset and how large each is

column	type	description	required
table_name	text	Name of table	x
n_rows	bigint	Number of rows in table	x
n_patients	bigint	Number of unique patients table
earliest_date	date	Earliest date found in table
latest_date	date	Latest date found in table

etl_vocabulary_ids

Lists all the vocabularies present in clinical_codes.clinical_code_vocabulary_id
Gives a sense of what vocabularies are used in the dataset and how common they are

column	type	description	foreign key (FK)	required
vocabulary_id	text	Vocabulary ID present	vocabulary	x
n_rows	bigint	Number of occurrences in clinical_codes		x

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Generalized Data Model (GDM)

GDM Tables

patients

practitioners

facilities

collections

contexts_practitioners

contexts

clinical_codes

measurement_details

drug_exposure_details

payer_reimbursements

costs

addresses

deaths

information_periods

admission_details

etl_info

etl_information_types

etl_tables

etl_vocabulary_ids

Files

README.md

Latest commit

History

README.md

File metadata and controls

Generalized Data Model (GDM)

GDM Tables