layout | title | nav_order |
---|---|---|
default |
Data Merging |
8 |
When OMOP records are recorded some records must be merged. An example of this would be if we detected patient is more than one data source.
In some cases we may partially record a record with the hope of detecting the missing information in later transformations. For example if we have a person data source that does not define the patient's gender.
If all data is transformed and incomplete records still exist, use the [prune command
]({% link docs/user-guide/commands.md %}#prune-command).
Column | Merging rule |
---|---|
gender_concept_id | If new value is not null, set it |
year_of_birth | If new value is not null, set it |
month_of_birth | If new value is not null, set it |
day_of_birth | If new value is not null, set it |
birth_datetime | If new value is not null, set it |
race_concept_id | If new value is not null, set it |
ethnicity_concept_id | If new value is not null, set it |
location_id | If new value is not null, set it |
provider_id | If new value is not null, set it |
care_site_id | If new value is not null, set it |
person_source_value | If new value is not null, set it |
gender_source_value | If new value is not null, set it |
gender_source_concept_id | If new value is not null, set it |
race_source_value | If new value is not null, set it |
race_source_concept_id | If new value is not null, set it |
ethnicity_source_value | If new value is not null, set it |
ethnicity_source_concept_id | If new value is not null, set it |
- If two records are related to the same person, the fields related to that person such as gender are populated based on last record wins. For example if one record says gender is Male and the next record says gender is Female, we overwrite the gender to be Female.
- However if the second value says gender is null, we would not overwrite the gender.
- The order in which data is loaded will define what the 'final' value of fields like gender.
Merging logic is not needed as we only record unique locations. If the location already exists in the location
table, do not record it. We never update records in this table.
Merging logic depends upon the data source.
Data source | Logic |
---|---|
CDS | Record the record if it does not already exist using the following columns as keys person_id , RecordConnectionIdentifier and condition_concept_id . |
COSD | Record the record if it does not already exist using the following columns as keys person_id , condition_concept_id and condition_start_date . |
Merging logic is not needed. If a patient was recorded to have died then no futher records will be recorded.
Merging logic depends upon the data source.
Data source | Logic |
---|---|
CDS | Record the record if it does not already exist using the following columns as keys RecordConnectionIdentifier and drug_concept_id . |
Merging logic depends upon the data source.
Data source | Logic |
---|---|
CDS | Record the record if it does not already exist using the following columns as keys RecordConnectionIdentifier , HospitalProviderSpellNumber , observation_date and observation_concept_id . |
Merging logic depends upon the data source.
Data source | Logic |
---|---|
CDS | Record the record if it does not already exist using the following columns as keys RecordConnectionIdentifier , procedure_date and procedure_concept_id . |
COSD | Record the record if it does not already exist using the following columns as keys procedure_date and procedure_concept_id . |
Merging logic depends upon the data source.
Data source | Logic |
---|---|
CDS | Record the record if it does not already exist using the following columns as keys RecordConnectionIdentifier , HospitalProviderSpellNumber and person_id . |
Merging logic depends upon the data source.
Data source | Logic |
---|---|
CDS | Record the record if it does not already exist using the following columns as keys RecordConnectionIdentifier , HospitalProviderSpellNumber and person_id . |
Merging logic depends upon the data source.
Data source | Logic |
---|---|
COSD | Record the record if it does not already exist using the following columns as keys measurement_date , measurement_concept_id , measurement_source_concept_id and person_id . |