This repository was archived by the owner on Aug 11, 2021. It is now read-only.

Server Imports

Jonathan Payne edited this page Jul 13, 2020 · 1 revision

DEPRECATED

While this functionality is still supported, use it at your own risk! This method of importing content will be replaced in the future by the bulk import functionality, which has the advantage of also being available via the API.

Overview

The server-side import scripts are set up for two use cases:

  • Import file contains only the concepts or mappings to update, retire and add to a source in OCL.
  • Import file represents the entirety of the concepts or mappings in a source. Resources will be updated, retired, and added according to each resource's attributes in the import file. Resources that exist in OCL but are not present in the import file will be retired (see retire-missing-records below).

The image below illustrates the actions that the import management commands perform for each concept in the import file and OCL source.

Concepts referenced in internal mappings must already exist in OCL; otherwise, an exception is raised during the import and the mapping is skipped. For that reason, you should generally import all concepts before importing mappings.

Concept Import

Usage: ./manage.py import_concepts_to_source
     --source [source_id]
     --token [user_auth_token]
     --create-source-version
     --retire-missing-records
     --error-output-file [validation_error_output_file]
     [concepts input file]

source_id is the MongoDB object ID (uuid) of the source that is the target of this import. Use the debug option on the web to locate the uuid, or request the source from the API and record the uuid field in the response. For example:

GET /orgs/WHO/sources/ICD-10-WHO/
{
    "type": "Source",
    "uuid": "55804800525c311e7eb3856b",
    "id": "ICD-10-WHO",
    "short_code": "ICD-10-WHO",
    "name": "ICD-10-WHO",
    "full_name": "International Classification for Diseases version 10, WHO",
    "description": "ICD-10 WHO Version",
    "source_type": "External",
    "public_access": "View",
    "default_locale": "en",
    "supported_locales": ["en"],
    "website": "",
    "url": "/orgs/WHO/sources/ICD-10-WHO/",
    "versions_url": "/orgs/WHO/sources/ICD-10-WHO/versions/",
    "concepts_url": "/orgs/WHO/sources/ICD-10-WHO/concepts/",
    "active_concepts": 0,
    "owner": "WHO",
    "owner_type": "Organization",
    "owner_url": "/orgs/WHO/",
    "versions": 1,
    "created_on": "2015-06-16T12:00:00.667",
    "updated_on": "2015-06-16T12:00:00.667",
    "created_by": "WHO",
    "updated_by": "sysadmin",
    "extras": null,
    "external_id": "3f65bd34-26fe-102b-80cb-0017a47871b2"
}
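The lookup above can also be scripted. The following is a minimal Python sketch, not part of the import tooling: the function names are hypothetical, and it assumes the API accepts the same "Authorization: Token ..." header used in the httpie command further down this page. Only extract_source_uuid is exercised offline; fetch_source_uuid makes a real HTTP request.

```python
import json
import urllib.request


def extract_source_uuid(response_text: str) -> str:
    """Return the "uuid" field from a source JSON document."""
    data = json.loads(response_text)
    uuid = data.get("uuid")
    if not uuid:
        raise ValueError("response has no 'uuid' field")
    return uuid


def fetch_source_uuid(api_host: str, source_path: str, token: str) -> str:
    """GET a source (e.g. /orgs/WHO/sources/ICD-10-WHO/) and return its uuid.

    Assumes token authentication via an "Authorization: Token <token>" header.
    """
    request = urllib.request.Request(
        api_host.rstrip("/") + source_path,
        headers={"Authorization": f"Token {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return extract_source_uuid(response.read().decode("utf-8"))
```

With the response shown above, extract_source_uuid returns "55804800525c311e7eb3856b", which is the value to pass as --source.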

user_auth_token is the API auth token for a user who has permission to edit this source.

create-source-version (optional) if specified, the script will create a new version of the specified source. Existing concepts that are still active will be promoted to the new source version, and any new concept versions will originate in the new source version. This is in contrast to the default behavior, which is to create new concept versions (if necessary) in the current latest version of the source.

retire-missing-records (optional) instructs the import script to retire any concepts in the OCL source that are not part of the import file. This flag is designed to simplify full-dictionary imports: for example, when importing a JSON file of an updated version of a source, it assumes that any concepts not included in the new import are no longer part of the source and retires them accordingly.

validation_error_output_file (optional) sets the name of the CSV file to which the import script writes validation errors. If it is omitted, the output file is named bulk_import_validation_errors_YYYYMMDDHHMMSS.csv.
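As an illustration of the default naming pattern, here is a minimal Python sketch. It assumes YYYYMMDDHHMMSS is the timestamp at which the import starts; whether the script uses local time or UTC is not documented here, and the function name is hypothetical.

```python
from datetime import datetime
from typing import Optional


def default_error_output_file(now: Optional[datetime] = None) -> str:
    """Build a name following bulk_import_validation_errors_YYYYMMDDHHMMSS.csv."""
    now = now or datetime.now()  # assumption: local time at import start
    return "bulk_import_validation_errors_" + now.strftime("%Y%m%d%H%M%S") + ".csv"


print(default_error_output_file(datetime(2020, 7, 13, 9, 5, 3)))
# bulk_import_validation_errors_20200713090503.csv
```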

Helpful Command

These httpie commands quickly retrieve the source_id (the uuid) for a source and then run the import:

# setup environment variables
source ~/shell_prep.sh
# get the source uuid for CIEL
http GET $OCL_API_HOST/orgs/CIEL/sources/CIEL/ Authorization:"Token $OCL_API_TOKEN" --pretty format | grep uuid
# now call import
ocl/manage.py import_concepts_to_source --source [UUID from above] --token $OCL_API_TOKEN [source json file]
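If you drive imports from a Python script instead of the shell, the usage line above can be turned into an argument list. This is a hypothetical helper, not part of oclapi; it only uses the flags listed in the usage block and assumes it is run from the directory containing manage.py.

```python
from typing import List, Optional


def build_concept_import_command(
    source_id: str,
    token: str,
    input_file: str,
    create_source_version: bool = False,
    retire_missing_records: bool = False,
    error_output_file: Optional[str] = None,
) -> List[str]:
    """Assemble argv for import_concepts_to_source, following the usage line."""
    argv = ["python", "manage.py", "import_concepts_to_source",
            "--source", source_id, "--token", token]
    if create_source_version:
        argv.append("--create-source-version")
    if retire_missing_records:
        argv.append("--retire-missing-records")
    if error_output_file:
        argv += ["--error-output-file", error_output_file]
    argv.append(input_file)  # the positional concepts input file comes last
    return argv
```

The resulting list can be passed to subprocess.run(argv, check=True); building a list rather than a single string avoids shell-quoting problems with tokens or file names.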
Input File

We have a tool that creates the input file from an OpenMRS (OMRS) MySQL database; see the ocl_omrs project documentation.

The input file should be a valid JSON Lines file. Each line should be a valid JSON representation of the input expected by:

POST /[owner type]/[owner ID]/sources/[source ID]/concepts

The spec is copied here for convenience:

  • Input
    • id (required) string - unique identifier for concept, e.g. 145939019, A15.1, etc.
    • concept_class (required) string - classification of the concept (e.g. Diagnosis, Procedure, etc.)
    • datatype (optional) string - datatype for the concept (e.g. Numeric, String, Coded)
    • names (required) list - at least one name is required
      • name (required) string
      • locale (required) string - 2-character language code, e.g. "en", "es"
      • locale_preferred (optional) - "true" or "false"; default is "false"
      • name_type (optional) - additional name descriptor, such as those used in SNOMED CT
    • descriptions (optional) list
      • description (required)
      • locale (required)
      • locale_preferred (optional) - default is "false"
      • description_type (optional) - additional descriptor, such as those used in SNOMED CT
    • extras (optional) JSON dictionary - additional metadata for the resource
  • Notes
    • At least one name must be submitted when the concept is created in order to set the "display_name" and "display_locale" fields
    • "descriptions", "extras", and additional "names" may be submitted with the concept details, in addition to being created after the fact by posting to the sub-resource (e.g. POST /.../concepts/12845003/names/)
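Before running a large import, it can help to check each record against the required fields in the spec above. The following Python sketch is hypothetical and only checks presence of the required fields; the server performs its own, more thorough validation.

```python
import json
from typing import Any, Dict, List


def validate_concept(concept: Dict[str, Any]) -> List[str]:
    """Return a list of problems with one concept record, per the input spec."""
    problems = []
    for field in ("id", "concept_class"):
        if not isinstance(concept.get(field), str) or not concept[field]:
            problems.append(f"missing required string field '{field}'")
    names = concept.get("names")
    if not isinstance(names, list) or not names:
        problems.append("at least one name is required")
    else:
        for i, name in enumerate(names):
            for field in ("name", "locale"):
                if not name.get(field):
                    problems.append(f"names[{i}] missing '{field}'")
    for i, desc in enumerate(concept.get("descriptions") or []):
        for field in ("description", "locale"):
            if not desc.get(field):
                problems.append(f"descriptions[{i}] missing '{field}'")
    return problems


def to_json_line(concept: Dict[str, Any]) -> str:
    """Serialize one concept as a single JSON Lines record (no embedded newlines)."""
    return json.dumps(concept, ensure_ascii=False)
```

Writing each valid record with to_json_line(concept) + "\n" produces one JSON object per line, as the importer expects.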

Example

  • POST /orgs/IHTSDO/sources/SNOMED-CT/concepts/
{
    "id": "12845003",
    "concept_class": "Laboratory Procedure",
    "datatype": "N/A",
    "names": [
        {
            "name": "Malaria smear",
            "locale": "en",
            "locale_preferred": "true",
            "name_type": "Designated Preferred Name"
        },
        {
            "name": "Malaria smear (procedure)",
            "locale": "en",
            "name_type": "Full Form of Descriptor"
        }
    ],
    "extras": {
        "UMLS_CUI": "C0200703",
        "ISPRIMITIVE": "1"
    }
}
Conditions that trigger a new concept version
  • A concept is imported that did not exist in the previous source version.
  • A previously existing concept is deactivated in the new source version.
  • The imported version of a concept does not match the concept in the latest version of the source.

In order to be considered a "match", all of these fields must be identical:

  • public_access
  • concept_class
  • datatype
  • names
  • descriptions
  • extras
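The conditions above can be expressed as a single predicate. This Python sketch works on plain dictionaries and rests on two assumptions not stated on this page: deactivation is represented by a hypothetical "retired" flag, and list order (e.g. the order of names) is ignored when comparing. The real importer's comparison may differ on both points.

```python
import json
from typing import Any, Dict, Optional

# Fields that must be identical for an imported concept to "match"
MATCH_FIELDS = ("public_access", "concept_class", "datatype", "names", "descriptions", "extras")


def _canonical(value: Any) -> str:
    """Serialize a value ignoring dict key order and top-level list order (assumption)."""
    if isinstance(value, list):
        return json.dumps(sorted(_canonical(v) for v in value))
    return json.dumps(value, sort_keys=True)


def needs_new_version(imported: Dict[str, Any], latest: Optional[Dict[str, Any]]) -> bool:
    if latest is None:
        return True  # the concept did not exist in the previous source version
    if imported.get("retired", False) and not latest.get("retired", False):
        return True  # a previously active concept is being deactivated (hypothetical flag)
    # otherwise, any difference in a match field means the concept changed
    return any(_canonical(imported.get(f)) != _canonical(latest.get(f)) for f in MATCH_FIELDS)
```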

Mapping Import

The mapping import management command accepts the same core options as the concept import script:

Usage: ./manage.py import_mappings_to_source
     --source [source_id]
     --token [user_auth_token]
     --create-source-version
     --retire-missing-records
     [mappings input file]

source_id is the MongoDB object ID (uuid) of the source that the mappings belong to. Use the debug option on the web to locate the uuid.

user_auth_token is the API auth token for a user who has permission to edit this source.

create-source-version (optional): if specified, the script will create a new version of the specified source. See the documentation for concept import above for more details.

retire-missing-records (optional) instructs the import script to retire any mappings in the OCL source that are not part of the import file. This flag is designed to simplify full-dictionary imports: for example, when importing a JSON file of an updated version of a source, it assumes that any mappings not included in the new import are no longer part of the source and retires them accordingly.

Input File

The input file should be a valid JSON Lines file. Each line should be a valid JSON representation of the input expected by:

POST /[owner type]/[owner ID]/sources/[source ID]/mappings
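Both importers read JSON Lines input: one JSON object per line. A minimal Python reader that reports the offending line number is sketched below as a pre-flight check. Skipping blank lines is an assumption of this sketch; how the importer itself treats them is not documented here.

```python
import json
from typing import Any, Dict, Iterable, Iterator, Tuple


def read_json_lines(lines: Iterable[str]) -> Iterator[Tuple[int, Dict[str, Any]]]:
    """Yield (line_number, record) for each non-blank line; fail fast on bad JSON."""
    for line_number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines, e.g. a trailing newline at end of file
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON ({exc.msg})") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield line_number, record
```

Typical use: with open(path, encoding="utf-8") as f: records = list(read_json_lines(f)).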

Implementation Details

Mappings import is accomplished via a Django management command: import_mappings_to_source, which is defined here:

https://github.com/OpenConceptLab/oclapi/blob/master/django-nonrel/ocl/oclapi/management/commands/import_mappings_to_source.py

It simply calls out to the MappingsImporter defined here:

https://github.com/OpenConceptLab/oclapi/blob/master/django-nonrel/ocl/mappings/importer.py

MappingsImporter

The entry point to the MappingsImporter is import_mappings. Before looping over the mappings in the input file, import_mappings does some setup work:

  1. Get the latest version of the source.
  2. Create a new source version (subsequent to latest), if requested.
  3. Read in all the mapping IDs (if any) that exist on the latest source version.

Then it loops over the lines in the import file, calling handle_mapping on each line.

handle_mapping

This method determines whether the mapping is an insert or an update, and takes action accordingly. Specifically, it:

  1. Creates a serializer to deserialize the mapping.
  2. Raises an exception if unable to deserialize.
  3. Deserializes the mapping in memory.
  4. Determines if the mapping is an insert or an update:
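            from django.db.models import Q
            # Always match on the parent source, map type and from_concept
            query = Q(parent_id=self.source.id, map_type=mapping.map_type, from_concept=mapping.from_concept)
            if mapping.to_concept:
                # Internal mapping: the target concept identifies the mapping
                query = query & Q(to_concept=mapping.to_concept)
            else:
                # External mapping: match on target source, concept code and concept name
                query = query & Q(to_source_id=mapping.to_source.id, to_concept_code=mapping.to_concept_code, to_concept_name=mapping.to_concept_name)
            # Raises Mapping.DoesNotExist if no matching mapping exists
            mapping = Mapping.objects.get(query)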

This odd bit of logic builds a Django query (a combination of Q objects) based on the type of mapping we're dealing with:

  • Always start with the source ID, map type & from_concept
  • If it's an internal mapping, include the to_concept
  • If it's an external mapping, include the to_source, to_concept_code & to_concept_name

If Mapping.objects.get(query) raises DoesNotExist, then we treat this as an insert operation (i.e. call add_mapping). If it does exist, we make sure its ID is included in the list we set aside earlier, and call update_mapping.
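The insert-versus-update decision can be illustrated without Django. In this hypothetical Python sketch, a dictionary keyed by the same lookup fields stands in for Mapping.objects.get(query), and a missing key plays the role of DoesNotExist. Field names such as to_source_id mirror the query above but are used here as plain dictionary keys.

```python
from typing import Any, Dict, Optional, Tuple

MappingKey = Tuple[Any, ...]


def mapping_key(source_id: str, mapping: Dict[str, Any]) -> MappingKey:
    """Mirror the lookup fields used by handle_mapping's query."""
    base = (source_id, mapping["map_type"], mapping["from_concept"])
    if mapping.get("to_concept"):
        # Internal mapping: the target concept identifies it
        return base + ("internal", mapping["to_concept"])
    # External mapping: target source, code and name identify it
    return base + ("external", mapping["to_source_id"],
                   mapping["to_concept_code"], mapping.get("to_concept_name"))


def classify(source_id: str, mapping: Dict[str, Any],
             existing: Dict[MappingKey, str]) -> Tuple[str, Optional[str]]:
    """Return ("insert", None) or ("update", existing_mapping_id)."""
    existing_id = existing.get(mapping_key(source_id, mapping))
    return ("update", existing_id) if existing_id is not None else ("insert", None)
```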

add_mapping

This operation simply deserializes the mapping and persists the result.

update_mapping

This operation computes a diff between the new mapping and the existing one, and applies any updates it finds. The diff operation is defined here:

https://github.com/OpenConceptLab/oclapi/blob/master/django-nonrel/ocl/mappings/models.py#L260
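The general idea of a field-level diff is sketched below in Python on plain dictionaries. This is not the oclapi implementation linked above: it assumes only fields present in the incoming record are compared, and it does not reproduce the special handling of retired mentioned in the note that follows.

```python
from typing import Any, Dict, Tuple

Diff = Dict[str, Tuple[Any, Any]]


def diff_mappings(existing: Dict[str, Any], incoming: Dict[str, Any]) -> Diff:
    """Return {field: (old, new)} for each incoming field whose value differs."""
    return {
        field: (existing.get(field), new)
        for field, new in incoming.items()
        if existing.get(field) != new
    }


def apply_diff(existing: Dict[str, Any], diff: Diff) -> Dict[str, Any]:
    """Return a copy of the existing mapping with the changed fields applied."""
    updated = dict(existing)
    for field, (_old, new) in diff.items():
        updated[field] = new
    return updated
```

An empty diff means the stored mapping already matches the import, so no write is needed.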

NOTE: I'm not sure why this method treats retired as a special case. This might be an artifact of the concept import implementation, and is probably worth looking into.

The last thing we do here before returning is remove the mapping ID from the list we set aside earlier (i.e. mark it as "seen"). Anything left over in this list upon completion of the main loop is considered "unseen" and will be deactivated. That's what's happening in the loop that starts here:

https://github.com/OpenConceptLab/oclapi/blob/master/django-nonrel/ocl/mappings/importer.py#L69
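Putting the pieces together, the overall flow of import_mappings (index the existing mappings, insert or update each record, then retire whatever was never seen) can be sketched in memory. Everything here is hypothetical: plain dictionaries replace the database, key_of replaces the Django query, the new-N ID scheme is invented, and gating retirement on a retire_missing argument follows the documented --retire-missing-records option rather than the importer code.

```python
from typing import Any, Callable, Dict, Iterable, Set


def import_mappings(
    records: Iterable[Dict[str, Any]],
    existing: Dict[str, Dict[str, Any]],
    key_of: Callable[[Dict[str, Any]], str],
    retire_missing: bool = True,
) -> Dict[str, int]:
    """In-memory sketch: insert or update each record, then retire the unseen ones."""
    index = {key_of(m): mid for mid, m in existing.items()}  # lookup key -> mapping ID
    unseen: Set[str] = set(existing)  # IDs on the latest source version
    stats = {"inserted": 0, "matched": 0, "retired": 0}
    for record in records:
        mid = index.get(key_of(record))
        if mid is None:  # plays the role of DoesNotExist: insert
            new_id = f"new-{stats['inserted'] + 1}"  # hypothetical ID scheme
            existing[new_id] = dict(record)
            index[key_of(record)] = new_id
            stats["inserted"] += 1
        else:  # found: update and mark as "seen"
            existing[mid].update(record)
            unseen.discard(mid)
            stats["matched"] += 1
    if retire_missing:
        for mid in unseen:  # anything never seen is retired
            existing[mid]["retired"] = True
            stats["retired"] += 1
    return stats
```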

How to import

Using SSH and running the import script directly on the server

  • Log in to the machine with root credentials
  • cd ~/oclapi/ocl
  • You are now ready to run one of the import commands below; additional Docker commands that may be useful are listed at the end of this section

Import with indexing at the end

  • Import concepts
docker-compose run -d --name ocl_import ocl_api bash -c "python manage.py import_concepts_to_source --source <source unique id from mongo db> --token <root_auth_token> <json_file_name> && echo y | python manage.py update_index"
  • Import mappings
docker-compose run -d --name ocl_import ocl_api bash -c "python manage.py import_mappings_to_source --source <source unique id from mongo db> --token <root_auth_token> <json_file_name> && echo y | python manage.py update_index"

Import with inline indexing

  • Import concepts:
docker-compose run -d --name ocl_import ocl_api bash -c "python manage.py import_concepts_to_source --source <source unique id from mongo db> --token <root_auth_token> --inline-indexing true <json_file_name>"
  • Import mappings:
docker-compose run -d --name ocl_import ocl_api bash -c "python manage.py import_mappings_to_source --source <source unique id from mongo db> --token <root_auth_token> --inline-indexing true <json_file_name>"
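The four commands above differ only in the resource type and the indexing mode. The hypothetical Python helper below rebuilds them as an argument list, assuming the same ocl_api service and container name; it uses shlex.join (Python 3.8+) to quote the inner bash script safely.

```python
import shlex
from typing import List


def docker_import_command(kind: str, source_id: str, token: str, json_file: str,
                          inline_indexing: bool = False,
                          name: str = "ocl_import") -> List[str]:
    """Build the docker-compose argv for a concept or mapping import."""
    if kind not in ("concepts", "mappings"):
        raise ValueError("kind must be 'concepts' or 'mappings'")
    manage = ["python", "manage.py", f"import_{kind}_to_source",
              "--source", source_id, "--token", token]
    if inline_indexing:
        manage += ["--inline-indexing", "true"]
    manage.append(json_file)
    script = shlex.join(manage)
    if not inline_indexing:
        # Rebuild the search index once the import has finished
        script += " && echo y | python manage.py update_index"
    return ["docker-compose", "run", "-d", "--name", name, "ocl_api", "bash", "-c", script]
```

Run the result with subprocess.run(cmd, check=True) from ~/oclapi/ocl, then follow progress with docker logs -f ocl_import.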

Additional notes on performing the imports

  • Validation can be disabled during an import by setting the DISABLE_VALIDATION=True environment variable, e.g.:
docker-compose run -d --name ocl_import ocl_api bash -c "DISABLE_VALIDATION=True python manage.py import_concepts_to_source --source [source_uuid] --token [user_api_token] [import_filename] && echo y | python manage.py update_index"
  • Follow the import logs: docker logs -f ocl_import
  • Remove an existing ocl_import container before starting a new one with the same name: docker rm ocl_import
  • Stop the import: docker stop ocl_import
  • Resume the import (if stopped): docker start ocl_import
  • All organizations and sources, including those referenced in mappings or collections, must already exist in OCL before you run this script, so create them beforehand.
