New definitions

Martin Chapman edited this page Sep 28, 2022 · 4 revisions

New definitions are authored in one of two ways: by uploading codelists or keyword lists (automated), or by specifying the logic of a definition using a simple syntax contained within a CSV file (semi-automated).

Authoring new definitions requires credentials for the main Phenoflow library, or a running local copy of the library with a pre-created user.

1. Codelists

If the primary logic of a definition is to identify one or more codes from one or more codelists, these lists can be submitted directly to the Phenoflow library, which will further segment the codes in each list into logical groups (each of which is represented by an individual step) to increase intelligibility. If any steps identify a patient as being a case (i.e. the patient has one of the codes it stores), then we consider the patient to have the condition, i.e. there is a disjunction relationship between steps. The library expects the codes in each list to all belong to the same coding system and expects this system to be indicated in the filename using the following naming convention: phenotype-name_system (e.g. abdominal-aortic-aneurysm_icd).
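
As a minimal sketch of the naming convention above, the following splits a codelist filename into its phenotype name and coding system. The function name `parse_codelist_filename` is hypothetical and is not part of the Phenoflow library; the sketch assumes the coding system is whatever follows the last underscore.

```python
from pathlib import Path


def parse_codelist_filename(filename: str) -> tuple[str, str]:
    """Split 'phenotype-name_system[.ext]' into (phenotype name, coding system)."""
    stem = Path(filename).stem  # drop an extension such as .csv, if present
    # Assumption: the coding system is the text after the LAST underscore.
    name, separator, system = stem.rpartition("_")
    if not separator or not name or not system:
        raise ValueError(f"expected phenotype-name_system, got {filename!r}")
    return name, system


print(parse_codelist_filename("abdominal-aortic-aneurysm_icd.csv"))
```

Checking filenames in this way before upload may help catch a codelist whose coding system has not been indicated.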

The content of the codelists themselves can vary; Phenoflow supports a variety of common ways to structure codelists, broadly with a column containing the codes themselves, and a description column providing more detail on each code.

Once in the correct format, codelists should be compressed and submitted as a zipped folder to the importCodelists endpoint, along with other information about the definition.
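
The compression step might be sketched as follows, using only the Python standard library. The column names `code` and `description`, the placeholder row, and the choice to place files at the root of the archive are all assumptions for illustration. The request format expected by importCodelists is not described on this page, so the upload itself is left out.

```python
import io
import zipfile


def zip_codelists(codelists: dict[str, str]) -> bytes:
    """Pack {filename: CSV text} into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for filename, content in codelists.items():
            archive.writestr(filename, content)
    return buffer.getvalue()


# Hypothetical codelist content; the code and description are placeholders.
archive_bytes = zip_codelists({
    "abdominal-aortic-aneurysm_icd.csv": "code,description\nEXAMPLE-CODE,placeholder description\n",
})
print(len(archive_bytes) > 0)
```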

2. Keywords

Keywords are used as the basis for creating (NLP-based) definitions in much the same way as codelists. A single CSV file (with a column named keywords) containing the keywords relating to a given condition is submitted to the importKeywordList endpoint.
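
A keyword file of this shape could be produced as in the sketch below. The helper name `keywords_csv` and the example keywords are hypothetical; only the column name `keywords` comes from the description above.

```python
import csv
import io


def keywords_csv(keywords: list[str]) -> str:
    """Return CSV text with a single column named 'keywords'."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["keywords"])  # header expected by the keyword import
    for keyword in keywords:
        writer.writerow([keyword])
    return buffer.getvalue()


# Placeholder keywords, for illustration only.
print(keywords_csv(["aneurysm", "aortic dilation"]))
```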

3. Defined steps

If there are specific steps to a phenotype definition, these are expressed in what we refer to as a steplist. Steplists adopt a simple syntax that allows for the logic of a phenotype definition to be expressed, or existing definitions (as, say, expressed in a flowchart) to be re-expressed, in a machine-readable form.

Structure

Steplists are CSV files with two columns: logicType indicating to Phenoflow which type of logic is being used within this step; and params which describes the chosen logic, where each individual parameter is separated by a colon. The different types of logic, and required parameters, currently supported by Phenoflow are listed in the following table:

| Logic Type | Params | Description | Example |
| --- | --- | --- | --- |
| codelist | [codelistA.csv]:[N - number of code occurrences required] | A step that identifies a case if the patient has N codes from the referenced codelist | codelist,abdominal-aortic-aneurysm-2_cpt.csv:1 |
| codelistExclude | [codelistA.csv] | A step that excludes a patient if they have 1 or more codes from the referenced codelist | codelistExclude,abdominal-aortic-aneurysm-2_cpt.csv |
| age | [Min age (incl)]:[Max age (excl)] | A step that excludes a patient if they do not fall within the stated age range | age,40:90 |
| lastEncounter | [Max years] | A step that excludes patients if their last encounter with an HCP was more than the stated number of years ago | lastEncounter,5 |
| codelistsTemporal | [codelistA.csv]:[codelistB.csv]:[Min days (excl)]:[Max days (excl)] | A step that identifies a case if a patient has a diagnosis from codelistA and a diagnosis from codelistB, and the diagnosis from codelistB was made within the stated range of days | codelistsTemporal,codelistA.csv:codelistB.csv:10:20 |

Example

The simplest steplist contains a single codelist step, much like a direct codelist import. Such a steplist would look like the following:

logicType,param
codelist,codelistA.csv:1
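
To make the row format concrete, the sketch below reads a steplist into a logic type and a list of colon-separated parameters. The `Step` class and `parse_steplist` function are hypothetical names, and this is not how Phenoflow itself parses steplists; the header row is skipped rather than validated.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Step:
    logic_type: str
    params: list[str]


def parse_steplist(text: str) -> list[Step]:
    """Parse steplist CSV text into Step objects."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the header row
    steps = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        # Individual parameters are separated by a colon.
        steps.append(Step(row[0].strip(), row[1].strip().split(":")))
    return steps


example = "logicType,param\nage,40:90\ncodelist,codelistA.csv:1\n"
for step in parse_steplist(example):
    print(step.logic_type, step.params)
```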

Order

Because of exclusion steps, the order in which logic is expressed within a Phenoflow definition is important. Once a patient is excluded, all their subsequent case steps will evaluate to unclassified by default. Consider the following example (which also serves as a general example of a more complex steplist) for abdominal aortic aneurysm (AAA):

logicType,param
age,40:90
codelist,abdominal-aortic-aneurysm_cpt.csv:1
codelist,abdominal-aortic-aneurysm_icdA.csv:1
lastEncounter,5
codelist,abdominal-aortic-aneurysm_icdB.csv:2

Here, any patients that fall outside the age range will be instantly excluded. However, patients with codes from abdominal-aortic-aneurysm_cpt and abdominal-aortic-aneurysm_icdA will still be flagged as cases, even if later excluded due to the last encounter logic. In this way, we are able to represent different case types, with certain exclusions only applying to certain case types.
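
The ordering rule can be illustrated with a small simulation. This is a sketch of the behaviour described above, not Phenoflow's implementation: each step is reduced to a pre-computed boolean, and the outcome labels other than CASE and UNCLASSIFIED are hypothetical.

```python
def evaluate_in_order(steps: list[tuple[str, bool]]) -> list[str]:
    """Evaluate (kind, triggered) pairs in order; kind is 'exclude' or 'case'."""
    excluded = False
    outcomes = []
    for kind, triggered in steps:
        if kind == "exclude":
            excluded = excluded or triggered
            outcomes.append("EXCLUDED" if triggered else "PASSED")
        elif excluded:
            # Case steps after an exclusion are unclassified by default.
            outcomes.append("UNCLASSIFIED")
        else:
            outcomes.append("CASE" if triggered else "NOT CASE")
    return outcomes


# A hypothetical patient evaluated against the AAA steplist above.
patient = [
    ("exclude", False),  # age: within 40-90, so not excluded
    ("case", True),      # has a code from the cpt list
    ("case", False),     # no code from the icdA list
    ("exclude", True),   # last encounter too long ago
    ("case", True),      # has two codes from icdB, but already excluded
]
outcomes = evaluate_in_order(patient)
print(outcomes)
print("CASE" in outcomes)
```

In this sketch the patient is still a case, because the cpt step resolved to CASE before the last encounter exclusion was applied.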

Importing a steplist

A steplist CSV should be submitted along with a zipped folder containing all referenced codelists to the importSteplist endpoint, along with relevant details of the definition.

4. Defined steps with branches

The phenotypes represented using the Phenoflow model typically consist of a single branch, where the disjunction relationship between steps means that if any of the steps indicate that a patient is a case (subject to no prior exclusions) they are considered to have the condition being modelled. However, more complex definitions may contain multiple branches.

As discussed in the model overview, branches are flattened into individual steps within the Phenoflow model. To demonstrate this visually, we can consider the following abstract definition:

[Figure: branch]

Here, a patient with codes from lists A, B and D is a case, and a patient without any codes from list A and with codes from list C is also a case. Within the Phenoflow model, this logic is represented as follows:

[Figure: flattened]

Here, each branch is summarised by a single step, giving two steps in total. These steps are joined by the traditional disjunction relationship, which also holds between the branches themselves. Within a branch step, however, every condition must be true for that step to resolve, overall, to a 'CASE' value. This conjunction differs from the traditional disjunction relationship between steps.
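
The logic of the abstract definition above can be written out as a short sketch: a conjunction within each branch and a disjunction between the branches. The function name `is_case` and the representation of a patient as the set of lists they have codes from are assumptions made for illustration.

```python
def is_case(lists_with_codes: set[str]) -> bool:
    """Return True if either branch of the abstract definition is satisfied."""
    # Branch 1: codes from lists A, B and D (every condition must hold).
    branch_a = all(name in lists_with_codes for name in ("A", "B", "D"))
    # Branch 2: no codes from list A, and codes from list C.
    branch_b = "A" not in lists_with_codes and "C" in lists_with_codes
    # Branches are joined by a disjunction.
    return branch_a or branch_b


print(is_case({"A", "B", "D"}))
print(is_case({"C"}))
print(is_case({"A", "C"}))
```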

Steplists and branches

To import a definition with branches into Phenoflow, each branch is represented as a steplist, with these lists referenced in a parent steplist.

For example, to represent the left-most branch of the definition above, a file called branchA.csv might look like the following:

logicType,param
codelist,codelistA.csv:1
codelist,codelistB.csv:1
codelist,codelistD.csv:1

And, to represent the right-most branch, a file called branchB.csv might look like the following:

logicType,param
codelistExclude,codelistA.csv
codelist,codelistC.csv:1

Finally, to connect these branches, our parent steplist would be structured as follows. Note the new identifier branch:

logicType,param
branch,branchA.csv
branch,branchB.csv

Importing a steplist with branches

To upload a definition to Phenoflow that contains branches, the parent steplist should be accompanied by a zipped folder containing any codelists referenced from the parent steplist, any branch CSVs, and the codelists those branches reference. These should all be submitted to the importSteplist endpoint, along with relevant details of the definition.
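
Before zipping, it can help to list every file the parent steplist depends on. The sketch below follows branch rows recursively and collects each referenced CSV. The function name `referenced_files` is hypothetical, and treating any parameter that ends in `.csv` as a file reference is an assumption.

```python
import csv
import io


def referenced_files(steplist_name: str, steplists: dict[str, str]) -> set[str]:
    """Collect every CSV a steplist refers to, following branch rows."""
    found: set[str] = set()
    rows = csv.reader(io.StringIO(steplists[steplist_name]))
    next(rows, None)  # skip the header row
    for row in rows:
        if len(row) < 2:
            continue
        logic_type, params = row[0].strip(), row[1].strip().split(":")
        if logic_type == "branch":
            found.add(params[0])
            found |= referenced_files(params[0], steplists)
        else:
            # Assumption: parameters ending in .csv are codelist references.
            found.update(p for p in params if p.endswith(".csv"))
    return found


# The parent and branch steplists from the example above, held in memory.
steplists = {
    "parent.csv": "logicType,param\nbranch,branchA.csv\nbranch,branchB.csv\n",
    "branchA.csv": "logicType,param\ncodelist,codelistA.csv:1\ncodelist,codelistB.csv:1\ncodelist,codelistD.csv:1\n",
    "branchB.csv": "logicType,param\ncodelistExclude,codelistA.csv\ncodelist,codelistC.csv\n",
}
print(sorted(referenced_files("parent.csv", steplists)))
```

The filename `parent.csv` is a placeholder; the parent steplist itself is submitted alongside the zipped folder rather than inside it.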