tutorials

sprivite · sprivite · commit 0a348123ae42 · 2024-11-11T16:22:35.000+01:00
diff --git a/docs/tutorials/phenotypes/ArithmeticPhenotype_Tutorial.ipynb b/docs/tutorials/phenotypes/ArithmeticPhenotype_Tutorial.ipynb
@@ -0,0 +1,215 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "c4fa3cdf-d540-4add-8a31-a1fd6b126ba3",
+   "metadata": {},
+   "source": [
+    "# ArithmeticPhenotype Tutorial\n",
+    "\n",
+    "The ArithmeticPhenotype allows us to perform simple mathematical operations such as addition, subtraction, multiplication and division on the output of other phenotypes.\n",
+    "\n",
+    "There are two obvious use cases for this in real-world data (RWD):\n",
+    "1. calculating medical scores, such as the CHADSVASC score or the Charlson Comorbidity Index\n",
+    "2. calculating a derived measurement value, such as Body Mass Index, which is calculated using height and weight.\n",
+    "\n",
+    "\n",
+    "In this tutorial, we will see how to calculate CHADSVASC and how to calculate BMI.\n",
+    "\n",
+    "\n",
+    "## Calculating scores\n",
+    "Like the LogicPhenotype, the ArithmeticPhenotype operates on other phenotypes; we refer to the phenotypes that an ArithmeticPhenotype operates on as the 'component phenotypes'.\n",
+    "\n",
+    "In order to perform arithmetic, we need to associate a value with patients fulfilling the criteria of a component phenotype. By default, this is done by associating the value of '1' with all patients that fulfill the criteria of a component phenotype. Patients that do not fulfill the component phenotype criteria are associated with a value of '0'.\n",
+    "\n",
+    "Let's see how this works on a simple example of the CHADSVASC score. We will assume that Codelists already exist for each component phenotype.\n",
+    "### Step 1 : Create all component phenotypes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "16f7c761-32d8-4a8f-a844-2022b302ddeb",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# Step 1 : First create all component phenotypes\n",
+    "c = CodelistPhenotype(\n",
+    "    codelist=Codelist(\"heart_failure\"), \n",
+    "    domain=\"condition_occurrence\",\n",
+    "    relative_time_range = ONEYEAR_PREINDEX\n",
+    ")\n",
+    "\n",
+    "h = CodelistPhenotype(\n",
+    "    codelist=Codelist(\"hypertension\"), \n",
+    "    domain=\"condition_occurrence\",\n",
+    "    relative_time_range = ONEYEAR_PREINDEX\n",
+    ")\n",
+    "\n",
+    "a75 = AgePhenotype(\n",
+    "    min_age=GreaterThanOrEqualTo(75),\n",
+    "    relative_time_range = ONEYEAR_PREINDEX\n",
+    ")\n",
+    "\n",
+    "d = CodelistPhenotype(\n",
+    "    codelist=Codelist(\"diabetes_and_impaired_glucose_tolerance\"),\n",
+    "    domain=\"condition_occurrence\",\n",
+    "    relative_time_range = ONEYEAR_PREINDEX\n",
+    ")\n",
+    "\n",
+    "s = CodelistPhenotype(\n",
+    "    codelist=Codelist(\"stroke\"), \n",
+    "    domain=\"condition_occurrence\",\n",
+    "    relative_time_range = ONEYEAR_PREINDEX\n",
+    ")\n",
+    "\n",
+    "v = CodelistPhenotype(\n",
+    "    codelist=Codelist(\"peripheral_artery_disease\"), \n",
+    "    domain=\"condition_occurrence\",\n",
+    "    relative_time_range = ONEYEAR_PREINDEX\n",
+    ")\n",
+    "\n",
+    "a65to74 = AgePhenotype(\n",
+    "    min_age=GreaterThanOrEqualTo(65),\n",
+    "    max_age=LessThanOrEqualTo(74),\n",
+    "    relative_time_range = ONEYEAR_PREINDEX\n",
+    ")\n",
+    "\n",
+    "sc = SexPhenotype(allowed_values=[2]) # female is defined as a value of 2 in our Optum database"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "8aecd389-46d7-4a77-88d1-1ede2a881daa",
+   "metadata": {},
+   "source": [
+    "### Step 2 : Create ArithmeticPhenotype\n",
+    "We can then create our arithmetic phenotype by combining our component phenotypes with mathematical operations. For CHADSVASC, we add up all the component phenotype values. Recall that the default value for a component phenotype is 1; if we want another value associated with a component phenotype, we multiply it by that value (note that age >= 75 and stroke, `a75` and `s` respectively, are each associated with a value of 2)."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "602d4982-ddd1-457c-a9bd-8fcdd504a003",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "chadsvasc = ScorePhenotype(\n",
+    "    name = \"chadsvasc\",\n",
+    "    expression = c + h + a75 * 2 + d + s * 2 + v + a65to74 + sc,\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "2cb1da51-c544-4dfa-83c7-5e189cdfeb38",
+   "metadata": {},
+   "source": [
+    "## Calculating derived measurement values\n",
+    "MeasurementPhenotypes are unique in that, if the return_value keyword argument is set, they are associated with a value. ArithmeticPhenotype will operate on the returned value of MeasurementPhenotypes, allowing us to calculate derived values from measurement values.\n",
+    "\n",
+    "This is useful for the example of body mass index (BMI), which is defined as weight in kilograms divided by the square of height in meters. The example below assumes that height is recorded in centimeters, hence the division by 100.\n",
+    "\n",
+    "As in the above example, the steps are to (1) define our component phenotypes and (2) create the arithmetic phenotype that combines them with our mathematical operations.\n",
+    "### Step 1 : Create all component phenotypes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "dcf77a03-b191-4443-a4f8-822829fd1fb0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# define our component phenotypes\n",
+    "h = MeasurementPhenotype(\n",
+    "    name=\"height\",\n",
+    "    codelist=Codelist(\"HEIGHT\"),\n",
+    "    domain=\"measurement\",\n",
+    "    relative_time_range = ONEYEAR_PREINDEX,\n",
+    "    value_aggregation=\"mean\",\n",
+    "    return_value=\"all\"\n",
+    ")\n",
+    "\n",
+    "w = MeasurementPhenotype(\n",
+    "    name=\"weight\",\n",
+    "    codelist=Codelist(\"WEIGHT\"),\n",
+    "    domain=\"measurement\",\n",
+    "    relative_time_range = ONEYEAR_PREINDEX,\n",
+    "    value_aggregation=\"mean\",\n",
+    "    return_value=\"all\"\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "23f9e24f-8519-4fcc-995c-4dec89754210",
+   "metadata": {},
+   "source": [
+    "### Step 2: Create ArithmeticPhenotype"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "cfcc64fc-c6ce-4e4b-bb8e-1ffea5c21ec5",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# calculate the bmi\n",
+    "bmi = ArithmeticPhenotype(\n",
+    "    name=\"bmi\",\n",
+    "    expression = w / (h / 100) ** 2, \n",
+    ")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "786165f4-7d08-4a16-a5f4-1d5c900eecd7",
+   "metadata": {},
+   "source": [
+    "## Setting value filters\n",
+    "With ArithmeticPhenotype, like MeasurementPhenotype, we can define value_filters that allow us to subset patients that fulfill some filtering criteria.\n",
+    "\n",
+    "For example, we may be interested only in patients with a BMI greater than or equal to 30."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "f5f6379b-9cfe-425d-adbb-aedee80185e0",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# calculate the bmi, keeping only patients with a BMI of at least 30\n",
+    "bmi = ArithmeticPhenotype(\n",
+    "    name=\"bmi\",\n",
+    "    expression=w / (h / 100) ** 2,\n",
+    "    value_filter=ValueFilter(\">=\", 30, \"value\"),\n",
+    ")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.7"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/tutorials/phenotypes/LogicPhenotype_Tutorial.ipynb b/docs/tutorials/phenotypes/LogicPhenotype_Tutorial.ipynb
@@ -0,0 +1,140 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "ce4eeb71-b2ff-4803-b9c1-8e0e35998feb",
+   "metadata": {},
+   "source": [
+    "# LogicPhenotype Tutorial\n",
+    "\n",
+    "The LogicPhenotype allows us to combine phenotypes with the logical operations AND, OR and NOT.\n",
+    "\n",
+    "There are obvious use cases for this in real-world data (RWD):\n",
+    "1. We want to combine information from multiple domains, for example \"procedures\" and \"diagnoses\". An example is: which patients have a diagnosis for heart transplant OR a procedure for heart transplant?\n",
+    "2. We want to calculate complicated logical definitions: it is common to build algorithms that correctly classify patients as having a condition. This means we want patients to fulfill some lab value criteria, diagnosis criteria and so on. We can create arbitrarily complex definitions using LogicPhenotype.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "f24b9966-9f11-4f00-ab68-bcdfae180f98",
+   "metadata": {},
+   "source": [
+    "## Combining information from multiple domains\n",
+    "We often want to ask 'which patients have a diagnosis of condition_x OR a procedure treating condition_x'. Diagnoses and procedures are often found in separate domains, the condition_occurrence and procedure_occurrence tables.\n",
+    "\n",
+    "LogicPhenotype allows us to combine information from multiple domains.\n",
+    "\n",
+    "### Step 1 : Create component phenotypes"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "6761f436-d192-43c0-a1d8-3d214524f98a",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "ht_procedure_codes = Codelist(\"heart_transplant\")\n",
+    "ht_diagnosis_codes = Codelist(\"heart_transplant\")\n",
+    "\n",
+    "\n",
+    "ht_procedures = CodelistPhenotype(\n",
+    "    name_phenotype=\"ht_procedures\",\n",
+    "    codelist=ht_procedure_codes,\n",
+    "    domain=\"procedure_occurrence\",\n",
+    "    time_range_filter=ONEYEAR_PREINDEX,\n",
+    ")\n",
+    "\n",
+    "ht_diagnoses = CodelistPhenotype(\n",
+    "    name_phenotype=\"ht_diagnoses\",\n",
+    "    codelist= ht_diagnosis_codes,\n",
+    "    domain=\"condition_occurrence\",\n",
+    "    time_range_filter=ONEYEAR_PREINDEX,\n",
+    ")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1e156f2b-a4ff-486d-b3cb-9fa7a7d1788c",
+   "metadata": {},
+   "source": [
+    "### Step 2 : Create LogicPhenotype\n",
+    "We are now ready to create our LogicPhenotype using our component phenotypes. Here we can use the logical operations AND (`&`) and OR (`|`).\n",
+    "\n",
+    "Here we will show two logic phenotypes, one using OR and one using AND.\n",
+    "\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "1d9df98c-2556-48eb-ae4e-6379815fe4bc",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "\n",
+    "# use logical OR\n",
+    "ht_or = LogicPhenotype(\n",
+    "    name_phenotype=\"ht_procedure_OR_diagnosis\", \n",
+    "    logic = ht_procedures | ht_diagnoses\n",
+    ")\n",
+    "\n",
+    "# use logical AND\n",
+    "ht_and = LogicPhenotype(\n",
+    "    name_phenotype=\"ht_procedure_AND_diagnosis\", \n",
+    "    logic = ht_procedures & ht_diagnoses\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1365771d-637c-4b26-99c5-30bdd51436a6",
+   "metadata": {},
+   "source": [
+    "## Complicated logical phenotypes\n",
+    "We can add arbitrarily complex logic to our operations. Let's add two more component phenotypes, death and an 'end of coverage' phenotype. We will use these to create a censoring event phenotype, which is the first occurrence of a heart transplant (procedure or diagnosis), death or end of coverage."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "94ba769a-2127-45bb-8009-790a35f3a5d4",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# create two new component phenotypes\n",
+    "death = DeathPhenotype()\n",
+    "late_date_active = ContinuousCoveragePhenotype(return_date=\"last\")\n",
+    "\n",
+    "# create a logic phenotype combining all components\n",
+    "censoring_event = LogicPhenotype(\n",
+    "    name_phenotype=\"any_censoring\",\n",
+    "    logic= ht_or | death | late_date_active,\n",
+    "    return_date=\"first\",\n",
+    ")\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3 (ipykernel)",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.9.19"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
diff --git a/docs/tutorials/phenotypes/MeasurementPhenotype_Tutorial.ipynb b/docs/tutorials/phenotypes/MeasurementPhenotype_Tutorial.ipynb