Pull request #9: FHIR-2450 added readme notes
Merge in DBGFIHR/public-documentation from feature/FHIR-2450/add-URECA-sample-code-markdown to production

* commit '0b138e57e49234dac4c378160ae432b74e1f2939':
  Remove the stylesheet since it does not render in Bitbucket
  Replace the rest of the checkmarks with bulletted lists
  Try bulleted list with checkmarks FHIR-2540
  followed Eric's review comments.take 2. FHIR-2450
  followed Eric's review comments. FHIR-2450
  FHIR-2450 added readme notes
mingward authored and RadixSeven committed Aug 19, 2024
2 parents a54970c + 0b138e5 commit 05828e0
Showing 2 changed files with 110 additions and 107 deletions.
193 changes: 88 additions & 105 deletions jupyter/pilot/Notebook01_phs002921_URECA_subject_phenotype.ipynb
@@ -1,185 +1,168 @@
{
"cells": [
{
"cell_type": "raw",
"id": "1211f6c8-6ee5-4d5c-b926-1f3210c8704a",
"metadata": {},
"source": [
"# Query pilot server for pheontype data.\n",
"## What FHIR server to use?\n",
"Note this sample code is using a synthetic data server at: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1\n",
"The real server for URECA study(phs002921) is at: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1. You will first need to make Controlled Data Access Request(DAR). https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=GeneralAAInstructions.pdf\n",
"## About the authorization token:\n",
"1. If you are using the synthetic server to try out FHIR pilot server, you do not need to have the \"real token\" in the token file. The script below still uses a token file so it works once you have the real token in the file.\n",
"2. If you want to use the real study data which is controlled-access, you need DAR approval. After your DAR is approved, go to https://www.ncbi.nlm.nih.gov/gap/power-user-portal/, login with your eRA account, scroll down and click on the \"Task Specific Token\" button to get the token file. Save the token in a text file. In the example below, it is saved to \"task-specific-token-all.txt\".\n",
"## What does this script do?\n",
"This sample script shows how to get the Study Subject Phenotype data. You can see the content of the Subject Phenotype dataset here: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/dataset.cgi?study_id=phs002921.v2.p1&pht=12614 including the data dictionary (https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs002921/phs002921.v2.p1/pheno_variable_summaries/phs002921.v2.pht012614.v1.ICAC_Subject_Phenotypes.data_dict.xml ) \n",
"Note that in dbGaP, the Subject Phenotype dataset usually includes demographic data in addition to phenotypic data.\n",
"## Script summary\n",
"This script first connects to the synthetic data server. Retrieves the patients of URECA study and saves it in a Python List: patient_ids. \n",
"The script then iterates through the \"patient_ids\", to get the data in \"subject phenotype\" file which is stored in FHIR Observation Resource. \n",
"The script saves the phenotype values in patient_observations.csv. \n"
]
},
{
"cell_type": "code",
"id": "d4b50729-08c1-43d2-91aa-edc93194f03a",
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-07T19:32:03.428830Z",
"start_time": "2024-08-07T19:31:34.565851Z"
}
},
"execution_count": null,
"id": "37fe7a43-20ea-406a-b02d-bd0b86aa9be0",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"import csv\n",
"from datetime import datetime\n",
"from time import sleep\n",
"from fhir_fetcher import fetch_all_data # Ensure this module is available\n",
"\n",
"# and handles paging through all records\n",
"\n",
"from fhir_fetcher import fetch_all_data # Ensure this module is available and handles paging through all records\n",
"\n",
"def fetch_patient_observations(session, fhir_base_url, patient_id):\n",
" qstr = f\"Observation?subject=Patient/{patient_id}\"\n",
" qstr = f'Observation?subject=Patient/{patient_id}'\n",
" # above qstr example:\n",
" # https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/Observation?subject=Patient/4317770\n",
" start_url = f\"{fhir_base_url}/{qstr}\"\n",
" observations = fetch_all_data(\n",
" session, start_url, 0\n",
" ) # Fetch all observations for the patient\n",
" observations = fetch_all_data(session, start_url, 0) # Fetch all observations for the patient\n",
" return observations\n",
"\n",
"\n",
"def extract_observation_data(observations):\n",
" data = {}\n",
" for entry in observations:\n",
" resource = entry.get(\"resource\", {})\n",
" code = resource.get(\"code\", {}).get(\"coding\", [{}])[0]\n",
" attribute_name = code.get(\"display\", \"\")\n",
" value_string = resource.get(\"valueString\", \"\")\n",
" value_quantity = resource.get(\"valueQuantity\", {}).get(\"value\", \"\")\n",
" resource = entry.get('resource', {})\n",
" code = resource.get('code', {}).get('coding', [{}])[0]\n",
" attribute_name = code.get('display', '')\n",
" value_string = resource.get('valueString', '')\n",
" value_quantity = resource.get('valueQuantity', {}).get('value', '')\n",
" if attribute_name:\n",
" value = value_string if value_string else value_quantity\n",
" if value:\n",
" data[attribute_name] = value\n",
" return data\n",
"\n",
"\n",
"def fetch_patient_ids(session, fhir_base_url, study_reference):\n",
" \n",
" \n",
" query_url = f\"{fhir_base_url}/ResearchSubject?study={study_reference}\"\n",
" print(query_url)\n",
" research_subjects = fetch_all_data(session, query_url, 0, \"n\")\n",
" patient_ids = [\n",
" entry[\"resource\"][\"individual\"][\"reference\"].split(\"/\")[-1]\n",
" for entry in research_subjects\n",
" ]\n",
" # query_url example: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/ResearchSubject?study=phs002921\n",
" print ( query_url)\n",
" research_subjects = fetch_all_data(session, query_url, 0, 'n')\n",
" patient_ids = [entry['resource']['individual']['reference'].split('/')[-1] for entry in research_subjects]\n",
" return patient_ids\n",
"\n",
"\n",
"def main():\n",
"\n",
" # \n",
" # If I had the patient_observations.csv open, then I am running this program, it will give an error when trying to write to it.\n",
" # So check if the file is writable in the begining to avoid getting the error at the end of program after waiting for the program to finish:\n",
" #\n",
" output_file = 'patient_observations.csv'\n",
"\n",
" # Check if the file can be opened for writing\n",
" try:\n",
" with open(output_file, 'w', newline='') as csvfile:\n",
" pass # File opened successfully, nothing to write yet\n",
" except PermissionError:\n",
" print(f\"Permission denied: Cannot open {output_file} for writing.\")\n",
" # Handle the error (e.g., exit the program or ask the user to close the file)\n",
" exit(1)\n",
"\n",
"# Rest of your program logic goes here\n",
"\n",
" \n",
" starttime = datetime.now()\n",
" starttimeStr = starttime.strftime(\"%Y-%m-%d %H:%M:%S\")\n",
" starttimeStr = starttime.strftime('%Y-%m-%d %H:%M:%S')\n",
" print(\"====== start time:\", starttimeStr)\n",
"\n",
" ###############################################################################################\n",
" # get the token from https://www.ncbi.nlm.nih.gov/gap/power-user-portal/.\n",
" # get the token from https://www.ncbi.nlm.nih.gov/gap/power-user-portal/. \n",
" # Scroll down and click on the \"Task Specific Token\" button to get the light-weight version of the dbGaP RAS Passport.\n",
" # Save the file into a text. In my example, it is saved to task-specific-token_all.txt.\n",
" # Save the file into a text. In my example, it is saved to .task-specific-token_all.txt. \n",
" ###############################################################################################\n",
" TST_PATH = \"~/dev/fhir/task-specific-token-all.txt\"\n",
" fhir_base_url = \"https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1\"\n",
"\n",
" with open(os.path.expanduser(TST_PATH), \"r\") as f:\n",
" TST_PATH = '~/dev/fhir/task-specific-token-all.txt' \n",
" fhir_base_url = \"https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1\"\n",
" \n",
" with open(os.path.expanduser(TST_PATH), 'r') as f: \n",
" tst_token = f.read().strip()\n",
"\n",
" \n",
" session = requests.Session()\n",
" session.headers.update(\n",
" {\n",
" \"Accept\": \"application/fhir+json\",\n",
" \"Authorization\": f\"Bearer {tst_token}\",\n",
" \"Content-Type\": \"application/x-www-form-urlencoded\",\n",
" }\n",
" )\n",
" session.headers.update({\n",
" 'Accept': 'application/fhir+json',\n",
" 'Authorization': f'Bearer {tst_token}',\n",
" 'Content-Type': 'application/x-www-form-urlencoded',\n",
" })\n",
"\n",
" # study_reference = \"phs002921.v2.p1.c1\"\n",
" study_reference = \"phs002921\"\n",
"\n",
" ################################################################################################\n",
" # https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/ResearchSubject?study=phs002921\n",
" # https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/Observation?subject=Patient/4317770\n",
" # ###############################################################################################\n",
"\n",
" \n",
" patient_ids = fetch_patient_ids(session, fhir_base_url, study_reference)\n",
" print(f\"Total patients fetched: {len(patient_ids)}\")\n",
"\n",
" data = []\n",
" columns = set()\n",
" patients_with_observations = 0\n",
" for patient_id in patient_ids[:2]:\n",
" observations = fetch_patient_observations(\n",
" session, fhir_base_url, patient_id\n",
" )\n",
" for patient_id in patient_ids:\n",
" observations = fetch_patient_observations(session, fhir_base_url, patient_id)\n",
" observation_data = extract_observation_data(observations)\n",
" if observation_data:\n",
" observation_data[\"Patient\"] = patient_id\n",
" observation_data['Patient'] = patient_id\n",
" columns.update(observation_data.keys())\n",
" data.append(observation_data)\n",
" patients_with_observations += 1\n",
" # print(f\"Observations obtained for patient: {patient_id}\")\n",
" print(\n",
" f\"Accumulative patients with observations: {patients_with_observations}\"\n",
" )\n",
" print(f\"Accumulative patients with observations: {patients_with_observations}\")\n",
"\n",
" sleep(\n",
" 1\n",
" ) # Add a delay of 1 second between each patient API request to avoid rate limits\n",
" sleep(0.3) # Add a delay of 1 second between each patient API request to avoid rate limits\n",
"\n",
" columns = [\"Patient\"] + sorted(\n",
" columns\n",
" ) # Ensure 'Patient' is the first column\n",
" columns = ['Patient'] + sorted(columns) # Ensure 'Patient' is the first column\n",
"\n",
" output_file = \"patient_observations.csv\"\n",
" with open(output_file, \"w\", newline=\"\") as csvfile:\n",
" output_file = 'patient_observations.csv'\n",
" with open(output_file, 'w', newline='') as csvfile:\n",
" csvwriter = csv.DictWriter(csvfile, fieldnames=columns)\n",
" csvwriter.writeheader()\n",
" csvwriter.writerows(data)\n",
"\n",
" print(f\"Data written to {output_file}\")\n",
"\n",
" endtime = datetime.now()\n",
" endtimeStr = endtime.strftime(\"%Y-%m-%d %H:%M:%S\")\n",
" endtimeStr = endtime.strftime('%Y-%m-%d %H:%M:%S')\n",
" print(\"====== end time:\", endtimeStr)\n",
"\n",
" elapsed_time = endtime - starttime\n",
" elapsed_seconds = elapsed_time.total_seconds()\n",
" eminutes = elapsed_seconds // 60\n",
" eseconds = elapsed_seconds % 60\n",
"\n",
" print(\n",
" f\"===========Elapsed time: {int(eminutes)} minutes and {int(eseconds)} seconds.\"\n",
" )\n",
"\n",
" print(f\"===========Elapsed time: {int(eminutes)} minutes and {int(eseconds)} seconds.\")\n",
"\n",
"if __name__ == \"__main__\":\n",
" main()"
],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"====== start time: 2024-08-07 15:31:34\n",
"https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1/ResearchSubject?study=phs002921\n",
"Total patients fetched: 1035\n",
"Accumulative patients with observations: 1\n",
"Accumulative patients with observations: 2\n",
"Data written to patient_observations.csv\n",
"====== end time: 2024-08-07 15:32:03\n",
"===========Elapsed time: 0 minutes and 28 seconds.\n"
]
}
],
"execution_count": 3
" main()\n"
]
},
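{
"cell_type": "markdown",
"id": "fhir-fetcher-paging-note",
"metadata": {},
"source": [
"### Note on fhir_fetcher.fetch_all_data\n",
"The code above imports fetch_all_data from a separate fhir_fetcher module that is not shown in this notebook. The sketch below is only a rough illustration of what such a Bundle-paging helper might look like (the real module's signature takes additional positional arguments that are not modeled here): it follows the 'next' links of FHIR searchset Bundles until every entry has been collected.\n",
"```python\n",
"def fetch_all_data(session, start_url):\n",
"    entries = []\n",
"    url = start_url\n",
"    while url:\n",
"        response = session.get(url)\n",
"        response.raise_for_status()\n",
"        bundle = response.json()\n",
"        entries.extend(bundle.get('entry', []))\n",
"        # A searchset Bundle advertises the next page in link[].relation == 'next'.\n",
"        url = next((link.get('url') for link in bundle.get('link', []) if link.get('relation') == 'next'), None)\n",
"    return entries\n",
"```\n"
]
},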
{
"cell_type": "code",
"execution_count": null,
"id": "73d4fd89-d458-4f4d-8e0a-e8bb23027492",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "e688d9da99c28d00",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
24 changes: 22 additions & 2 deletions jupyter/pilot/README.md
@@ -1,5 +1,25 @@
# OVERVIEW

This pilot directory contains sample code for accessing the dbGaP pilot FHIR servers. There are three pilot FHIR servers:

1. **FHIR API service for ICAC/URECA** at [https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1/metadata](https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1/metadata) for the dbGaP dataset "Whole Genome Sequencing in the Inner City Asthma Consortium (ICAC) Cohorts".

   - Please see [phs002921](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002921.v2.p1) for study details.
   - You need NIH Data Access approval to access this study. Please follow these [instructions](https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) to request access.
   - Once your Data Access Request (DAR) is approved, you can access this study's data both from the dbGaP website and from this pilot FHIR API server.
   - With your DAR approval, you can get the FHIR API authorization token at the [dbGaP power user portal](https://www.ncbi.nlm.nih.gov/gap/power-user-portal/): scroll down and click on "Task specific token".

2. **FHIR API service for UDN** at [https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot2/x1/metadata](https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot2/x1/metadata) for the dbGaP dataset "Clinical and Genetic Evaluation of Individuals with Undiagnosed Disorders through the Undiagnosed Diseases Network (UDN)".

   - Please see [phs001232](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001232.v5.p2) for study details.
   - You need NIH Data Access approval to access this study. Please follow these [instructions](https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) to request access.
   - Once your Data Access Request (DAR) is approved, you can access this study's data both from the dbGaP website and from this pilot FHIR API server.
   - With your DAR approval, you can get the FHIR API authorization token at the [dbGaP power user portal](https://www.ncbi.nlm.nih.gov/gap/power-user-portal/): scroll down and click on "Task specific token".

3. 📌 **Important note on data security:** If you have approved access and are using URECA data at fhir-jpa-pilot1 or UDN data at fhir-jpa-pilot2, note that the sample code's output CSV files contain controlled-access data. Please *only* save these files in a secure computing environment that is approved to hold controlled-access data.

4. **Open-access test FHIR server** for those without Data Access approval who would like to explore how to programmatically access dbGaP data with the FHIR API. This test server with synthetic data is at [https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/metadata](https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/metadata). A quick-start query example is included at the end of this README.


> 📌 **Note:** If you are using CAVATICA to access these servers, make sure to enable "Allow Network Access" in the project settings.

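## Quick-start example

As a quick way to explore the API before (or without) DAR approval, here is a minimal sketch that lists research subjects on the open-access test server using only the Python `requests` library. It assumes, as the sample notebook does, that the test server hosts synthetic data for study phs002921; the commented-out header shows where the task-specific token would go when targeting the controlled-access servers (fhir-jpa-pilot1 / fhir-jpa-pilot2).

```python
import requests

# Open-access test server with synthetic data; no token required.
BASE_URL = "https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1"

session = requests.Session()
session.headers.update({"Accept": "application/fhir+json"})
# For the controlled-access servers, add the task-specific token from the
# dbGaP power user portal, e.g.:
# session.headers.update({"Authorization": f"Bearer {tst_token}"})

response = session.get(f"{BASE_URL}/ResearchSubject", params={"study": "phs002921"})
response.raise_for_status()
bundle = response.json()

print("Research subjects reported by the server:", bundle.get("total"))
# Only the first page of the searchset Bundle is shown here; follow the
# Bundle's "next" link to page through all results.
for entry in bundle.get("entry", [])[:5]:
    print(entry["resource"]["individual"]["reference"])
```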