Pull request #9: FHIR-2450 added readme notes
Merge in DBGFIHR/public-documentation from feature/FHIR-2450/add-URECA-sample-code-markdown to production

* commit '0b138e57e49234dac4c378160ae432b74e1f2939':
  Remove the stylesheet since it does not render in Bitbucket
  Replace the rest of the checkmarks with bulletted lists
  Try bulleted list with checkmarks FHIR-2540
  followed Eric's review comments.take 2. FHIR-2450
  followed Eric's review comments. FHIR-2450
  FHIR-2450 added readme notes
mingward authored and RadixSeven committed Aug 19, 2024
2 parents a54970c + 0b138e5 commit 05828e0
Showing 2 changed files with 110 additions and 107 deletions.
193 changes: 88 additions & 105 deletions jupyter/pilot/Notebook01_phs002921_URECA_subject_phenotype.ipynb
@@ -1,185 +1,168 @@
{
"cells": [
{
"cell_type": "raw",
"id": "1211f6c8-6ee5-4d5c-b926-1f3210c8704a",
"metadata": {},
"source": [
"# Query pilot server for pheontype data.\n",
"## What FHIR server to use?\n",
"Note this sample code is using a synthetic data server at: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1\n",
"The real server for URECA study(phs002921) is at: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1. You will first need to make Controlled Data Access Request(DAR). https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/GetPdf.cgi?document_name=GeneralAAInstructions.pdf\n",
"## About the authorization token:\n",
"1. If you are using the synthetic server to try out FHIR pilot server, you do not need to have the \"real token\" in the token file. The script below still uses a token file so it works once you have the real token in the file.\n",
"2. If you want to use the real study data which is controlled-access, you need DAR approval. After your DAR is approved, go to https://www.ncbi.nlm.nih.gov/gap/power-user-portal/, login with your eRA account, scroll down and click on the \"Task Specific Token\" button to get the token file. Save the token in a text file. In the example below, it is saved to \"task-specific-token-all.txt\".\n",
"## What does this script do?\n",
"This sample script shows how to get the Study Subject Phenotype data. You can see the content of the Subject Phenotype dataset here: https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/dataset.cgi?study_id=phs002921.v2.p1&pht=12614 including the data dictionary (https://ftp.ncbi.nlm.nih.gov/dbgap/studies/phs002921/phs002921.v2.p1/pheno_variable_summaries/phs002921.v2.pht012614.v1.ICAC_Subject_Phenotypes.data_dict.xml ) \n",
"Note that in dbGaP, the Subject Phenotype dataset usually includes demographic data in addition to phenotypic data.\n",
"## Script summary\n",
"This script first connects to the synthetic data server. Retrieves the patients of URECA study and saves it in a Python List: patient_ids. \n",
"The script then iterates through the \"patient_ids\", to get the data in \"subject phenotype\" file which is stored in FHIR Observation Resource. \n",
"The script saves the phenotype values in patient_observations.csv. \n"
]
},
{
"cell_type": "code",
"id": "d4b50729-08c1-43d2-91aa-edc93194f03a",
"metadata": {
"ExecuteTime": {
"end_time": "2024-08-07T19:32:03.428830Z",
"start_time": "2024-08-07T19:31:34.565851Z"
}
},
"execution_count": null,
"id": "37fe7a43-20ea-406a-b02d-bd0b86aa9be0",
"metadata": {},
"outputs": [],
"source": [
"import os\n",
"import requests\n",
"import csv\n",
"from datetime import datetime\n",
"from time import sleep\n",
"from fhir_fetcher import fetch_all_data # Ensure this module is available\n",
"\n",
"# and handles paging through all records\n",
"\n",
"from fhir_fetcher import fetch_all_data # Ensure this module is available and handles paging through all records\n",
"\n",
"def fetch_patient_observations(session, fhir_base_url, patient_id):\n",
" qstr = f\"Observation?subject=Patient/{patient_id}\"\n",
" qstr = f'Observation?subject=Patient/{patient_id}'\n",
" # above qstr example:\n",
" # https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/Observation?subject=Patient/4317770\n",
" start_url = f\"{fhir_base_url}/{qstr}\"\n",
" observations = fetch_all_data(\n",
" session, start_url, 0\n",
" ) # Fetch all observations for the patient\n",
" observations = fetch_all_data(session, start_url, 0) # Fetch all observations for the patient\n",
" return observations\n",
"\n",
"\n",
"def extract_observation_data(observations):\n",
" data = {}\n",
" for entry in observations:\n",
" resource = entry.get(\"resource\", {})\n",
" code = resource.get(\"code\", {}).get(\"coding\", [{}])[0]\n",
" attribute_name = code.get(\"display\", \"\")\n",
" value_string = resource.get(\"valueString\", \"\")\n",
" value_quantity = resource.get(\"valueQuantity\", {}).get(\"value\", \"\")\n",
" resource = entry.get('resource', {})\n",
" code = resource.get('code', {}).get('coding', [{}])[0]\n",
" attribute_name = code.get('display', '')\n",
" value_string = resource.get('valueString', '')\n",
" value_quantity = resource.get('valueQuantity', {}).get('value', '')\n",
" if attribute_name:\n",
" value = value_string if value_string else value_quantity\n",
" if value:\n",
" data[attribute_name] = value\n",
" return data\n",
"\n",
"\n",
"def fetch_patient_ids(session, fhir_base_url, study_reference):\n",
" \n",
" \n",
" query_url = f\"{fhir_base_url}/ResearchSubject?study={study_reference}\"\n",
" print(query_url)\n",
" research_subjects = fetch_all_data(session, query_url, 0, \"n\")\n",
" patient_ids = [\n",
" entry[\"resource\"][\"individual\"][\"reference\"].split(\"/\")[-1]\n",
" for entry in research_subjects\n",
" ]\n",
" # query_url example: https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/ResearchSubject?study=phs002921\n",
" print ( query_url)\n",
" research_subjects = fetch_all_data(session, query_url, 0, 'n')\n",
" patient_ids = [entry['resource']['individual']['reference'].split('/')[-1] for entry in research_subjects]\n",
" return patient_ids\n",
"\n",
"\n",
"def main():\n",
"\n",
" # \n",
" # If I had the patient_observations.csv open, then I am running this program, it will give an error when trying to write to it.\n",
" # So check if the file is writable in the begining to avoid getting the error at the end of program after waiting for the program to finish:\n",
" #\n",
" output_file = 'patient_observations.csv'\n",
"\n",
" # Check if the file can be opened for writing\n",
" try:\n",
" with open(output_file, 'w', newline='') as csvfile:\n",
" pass # File opened successfully, nothing to write yet\n",
" except PermissionError:\n",
" print(f\"Permission denied: Cannot open {output_file} for writing.\")\n",
" # Handle the error (e.g., exit the program or ask the user to close the file)\n",
" exit(1)\n",
"\n",
"# Rest of your program logic goes here\n",
"\n",
" \n",
" starttime = datetime.now()\n",
" starttimeStr = starttime.strftime(\"%Y-%m-%d %H:%M:%S\")\n",
" starttimeStr = starttime.strftime('%Y-%m-%d %H:%M:%S')\n",
" print(\"====== start time:\", starttimeStr)\n",
"\n",
" ###############################################################################################\n",
" # get the token from https://www.ncbi.nlm.nih.gov/gap/power-user-portal/.\n",
" # get the token from https://www.ncbi.nlm.nih.gov/gap/power-user-portal/. \n",
" # Scroll down and click on the \"Task Specific Token\" button to get the light-weight version of the dbGaP RAS Passport.\n",
" # Save the file into a text. In my example, it is saved to task-specific-token_all.txt.\n",
" # Save the file into a text. In my example, it is saved to .task-specific-token_all.txt. \n",
" ###############################################################################################\n",
" TST_PATH = \"~/dev/fhir/task-specific-token-all.txt\"\n",
" fhir_base_url = \"https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1\"\n",
"\n",
" with open(os.path.expanduser(TST_PATH), \"r\") as f:\n",
" TST_PATH = '~/dev/fhir/task-specific-token-all.txt' \n",
" fhir_base_url = \"https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1\"\n",
" \n",
" with open(os.path.expanduser(TST_PATH), 'r') as f: \n",
" tst_token = f.read().strip()\n",
"\n",
" \n",
" session = requests.Session()\n",
" session.headers.update(\n",
" {\n",
" \"Accept\": \"application/fhir+json\",\n",
" \"Authorization\": f\"Bearer {tst_token}\",\n",
" \"Content-Type\": \"application/x-www-form-urlencoded\",\n",
" }\n",
" )\n",
" session.headers.update({\n",
" 'Accept': 'application/fhir+json',\n",
" 'Authorization': f'Bearer {tst_token}',\n",
" 'Content-Type': 'application/x-www-form-urlencoded',\n",
" })\n",
"\n",
" # study_reference = \"phs002921.v2.p1.c1\"\n",
" study_reference = \"phs002921\"\n",
"\n",
" ################################################################################################\n",
" # https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/ResearchSubject?study=phs002921\n",
" # https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/Observation?subject=Patient/4317770\n",
" # ###############################################################################################\n",
"\n",
" \n",
" patient_ids = fetch_patient_ids(session, fhir_base_url, study_reference)\n",
" print(f\"Total patients fetched: {len(patient_ids)}\")\n",
"\n",
" data = []\n",
" columns = set()\n",
" patients_with_observations = 0\n",
" for patient_id in patient_ids[:2]:\n",
" observations = fetch_patient_observations(\n",
" session, fhir_base_url, patient_id\n",
" )\n",
" for patient_id in patient_ids:\n",
" observations = fetch_patient_observations(session, fhir_base_url, patient_id)\n",
" observation_data = extract_observation_data(observations)\n",
" if observation_data:\n",
" observation_data[\"Patient\"] = patient_id\n",
" observation_data['Patient'] = patient_id\n",
" columns.update(observation_data.keys())\n",
" data.append(observation_data)\n",
" patients_with_observations += 1\n",
" # print(f\"Observations obtained for patient: {patient_id}\")\n",
" print(\n",
" f\"Accumulative patients with observations: {patients_with_observations}\"\n",
" )\n",
" print(f\"Accumulative patients with observations: {patients_with_observations}\")\n",
"\n",
" sleep(\n",
" 1\n",
" ) # Add a delay of 1 second between each patient API request to avoid rate limits\n",
" sleep(0.3) # Add a delay of 1 second between each patient API request to avoid rate limits\n",
"\n",
" columns = [\"Patient\"] + sorted(\n",
" columns\n",
" ) # Ensure 'Patient' is the first column\n",
" columns = ['Patient'] + sorted(columns) # Ensure 'Patient' is the first column\n",
"\n",
" output_file = \"patient_observations.csv\"\n",
" with open(output_file, \"w\", newline=\"\") as csvfile:\n",
" output_file = 'patient_observations.csv'\n",
" with open(output_file, 'w', newline='') as csvfile:\n",
" csvwriter = csv.DictWriter(csvfile, fieldnames=columns)\n",
" csvwriter.writeheader()\n",
" csvwriter.writerows(data)\n",
"\n",
" print(f\"Data written to {output_file}\")\n",
"\n",
" endtime = datetime.now()\n",
" endtimeStr = endtime.strftime(\"%Y-%m-%d %H:%M:%S\")\n",
" endtimeStr = endtime.strftime('%Y-%m-%d %H:%M:%S')\n",
" print(\"====== end time:\", endtimeStr)\n",
"\n",
" elapsed_time = endtime - starttime\n",
" elapsed_seconds = elapsed_time.total_seconds()\n",
" eminutes = elapsed_seconds // 60\n",
" eseconds = elapsed_seconds % 60\n",
"\n",
" print(\n",
" f\"===========Elapsed time: {int(eminutes)} minutes and {int(eseconds)} seconds.\"\n",
" )\n",
"\n",
" print(f\"===========Elapsed time: {int(eminutes)} minutes and {int(eseconds)} seconds.\")\n",
"\n",
"if __name__ == \"__main__\":\n",
" main()"
],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"====== start time: 2024-08-07 15:31:34\n",
"https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1/ResearchSubject?study=phs002921\n",
"Total patients fetched: 1035\n",
"Accumulative patients with observations: 1\n",
"Accumulative patients with observations: 2\n",
"Data written to patient_observations.csv\n",
"====== end time: 2024-08-07 15:32:03\n",
"===========Elapsed time: 0 minutes and 28 seconds.\n"
]
}
],
"execution_count": 3
" main()\n"
]
},
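{
"cell_type": "markdown",
"id": "fhir-fetcher-paging-note",
"metadata": {},
"source": [
"### Note on fhir_fetcher.fetch_all_data\n",
"The code above imports fetch_all_data from a separate fhir_fetcher module that is not shown in this notebook. The sketch below is only a rough illustration of what such a Bundle-paging helper might look like (the real module's signature takes additional positional arguments that are not modeled here): it follows the 'next' links of FHIR searchset Bundles until every entry has been collected.\n",
"```python\n",
"def fetch_all_data(session, start_url):\n",
"    entries = []\n",
"    url = start_url\n",
"    while url:\n",
"        response = session.get(url)\n",
"        response.raise_for_status()\n",
"        bundle = response.json()\n",
"        entries.extend(bundle.get('entry', []))\n",
"        # A searchset Bundle advertises the next page in link[].relation == 'next'.\n",
"        url = next((link.get('url') for link in bundle.get('link', []) if link.get('relation') == 'next'), None)\n",
"    return entries\n",
"```\n"
]
},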
{
"cell_type": "code",
"execution_count": null,
"id": "73d4fd89-d458-4f4d-8e0a-e8bb23027492",
"metadata": {},
"outputs": [],
"source": []
},
{
"cell_type": "code",
"execution_count": null,
"id": "e688d9da99c28d00",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
24 changes: 22 additions & 2 deletions jupyter/pilot/README.md
@@ -1,5 +1,25 @@
# OVERVIEW

This pilot directory contains sample code for accessing the dbGaP pilot FHIR servers. There are three pilot FHIR servers:

1. **FHIR API service for ICAC/URECA** at [https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1/metadata](https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot1/x1/metadata) for the dbGaP dataset "Whole Genome Sequencing in the Inner City Asthma Consortium (ICAC) Cohorts".

   - Please see [phs002921](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002921.v2.p1) for study details.
   - You need NIH Data Access approval to access this study. Please follow these [instructions](https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) to request access.
   - Once your Data Access Request (DAR) is approved, you can access this study's data both from the dbGaP website and from this pilot FHIR API server.
   - With your DAR approval, you can get the FHIR API authorization token at the [dbGaP power user portal](https://www.ncbi.nlm.nih.gov/gap/power-user-portal/): scroll down and click on "Task specific token".

2. **FHIR API service for UDN** at [https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot2/x1/metadata](https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot2/x1/metadata) for the dbGaP dataset "Clinical and Genetic Evaluation of Individuals with Undiagnosed Disorders through the Undiagnosed Diseases Network (UDN)".

   - Please see [phs001232](https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs001232.v5.p2) for study details.
   - You need NIH Data Access approval to access this study. Please follow these [instructions](https://dbgap.ncbi.nlm.nih.gov/aa/wga.cgi?page=login) to request access.
   - Once your Data Access Request (DAR) is approved, you can access this study's data both from the dbGaP website and from this pilot FHIR API server.
   - With your DAR approval, you can get the FHIR API authorization token at the [dbGaP power user portal](https://www.ncbi.nlm.nih.gov/gap/power-user-portal/): scroll down and click on "Task specific token".

3. 📌 **Important note on data security:** If you have approved access and are using URECA data at fhir-jpa-pilot1 or UDN data at fhir-jpa-pilot2, note that the sample code's output CSV files contain controlled-access data. Please *only* save these files in a secure computing environment that is approved to hold controlled-access data.

4. **Open-access test FHIR server** for those without Data Access approval who would like to explore how to programmatically access dbGaP data with the FHIR API. This test server with synthetic data is at [https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/metadata](https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1/metadata). A quick-start query example is included at the end of this README.


> 📌 **Note:** If you are using CAVATICA to access these servers, make sure to enable "Allow Network Access" in the project settings.

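## Quick-start example

As a quick way to explore the API before (or without) DAR approval, here is a minimal sketch that lists research subjects on the open-access test server using only the Python `requests` library. It assumes, as the sample notebook does, that the test server hosts synthetic data for study phs002921; the commented-out header shows where the task-specific token would go when targeting the controlled-access servers (fhir-jpa-pilot1 / fhir-jpa-pilot2).

```python
import requests

# Open-access test server with synthetic data; no token required.
BASE_URL = "https://dbgap-api.ncbi.nlm.nih.gov/fhir-jpa-pilot/x1"

session = requests.Session()
session.headers.update({"Accept": "application/fhir+json"})
# For the controlled-access servers, add the task-specific token from the
# dbGaP power user portal, e.g.:
# session.headers.update({"Authorization": f"Bearer {tst_token}"})

response = session.get(f"{BASE_URL}/ResearchSubject", params={"study": "phs002921"})
response.raise_for_status()
bundle = response.json()

print("Research subjects reported by the server:", bundle.get("total"))
# Only the first page of the searchset Bundle is shown here; follow the
# Bundle's "next" link to page through all results.
for entry in bundle.get("entry", [])[:5]:
    print(entry["resource"]["individual"]["reference"])
```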