The Visit Diagnoses Browser presents a simulated Electronic Health Record (EHR) dataset generated by Synthea, representing the medical records of approximately 11,000 patients from Massachusetts.
The team at Databricks used PySpark, the Python API for Apache Spark, to ingest data from CSV files, remove patient Personally Identifiable Information (PII), and store the de-identified data in Delta Lake, an open-source storage layer. With Delta tables, they created a database storing the patient records for subsequent analysis.
More details regarding their methodology can be found in the blog post Building a Modern Clinical Health Data Lake with Delta Lake by Frank Austin Nothaft, Michael Ortega, and Amir Kermany.
We pulled the data into RStudio using the odbc package. More on working with databases in RStudio is available on the Posit Solutions site. We used the data to create a Shiny app with bslib. The data is visualized with ggplot2 and Plotly and presented in tables built with gt.
We tap into the power of Databricks by fetching the EHR data contained in the Delta tables through ODBC. Creating the browser in Shiny adds a layer of versatility and interactivity to the user experience.
Thanks to bslib, we can customize our Shiny app with layouts, tooltips, themes, and more.
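To give a sense of how these pieces fit together, here is a minimal sketch of a bslib-based Shiny app with a themed layout, a sidebar, and a Plotly chart. It is not the Visit Diagnoses Browser itself: the data, column names, and inputs below are placeholders.

library(shiny)
library(bslib)
library(ggplot2)
library(plotly)

# Sketch only: a themed bslib page with a sidebar and a Plotly output.
ui <- page_sidebar(
  title = "Visit Diagnoses Browser (sketch)",
  theme = bs_theme(bootswatch = "minty"),
  sidebar = sidebar(
    # Shown for layout only; not wired to the placeholder data below.
    selectInput("enc_class", "Encounter class",
                choices = c("ambulatory", "emergency", "inpatient"))
  ),
  card(
    card_header("Encounters per month"),
    plotlyOutput("enc_plot")
  )
)

server <- function(input, output, session) {
  output$enc_plot <- renderPlotly({
    # Placeholder data; in the real app this comes from the Delta tables.
    df <- data.frame(month = factor(month.abb, levels = month.abb),
                     n = sample(50:150, 12))
    p <- ggplot(df, aes(x = month, y = n)) + geom_col()
    ggplotly(p)
  })
}

shinyApp(ui, server)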
To rebuild this app, you must have:
- A Databricks SQL warehouse, where the EHR data is stored
- The Databricks ODBC driver installed on your laptop, or access to Posit Professional Drivers
- An R environment with associated packages
In this example, the EHR data is stored under demos > ehr > encounters. Create this table in Databricks by running the script below:
CREATE TABLE encounters USING com.databricks.spark.csv OPTIONS(path 'dbfs:/databricks-datasets/rwe/ehr/csv/encounters.csv', header "true");
Find the HTTP Path needed for the SQL warehouse by:
- Navigating to your Databricks web UI
- Selecting SQL Warehouses from the left menu
- In the interface to the right, selecting Connection details
- Copying the value under HTTP path
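If you prefer not to paste this value directly into your scripts, you can store it in an environment variable and read it back at connection time. The sketch below is only an illustration: the variable name DATABRICKS_HTTP_PATH is our own choice, and the warehouse ID is made up (HTTP paths typically take the form /sql/1.0/warehouses/<warehouse id>).

# Illustration only: store the HTTP path in an environment variable
# (an .Renviron entry works as well) so it does not live in your code.
Sys.setenv(DATABRICKS_HTTP_PATH = "/sql/1.0/warehouses/abc123def456")

# Read it back when building the connection.
http_path <- Sys.getenv("DATABRICKS_HTTP_PATH")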
Download the ODBC driver directly from Databricks. For Posit Workbench, RStudio Desktop Pro, Posit Connect, or Shiny Server Pro users, you can download and use Posit Professional Drivers at no additional charge.
In RStudio (or your IDE of choice), install and load the DBI and odbc packages.
install.packages("DBI")
install.packages("odbc")

library(DBI)
library(odbc)
If you are using odbc v1.4.0 or above, the odbc::databricks() function makes it easier to connect to Databricks. Run the code below to make the connection, substituting the HTTP path you copied in the step above:
con <- DBI::dbConnect(
  odbc::databricks(),
  httpPath = "[SQL Warehouse service path]"
)
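Note that odbc::databricks() also needs your workspace URL and credentials; it typically picks these up from environment variables such as DATABRICKS_HOST and DATABRICKS_TOKEN (see the odbc documentation for the full set of supported authentication methods). Once connected, you can sanity-check the connection with a quick query. This is a minimal sketch that assumes the demos.ehr.encounters table created earlier; adjust the catalog and schema names if yours differ.

# Preview a few rows from the encounters table.
encounters_preview <- DBI::dbGetQuery(
  con,
  "SELECT * FROM demos.ehr.encounters LIMIT 10"
)
head(encounters_preview)

# Close the connection when you are finished.
DBI::dbDisconnect(con)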