
# Visit Diagnoses Browser

The Visit Diagnoses Browser presents a simulated Electronic Health Record (EHR) dataset generated by Synthea, representing the medical records of approximately 11,000 patients from Massachusetts.

The team at Databricks used PySpark, a framework for Apache Spark in Python, to ingest data from CSV files, clean patient Personally Identifiable Information (PII), and store the de-identified data in Delta Lake, a robust storage layer. With Delta tables, they created a database storing the patient records for subsequent data analysis.

More details regarding their methodology can be found in the blog post Building a Modern Clinical Health Data Lake with Delta Lake by Frank Austin Nothaft, Michael Ortega, and Amir Kermany.

We pulled the data into RStudio using the odbc package. More on working with databases in RStudio is available on the Posit Solutions site. We used the data to create a Shiny app with bslib. The data is presented in visualizations made with ggplot2 and Plotly, and in tables made with gt.

We tap into the power of Databricks by fetching the EHR data contained in the Delta tables through ODBC. Creating the browser in Shiny adds a layer of versatility and interactivity to the user experience.

Thanks to the power of bslib, we can customize our Shiny app with layouts, tooltips, themes, and more.
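As one illustration of that customization, a themed sidebar layout might look like the sketch below. The bootswatch preset, input, and card contents are placeholders, not the app's actual styling:

```r
library(shiny)
library(bslib)

# A minimal bslib-themed UI sketch. The theme choices and component names
# here are illustrative assumptions, not the browser's real layout.
ui <- page_sidebar(
  title = "Visit Diagnoses Browser",
  theme = bs_theme(version = 5, bootswatch = "minty"),
  sidebar = sidebar(
    selectInput("encounter_class", "Encounter class", choices = NULL)
  ),
  card(
    card_header("Diagnoses"),
    plotOutput("diagnoses_plot")
  )
)
```

Functions like `page_sidebar()`, `card()`, and `tooltip()` come with bslib and slot directly into a standard `shinyApp(ui, server)` call.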

## Installation/Setup

To rebuild this app, you must have:

  • A Databricks SQL warehouse, where the EHR data is stored
  • The Databricks ODBC driver installed on your laptop, or access to Posit Professional Drivers
  • An R environment with associated packages

### Accessing the EHR data in Databricks

In this example, the EHR data is stored under demos > ehr > encounters. Create this table in Databricks by running the script below:

```sql
CREATE TABLE encounters
USING com.databricks.spark.csv
OPTIONS (path 'dbfs:/databricks-datasets/rwe/ehr/csv/encounters.csv', header "true");
```

### Accessing the SQL warehouse HTTP path

Find the HTTP Path needed for the SQL warehouse by:

  1. Navigating to your Databricks web UI
  2. Selecting SQL Warehouses from the left menu
  3. In the interface to the right, selecting Connection details
  4. Copying the value under HTTP path
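Rather than pasting the copied value into scripts, one option is to keep it in `~/.Renviron` and read it at run time. A minimal sketch; the variable name `DATABRICKS_HTTP_PATH` is our convention, not something the odbc package requires:

```r
# Assumes a line like the following in ~/.Renviron (path is illustrative):
#   DATABRICKS_HTTP_PATH=/sql/1.0/warehouses/<warehouse-id>
http_path <- Sys.getenv("DATABRICKS_HTTP_PATH")
stopifnot(nzchar(http_path))  # fail early if the variable is unset
```

This keeps the warehouse path out of version control and lets the same code run against different warehouses.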

### Downloading the ODBC driver

Download the ODBC driver directly from Databricks. For Posit Workbench, RStudio Desktop Pro, Posit Connect, or Shiny Server Pro users, you can download and use Posit Professional Drivers at no additional charge.

### Connecting to the Databricks SQL warehouse

In RStudio (or your IDE of choice), install and load the DBI and odbc packages.

```r
install.packages("DBI")
install.packages("odbc")
```

If you are using odbc v1.4.0 or above, the odbc::databricks() function makes it easier to connect to Databricks. Run the code below to make the connection, substituting the HTTP path you obtained in the step above:

```r
con <- DBI::dbConnect(
  odbc::databricks(),
  httpPath = "[SQL Warehouse service path]"
)
```
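Once `dbConnect()` succeeds, the connection works with standard DBI calls. A minimal sketch, assuming the encounters table created earlier (adjust any catalog or schema qualifiers to your workspace):

```r
# List the tables visible through the connection, then pull a sample of rows.
DBI::dbListTables(con)

encounters <- DBI::dbGetQuery(con, "SELECT * FROM encounters LIMIT 10")
head(encounters)

# Close the connection when finished.
DBI::dbDisconnect(con)
```

The resulting data frame can be passed straight to ggplot2, Plotly, or gt inside the Shiny server function.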