Skip to content

MIDRC/MIDRC_Diversity_Calculator

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

MIDRC Diversity Calculator

Technologies | Getting Started | Collaborators | References | Contribute

📱 Visit MIDRC Website

ℹ️ Overview

The MIDRC Diversity Calculator is a tool designed to compare the representativeness of biomedical data. By leveraging the Jensen-Shannon distance (JSD) measure, this tool provides insights into the demographic representativeness of datasets within the biomedical field. It also supports monitoring the representativeness of datasets over time by assessing the representativeness of historical data. Developed and utilized by MIDRC, this tool assesses the representativeness of data within the open data commons to the US population. Additionally, it can be generalized by users for other diversity representativeness needs, such as assessing the similarity of demographic distributions across multiple attributes in different biomedical datasets.

🔧 Features

  • Jensen-Shannon Distance (JSD) Calculation: Uses the JSD measure to assess the representativeness of data.
  • Comparative Analysis: Enables comparisons between different datasets to evaluate demographic diversity.
  • Biomedical Focus: Specifically tailored for analyzing biomedical data, ensuring relevance and similarity.
  • Historical Data: Enables the ability to assess data over time for monitoring changes in representativeness.

📔 Background

The methodology behind the Diversity Calculator is based on the 2023 paper by Whitney et al. titled "Longitudinal assessment of demographic representativeness in the Medical Imaging and Data Resource Center open data commons"[1]. This paper provides the theoretical foundation for using JSD in evaluating demographic representativeness.

screenshot

💻 Technologies

Technologies used with this application

  • Python
  • PySide6
  • numpy
  • scipy
  • pandas

There is a requirements.txt file available to install requirements

🚀 Getting started

Configure yaml

First, configure your own jsdconfig.yaml file to select which data to load by default. There is a jsdconfig-example.yaml file provided that may be copied over or used as a template for your own config file.

  • The filename needs to be specified, and a human-readable name should be provided for use in the plots and figures.
  • Please see the Generating custom Excel files section for additional information.
  • On your first run, you may use cp jsdconfig-example.yaml jsdconfig.yaml to load the MIDRC data.

Run application

To start the application, run python main.py

Generating plots and figures

  • Select the files you wish to compare in the drop-down menus that you wish to make comparisons between.
  • A checkbox is provided next to the drop-down menus to select whether additional plots should be shown for each individual file selected.
  • Note: displaying plots for two or more files simultaneously may require a 4k monitor

Generating custom excel files

  • Use the provided MIDRC, CDC, and Census Excel files as an example on how to prepare your custom data.
  • For each date, cumulative sums are expected.
  • Each attribute should have its own sheet which will be automatically parsed by the application.
  • Column names within each sheet are parsed and compared between files
    • Where there is a matching column name within a worksheet of the same name, the JSD will be calculated using those values.
    • A Date column is expected, and it should be sorted. Please see how the census data is loaded using the example config file if your data does not have multiple dates, and you do not have a date column.
  • The list of attributes provided in the GUI should be a list where worksheets with an identical name exist in both files. If it is not, please check your spelling
  • The remove column name text config parameter is due to how the MIDRC data is generated. There is a (CUSUM) suffix that needs to be removed to compare it to CDC and Census data.

GUI Manipulation

The plots and figures should be movable, adjustable, re-sizable, or hidden.

To see the list of available dock widgets, you can right-click on any menu/title bar area, i.e. either the main window menu bar or any title bar in a dock widget. This is useful if you hide one of hte docked widgets and wish to view them again.

Keyboard commands may be used to copy and paste the calculated JSD values (and dates) and pasted in Excel or a notebook as tab-delimited data.

Prerequisites

  • Python 3.9 or highter
  • Git

Cloning

How to clone the project

git clone https://github.com/MIDRC/MIDRC_Diversity_Calculator.git

Installing Requirements

You may install project dependencies using pip.

Using pip:

cd MIDRC_Diversity_Calculator
python -m pip install --upgrade pip
pip install -r requirements.txt

Starting

How to start the project

cd MIDRC_Diversity_Calculator
cp jsdconfig-example.yaml jsdconfig.yaml
python main.py

🤝 Collaborators

Special thank you for all people that contributed for this project

Robert Tomek, Maryellen Giger, Heather Whitney

We'd also like to acknowledge

Natalie Baughan, Kyle Myers, Karen Drukker, Judy Gichoya, Brad Bower, Weijie Chen, Nicholas Gruszauskas, Jayashree Kalpathy-Cramer, Sanmi Koyejo, Rui Sá, Berkman Sahiner, Zi Zhang,

The MIDRC Bias and Diversity Working Group:

  • Co-leads
    • Karen Drukker
    • Judy Wawira Gichoya
  • AAPM
    • Weijie Chen
    • Kyle Myers
    • Heather Whitney
  • ACR
    • Jayashree Kalpathy-Cramer
  • RSNA
    • Zi Jill Zhang
  • NIH
    • Rui Sá
    • Brad Bower
  • MIDRC Central (University of Chicago)
    • Maryellen Giger
    • Nick Gruszaukas,
    • Katie Pizer
    • Robert Tomek
  • Project Manager
    • Emily Townley

📖 References

[1] Whitney HM, Baughan N, Myers KJ, Drukker K, Gichoya J, Bower B, Chen W, Gruszauskas N, Kalpathy-Cramer J, Koyejo S, Sá RC, Sahiner B, Zhang Z, Giger ML. Longitudinal assessment of demographic representativeness in the Medical Imaging and Data Resource Center open data commons. J Med Imaging (Bellingham). 2023 Nov;10(6):61105. doi: 10.1117/1.JMI.10.6.061105. Epub 2023 Jul 18. PMID: 37469387; PMCID: PMC10353566.

📫 Contribute

  1. git clone https://github.com/MIDRC/MIDRC_Diversity_Calculator.git
  2. git checkout -b feature/NAME
  3. Open a Pull Request explaining the problem solved or feature made, if exists, append screenshot of visual modifications and wait for the review!

Documentations that might help

📝 How to create a Pull Request

✔️ License

This project is licensed with the Apache 2.0 license. See LICENSE file for details.

Releases

No releases published

Packages

No packages published