ISIS 4822: Visual Analytics Fall 2018 - Final Project - Rheumatoid Arthritis in Colombia

dersteppenwolf/isis4822_final_project

Dashboard to analyze Rheumatoid Arthritis in Colombia based on Costs per Person

Final Project for Class ISIS 4822 - Visual Analytics - Universidad de los Andes http://johnguerra.co/classes/visual_analytics_fall_2018/

Main Links

Description

Rheumatoid arthritis (RA) is an autoimmune disease that can cause joint pain and damage throughout the body. There is no cure for RA, but there are treatments that can help to manage it. In addition to the physical and emotional pain, the associated economic costs are high, and RA is generally considered a high-cost disease.

 Main Goal of the Project

This project provides a visual analytics tool to help understand the impact of Rheumatoid Arthritis (RA) in Colombia in terms of its associated economic costs. The cost of procedures varies by state, regime, age, administrator, provider, and other factors. A visual tool can help experts explore and understand the available data.

Costs of procedures related to RA are extracted from SISPRO (Sistema Integral de Información de la Protección Social) for the period 2010 to 2017.

 Justification

  • In recent years there has been an increase in the prevalence of the disease in the country.
  • At the moment there are not enough tools available for exploring RA data in a user-friendly way.

About the Author

Juan Carlos Méndez

Email: jc.mendez[at]uniandes.edu.co , juan[at]gkudos.com

Twitter: @dersteppen

Github: https://github.com/dersteppenwolf

Web: https://neogeografia.wordpress.com/

Who?

This visualization is intended for physicians and health professionals interested in the occurrence of Rheumatoid Arthritis (RA) in Colombia.

What?

Data

  • Main Dataset: SISPRO
  • Description: Administrative database of medical services provided to patients in the Colombian health system, filtered by diagnostic codes for Rheumatoid Arthritis.
  • Source: SISPRO
  • Source Type: Microsoft Analysis Services data cube (cube: CU - Prestación Servicios de Salud)
  • Dataset Type: Table, Temporal
  • Attributes
    • States : Categorical. States of Colombia
    • Year: Categorical, ordered, sequential. Year of the procedure.
    • Regime : Categorical. Type of health regime to which the patient belongs
    • Sisben : Categorical. Subtype of Subsidized regime.
    • Sex : Categorical.
    • Age Group: Categorical, ordered. Age groups classified by stage of the human life cycle.
    • Administrator: Categorical. Administrators of the Social Security System
    • Provider: Categorical. Company that provides a medical service.
    • Procedure: Categorical. Procedures and medical services performed in Colombia
    • Procedure Cost: Quantitative, ordered, sequential. Cost of a procedure applied to a patient.
    • People Served : Quantitative, ordered, sequential. Number of people served.

Derived

Categorical

Most categorical attributes in the raw data are long strings that repeat many times. The original CSV file is 456 MB, a size that would cause long download times for typical web clients. To avoid that problem, the data is split into two derived files:

  • A lookup table (domains.json - 1 MB)
  • Encoded rows (costs.tsv - 20.4 MB)

The encoding was done as follows:

  • Extract ids / codes from the original strings for states, administrators, providers and procedures.
  • Generate ids for regime, sisben, sex and age.
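
As an illustration of the two encoding steps above, here is a minimal sketch in Python. It is not the project's ETL code: the `code - name` string layout, the field names, and the helper names `split_code` and `encode` are assumptions made for the example.

```python
import re

# Hypothetical raw rows; the "code - name" layout of the state strings is an assumption.
raw_rows = [
    {"state": "05 - Antioquia", "regime": "Contributivo", "cost": 120000, "people": 3},
    {"state": "11 - Bogotá, D.C.", "regime": "Subsidiado", "cost": 80000, "people": 2},
    {"state": "05 - Antioquia", "regime": "Subsidiado", "cost": 50000, "people": 1},
]


def split_code(label):
    """Split a 'code - name' string into (code, name)."""
    match = re.match(r"\s*(\w+)\s*-\s*(.+)", label)
    if match is None:
        raise ValueError(f"no code found in {label!r}")
    return match.group(1), match.group(2).strip()


def encode(rows):
    """Return (lookup table, compact rows) for a list of raw rows."""
    domains = {"state": {}, "regime": {}}
    regime_ids = {}
    encoded = []
    for row in rows:
        # Step 1: the state string already carries a code, so extract it.
        state_code, state_name = split_code(row["state"])
        domains["state"][state_code] = state_name
        # Step 2: regime has no embedded code, so generate a sequential id.
        regime_id = regime_ids.setdefault(row["regime"], len(regime_ids))
        domains["regime"][regime_id] = row["regime"]
        encoded.append((state_code, regime_id, row["cost"], row["people"]))
    return domains, encoded


domains, encoded = encode(raw_rows)
```

In this sketch `domains` plays the role of the lookup table and `encoded` the role of the compact rows. Each long string is stored once in the lookup table, and every row carries only the short id.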

Geo

The geo data for the states used in the map (colombia_index.geojson) is a simplification of the original polygons from OSM. The derived file approximates a "Grid Map" of Colombia, built from the bounding boxes of the 25k-scale grid of Colombia, so that users can easily identify a state in the interactive widgets. It was made with PostGIS and QGIS.

Why?

Main Task

Discover the distribution of Costs per Person (CP) of RA procedures in Colombia by state, year, regime, sisben, sex, age group, administrator and provider.
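
To make the main task concrete, the sketch below computes a Cost per Person value for each value of one attribute. It assumes that C/P is the total procedure cost divided by the number of people served within each group. The README does not spell out the formula, so treat this definition, the sample rows, and the function name `cost_per_person` as assumptions.

```python
from collections import defaultdict

# Hypothetical encoded rows: cost in COP and number of people served.
rows = [
    {"year": 2016, "cost": 120000.0, "people": 3},
    {"year": 2016, "cost": 80000.0, "people": 2},
    {"year": 2017, "cost": 50000.0, "people": 1},
]


def cost_per_person(rows, key):
    """Total cost divided by people served, grouped by the attribute `key`."""
    totals = defaultdict(lambda: [0.0, 0])  # key value -> [total cost, total people]
    for row in rows:
        totals[row[key]][0] += row["cost"]
        totals[row[key]][1] += row["people"]
    # Skip groups with nobody served to avoid dividing by zero.
    return {k: cost / people for k, (cost, people) in totals.items() if people > 0}


cp_by_year = cost_per_person(rows, "year")
print(cp_by_year)  # {2016: 40000.0, 2017: 50000.0}
```

Summing people across rows can count the same person more than once if that person appears in several rows, so the result is only as reliable as the "people served" measure in the source data.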

Secondary Tasks

  • Derive attributes from raw data as features to be used in the final visualization.
  • Identify Outliers in costs.
  • Identify the Features of a specific procedure in the dataset.
  • Summarize the distribution of Costs per Person, people served and total costs of RA procedures in Colombia.

How?

The dashboard uses several idioms / widgets, each with its own encoding of the data, connected through linked filtering (crossfiltering).
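
The idea behind linked filtering can be sketched in a few lines of plain Python. This is a conceptual illustration only, not the dashboard's code (which relies on the Crossfilter JavaScript library). The sample rows and the function name `grouped_totals` are hypothetical. The rule shown is that each chart aggregates the rows that pass the filters set on every other chart, while ignoring its own filter so that its unselected bars remain visible.

```python
from collections import defaultdict

# Hypothetical rows with two categorical dimensions and one measure.
rows = [
    {"state": "Antioquia", "sex": "F", "cost": 100.0},
    {"state": "Antioquia", "sex": "M", "cost": 300.0},
    {"state": "Chocó", "sex": "F", "cost": 200.0},
]


def grouped_totals(rows, dimension, filters):
    """Sum cost per value of `dimension`, applying every filter except its own."""
    totals = defaultdict(float)
    for row in rows:
        passes = all(
            row[dim] in allowed
            for dim, allowed in filters.items()
            if dim != dimension
        )
        if passes:
            totals[row[dimension]] += row["cost"]
    return dict(totals)


# The user clicks "Antioquia" on the state chart.
filters = {"state": {"Antioquia"}}
by_sex = grouped_totals(rows, "sex", filters)      # restricted to Antioquia
by_state = grouped_totals(rows, "state", filters)  # own filter ignored
```

Here `by_sex` reflects only the Antioquia rows, while `by_state` still reports totals for every state, which is what lets the selected bar be highlighted among the others.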

Idiom : Horizontal / Vertical Bar Charts

Encode

  • Attributes: Year, State, Regime, Sisben, sex, age, administrator, provider
  • Mark: Line
  • Channel
    • Position: Key attribute. Horizontal / Vertical.
    • Color: Selection / Hover
  • Encode: Separate, Order, Align.

Manipulate

  • Select and Highlight: Click / Hover
  • Navigate: Attribute Reduction, Slice
  • Change with Animated Transitions

Facet

  • Juxtapose
  • Linked Filtering (Crossfiltering)

Reduce

  • Filter Items / Attributes
  • Aggregate Attributes

Idiom : List

Encode

  • Attributes: Procedure name, CUPS code, cost per person, persons attended, total costs. Summary totals: total cost per person, total persons attended, total costs.
  • Mark: Area
  • Channel
    • Position: Vertical, Key attribute (Cups). Horizontal: other attributes
    • Color: Selection
  • Encode: Separate, Order

Manipulate

  • Select
  • Reorder

Facet

  • Juxtapose
  • Linked Filtering (Crossfiltering)

Idiom : Grid Map

Encode

  • Attributes: State
  • Mark: Area
  • Channel
    • Spatial Region
    • Color: Selection

Manipulate

  • Select

Facet

  • Juxtapose
  • Linked Filtering (Crossfiltering)

Insights

  • There are data quality problems like these:
    • Some states do not have data for one or more years (e.g Amazonas, Arauca, Casanare, Guainía, Guaviare)
    • Some attributes contain "Not Reported - NR" and "Not Available - NA" values. Data publishers should mitigate this kind of data loss to improve the overall quality of the dataset.
    • The expert found different providers (distinct NITs) with the same name.
  • The overall Costs per Person (C/P) for procedures is higher for younger people.
  • The overall C/P is higher for states known to be isolated, such as Chocó, Guainía, La Guajira, Putumayo, Arauca, Vaupés and Vichada.
  • For every year in the dataset, Córdoba is the state with the highest C/P.
  • There are anomalies in procedure costs that reflect problems during data collection (e.g. procedures with a cost of $1 COP).
  • According to the expert, the costs of some procedures in the database differ considerably from the official rates established for the country (SOAT Decreto 2423 de 1996 and its yearly updates).
  • A large number of people with RA (65k) belong to the subsidized (subsidiado) regime. According to the expert, that group usually does not have access to the most advanced or modern procedures because of their high cost.
  • More women than men are affected by the disease, but costs for men are higher.
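
Anomalies such as the $1 COP procedures mentioned above could be flagged with a simple rule. The following sketch is only an illustration: the threshold `MIN_PLAUSIBLE_COST_COP`, the sample rows, and the function name `flag_suspicious` are hypothetical and do not come from the project.

```python
# Hypothetical threshold: per-person costs below it are treated as suspicious.
MIN_PLAUSIBLE_COST_COP = 1000

# Hypothetical rows; cost is the total cost in COP for the people served.
rows = [
    {"procedure": "A", "cost": 1, "people": 1},
    {"procedure": "B", "cost": 45000, "people": 2},
    {"procedure": "C", "cost": 0, "people": 0},
]


def flag_suspicious(rows, min_cost=MIN_PLAUSIBLE_COST_COP):
    """Return rows whose cost per person falls below `min_cost`."""
    return [
        row
        for row in rows
        # Rows with nobody served are skipped rather than flagged.
        if row["people"] > 0 and row["cost"] / row["people"] < min_cost
    ]


suspicious = flag_suspicious(rows)
print([row["procedure"] for row in suspicious])  # ['A']
```

A fixed threshold is the crudest possible rule. Comparing each cost against the official rate for the same procedure, as the expert did, would be a more informative check.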

Tech Stuff

Technologies / Apis used

Running the App

You only need a web server that can serve static content, e.g. Apache Web Server. Code and data files (html, css, js, json, tsv) are included in the app folder.

If you use Python 3, you can run a simple HTTP server with the following commands:

cd app
python3 -m http.server

Then you can open a web browser using the following url:

http://localhost:8000/

Source code

ETL

You can find the ETL source code in the etl folder. That folder includes a Tableau workbook used as a tool for extracting data from SISPRO's Analysis Services data cube. (Note: it only works on Windows machines. For more information about connecting Tableau to a Microsoft Analysis Services database and setting up a data source, please read the official docs.)

There are also some Jupyter notebooks, such as costos_20181122.ipynb, used for data validation / transformation.

Web Application

You can find the source code in the app folder.

Subfolders:

  • css
  • data
  • img
  • js : Custom Source Code
  • js/libs : External Libraries
  • pages : Html templates for AngularJs

Relevant files:

  • index.html : Main html file
  • js/costsController.js : Controller of the Costs visualization view.
  • pages/costs.html : Html template for visualization
  • js/categoryChartDirective.js : Implementation of an AngularJS Directive using D3 to visualize categorical data as a customized horizontal barchart.
  • js/barChartDirective.js : Implementation of an AngularJS Directive using D3 to visualize categorical data as a customized barchart.
  • js/colombiaMapDirective.js : Implementation of an AngularJS Directive using D3 to visualize a simplified map of Colombia, used for quick lookups of states.
  • js/dataTableChartDirective.js : Implementation of an AngularJS Directive using Angular UI Grid to visualize tabular data related to procedures costs.

Development

App source code:

cd app/

Dependencies:

npm install gulp
npm install

To ease local development you can use gulp for hot reloading:

gulp watch

Other Links / Docs

Crossfilter

Crossfilter is a JavaScript library for exploring large multivariate datasets in the browser.

Crossfilter2 ( https://github.com/crossfilter/crossfilter ) is a community-maintained fork of the original square/crossfilter ( https://github.com/square/crossfilter ) library.

Crossfilter2 can be a little confusing at the beginning. The following links include some useful programming examples to start with.
