Created by the University of Central Florida Team: Alejandra Alas, Ashley Smith, Joseph Fioresi, Kenneth Colón, Meleah Chase Malcolm, Quinn Barber, Ralph Balderamos III, and Sydney Damas
SOL Online is a mobile/web application designed to help KPMG experts analyze and interpret population data to identify the elements that correlate to financial and mental health inequity. We show some of the potential visualization options along with how the relevant datapoints were selected.
In all of these graphs, the y-axis statistics are min-max normalized to the range 0-1.
This file is responsible for using `pandas` and `numpy` to clean and process the CDC and ACS data together. The overall goal was to locate regions in which to initially launch our solution initiative. We did this by combining region data wherever population, mental health, and poverty information were all available. We then used min-max scaling to normalize this data relative to all other regions. From these values we used exponential weighting to calculate an overall weighted score for each candidate launch location. We then min-max scaled the resulting score and displayed the results. Our findings for the best places to launch are the following:
- Tallahassee, Florida 32304
- Bronx, New York 10456
- Los Angeles, California 90011
- Brownsville, Texas 78521
- Brooklyn, New York 11212
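The scoring steps described above (min-max scaling, exponential weighting, then rescaling the final score) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the region labels, feature names, sample numbers, priority order, and the particular form of "exponential weighting" (weights proportional to `exp(-rank)`) are all hypothetical, since the source does not specify them. Plain Python is used so the snippet runs without dependencies; the same logic maps directly onto `pandas` columns.

```python
import math


def min_max(values):
    """Scale a list of numbers to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid dividing by zero
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical sample data: one entry per region in each feature list.
regions = ["A", "B", "C"]
features = {
    "population": [1000, 5000, 3000],
    "mental_health_rate": [0.10, 0.20, 0.15],
    "poverty_rate": [0.05, 0.30, 0.10],
}

# Assumed priority order (most important first). Weights decay as exp(-rank)
# and are normalized to sum to 1. This is one possible reading of
# "exponential weighting", not necessarily the one the team used.
priority = ["poverty_rate", "mental_health_rate", "population"]
raw_weights = [math.exp(-rank) for rank in range(len(priority))]
total = sum(raw_weights)
weights = {name: w / total for name, w in zip(priority, raw_weights)}

# Normalize every feature relative to all other regions.
scaled = {name: min_max(vals) for name, vals in features.items()}

# Weighted sum per region, then min-max scale the resulting score.
raw_scores = [
    sum(weights[name] * scaled[name][i] for name in priority)
    for i in range(len(regions))
]
scores = min_max(raw_scores)

# Rank regions from highest to lowest score.
ranking = sorted(zip(regions, scores), key=lambda pair: pair[1], reverse=True)
for region, score in ranking:
    print(f"{region}: {score:.3f}")
```

Rescaling the final score keeps it on the same 0-1 range as the inputs, which makes regions directly comparable in the charts.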
This file is responsible for cleaning up the CDC data from `CDC Places 2020 Health Outcomes.csv` and `CDC Places Data Dictionary.xlsx`. It maps all the abbreviated data names to their corresponding names in the dictionary. Then we form a new data frame to represent locations based on their City/State, Population, and Mental Health statistics. We sort this data by the most mental health issues for >= 14 days per 1000 adults >= 18 years old. This data is printed and stored in the `clean_data` folder.
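As a rough illustration of the renaming and sorting step, the sketch below maps abbreviated column names to readable ones and sorts by the mental health measure. The abbreviations (`LOC`, `POP`, `MHLTH`), the sample rows, and the in-memory CSV are hypothetical stand-ins for the real CDC files. It uses only the standard library so it runs anywhere; in `pandas` the equivalent would be `DataFrame.rename(columns=...)` followed by `DataFrame.sort_values(...)`.

```python
import csv
import io

# Hypothetical stand-in for the raw CDC CSV (real column names differ).
raw_csv = io.StringIO(
    "LOC,POP,MHLTH\n"
    "City A,1000,12.5\n"
    "City B,2500,18.1\n"
    "City C,1800,15.0\n"
)

# Hypothetical stand-in for the data dictionary: abbreviation -> full name.
data_dictionary = {
    "LOC": "City/State",
    "POP": "Population",
    "MHLTH": "Mental Health",
}

rows = []
for record in csv.DictReader(raw_csv):
    # Replace each abbreviated key with its dictionary name when one exists.
    renamed = {data_dictionary.get(key, key): value for key, value in record.items()}
    # CSV values arrive as strings; convert the numeric columns.
    renamed["Population"] = int(renamed["Population"])
    renamed["Mental Health"] = float(renamed["Mental Health"])
    rows.append(renamed)

# Sort so the locations with the highest mental health measure come first.
rows.sort(key=lambda row: row["Mental Health"], reverse=True)

for row in rows:
    print(row)
```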
This file is responsible for cleaning all the ACS data that we were given. This is done by attributing the abbreviated data names to their dictionary values in `ACS Data Dictionary.xlsx`. All this cleaned data is then stored in the `clean_data` folder to be used by other programs.
This file is responsible for graphing and charting figures throughout the application. This includes, but is not limited to, figures in the prototype, the zip code data charts in the interactive component, and the figures displayed throughout the `README.md`.
This file is responsible for compiling all the relevant data that we were able to clean, pre-process, scale, and normalize. This allows the data to be ready for `regression.py`, where machine learning techniques are used to determine correlations between different data points and increased poverty/mental health issues. This file is saved in `clean_data` as `master.csv`.
This file is responsible for the actual regression model itself. Using `master.csv` we run a `Lasso` regression model on the data to determine correlation statistics. We save our results to `figures/regression_variables.svg`.
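To show what a Lasso fit does, here is a small dependency-free sketch that minimizes `(1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1` by coordinate descent. It is not the project's code: the source does not say which library provides `Lasso` or which settings were used, and the toy data, the `alpha` value, and the function names are assumptions made for illustration. The features are assumed to be centered, so no intercept is fitted.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Fit Lasso coefficients by cyclic coordinate descent.

    Assumes centered features with no all-zero column; no intercept is fitted.
    """
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(w[k] * X[i][k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n)
    return w


# Toy data (hypothetical): the target depends only on the first feature.
X = [[-2, 1], [-1, -2], [0, 2], [1, -2], [2, 1]]
y = [3 * row[0] for row in X]

coefficients = lasso_coordinate_descent(X, y, alpha=0.5)
print(coefficients)
```

The L1 penalty drives the coefficient of the unrelated second feature to exactly zero and shrinks the first below its true value of 3, which is why Lasso is often used to pick out the variables that matter most.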
The `data` folder encompasses all the data that was provided by HSI. The only change made to the data was a conversion of the `Zip Code Index.csv` file to `Zip Code Index.xlsx`.
The `clean_data` folder encompasses processed, cleaned, and resulting data from the above files.
This file is responsible for running and managing the Streamlit app, which anyone can join by scanning the QR code. This QR code is available in the presentation slides, but the site can also be accessed from the link here.
This file is responsible for creating our mock time series data, in which we predict future trends in our initial launch locations. These figures can then be seen in the Streamlit app, where you can customize selections for your viewing pleasure.
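One simple way to produce mock time series data of this kind is a linear trend plus seeded random noise, sketched below. The function name, parameters, and numbers are all hypothetical; the source does not describe how the project's mock data or predictions are actually generated.

```python
import random


def mock_series(start, trend, noise, n_periods, seed=0):
    """Return n_periods values following a linear trend plus Gaussian noise.

    A fixed seed makes the mock data reproducible between runs.
    """
    rng = random.Random(seed)
    return [start + trend * t + rng.gauss(0, noise) for t in range(n_periods)]


# Hypothetical example: a normalized score that slowly declines over 12 periods.
series = mock_series(start=0.8, trend=-0.02, noise=0.01, n_periods=12, seed=42)
for period, value in enumerate(series):
    print(period, round(value, 3))
```

Seeding the generator means the charts built from this data look the same every time the app starts.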
- Make sure to have Python 3.9.X or 3.10.X installed on your system along with `pip`
- `cd` into the directory where you would like to clone our repository.
- Use `git clone https://github.com/joefioresi718/hsi_bob.git` to clone our repository
- Run `pip install -r requirements.txt` in your command line (`command prompt` or `powershell` on Windows and `terminal` on Mac)
- Run the Python files using the IDE or interpreter of your choice, and see the printed and saved results
- (Optional) View statistics that have been pre-run in the `figures` and `images` folders.