- Ong Hong Xun
- Grover Ekhnoor Kaur
- Madelyn (Marvyn) Chiaching Cruz Tan
- Data Files:
- demographic.csv
- GDP by Country 1999-2022.csv
- education.csv
- healthcare.csv
- 1900_2021disasters.csv
- Notebooks for Exploratory Data Analysis (EDA):
- Demographic_initial.ipynb
- EDA_Demographic_GDP.ipynb
- EDA_Demographic_Government.ipynb
- EDA_Demographic_Disaster.ipynb
- CSV Files Generated by EDA Notebooks:
- Demographic_GDP.csv
- Demographic_Government.csv
- Demographic_Disaster.csv
- Data Combination Notebook:
- Data_Combination.ipynb
- CSV File Generated by Data Combination Notebook:
- Combined_Data.csv
- Notebook for Machine Learning (ML) Techniques and Evaluations:
- ML_Techniques.ipynb
This is a mini-project for SC1015 (Introduction to Data Analysis and Artificial Intelligence). You only need to download the five main data files:
- demographic.csv
- GDP by Country 1999-2022.csv
- education.csv
- healthcare.csv
- 1900_2021disasters.csv
- Demographic_initial.ipynb
- EDA_Demographic_GDP.ipynb
- EDA_Demographic_Government.ipynb
- EDA_Demographic_Disaster.ipynb
- Data_Combination.ipynb
- ML_Techniques.ipynb
- pip install pmdarima (for Demographic_initial.ipynb)
- pip install dash plotly (for Demographic_initial.ipynb)
- In the modern world, population growth is influenced by a myriad of factors ranging from economic conditions to environmental events and governmental policies.
- Understanding these factors helps in crafting better-informed economic, environmental, and social policies that proactively manage resources and mitigate risks.
- Determine which model best predicts the dependency ratios from factors such as GDP.
- Linear Regression
- Random Forest Regression (RFR)
- Gradient Boosting Machine (GBM)
- ARIMA model for demographic prediction (not used in final evaluation)
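
As a rough illustration of how the three regression models listed above could be compared, here is a minimal sketch using scikit-learn. It runs on synthetic data, not on `Combined_Data.csv`, and the predictors, split ratio, and hyperparameters are assumptions chosen for the example rather than the settings used in `ML_Techniques.ipynb`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption): three made-up predictors and a target
# with both linear and non-linear components. This is NOT the project's data.
rng = np.random.default_rng(0)
n = 300
X = rng.normal(size=(n, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.5 * X[:, 2] ** 2 + rng.normal(scale=0.1, size=n)

# Hold out a quarter of the rows so every model is scored on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting Machine": GradientBoostingRegressor(random_state=0),
}

# Fit each model and record R^2 and mean squared error on the test split.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }
    print(f"{name}: R^2={scores[name]['r2']:.3f}, MSE={scores[name]['mse']:.3f}")
```

Scoring all models on the same held-out split keeps the comparison fair. Which model comes out ahead depends on the data, so the ranking on this synthetic set says nothing about the project's actual results.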
- Using the Random Forest feature importances (.feature_importances_), we were able to rank the most useful features for prediction. This gives us insight into which indicators are likely to affect each demographic group.
- Compared with the other two models (Linear Regression and RFR), GBM has higher predictive accuracy but is less robust to outliers than Random Forest.
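
The feature ranking mentioned above can be sketched as follows. This is an illustrative example on synthetic data. The feature names are hypothetical placeholders, not the column names in the project's CSV files, and the target is constructed so that the first feature dominates.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names (assumption), used only to label the output.
feature_names = ["gdp", "education_spending", "healthcare_spending", "disaster_count"]

# Synthetic data: the target depends strongly on column 0, weakly on column 1,
# and not at all on columns 2 and 3.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.1, size=500)

forest = RandomForestRegressor(n_estimators=200, random_state=42)
forest.fit(X, y)

# feature_importances_ holds one impurity-based score per input column;
# sort the (name, score) pairs from most to least important.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances are normalized to sum to 1 and describe how the fitted forest used each feature, not a causal effect. They are best read as a starting point for further analysis.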
- Using these ML techniques, we were able to predict the child, elderly, and total dependency ratios.
- These forecasts can help policymakers adapt their strategies ahead of time, leading to more resilient and sustainable communities.
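
For readers unfamiliar with the target variables, the sketch below computes the three dependency ratios under the commonly used definition: dependants per 100 people of working age (15 to 64). This definition and the function and parameter names are assumptions made for illustration. The project's notebooks may derive the ratios differently or read them directly from `demographic.csv`.

```python
def dependency_ratios(pop_0_14, pop_15_64, pop_65_plus):
    """Return child, elderly, and total dependency ratios per 100 working-age people.

    Assumes the common age bands 0-14, 15-64, and 65+ (hypothetical helper,
    not taken from the project's notebooks).
    """
    if pop_15_64 <= 0:
        raise ValueError("working-age population must be positive")
    child = 100.0 * pop_0_14 / pop_15_64
    elderly = 100.0 * pop_65_plus / pop_15_64
    return {"child": child, "elderly": elderly, "total": child + elderly}


# Example with made-up population counts.
ratios = dependency_ratios(pop_0_14=250, pop_15_64=500, pop_65_plus=100)
print(ratios)  # {'child': 50.0, 'elderly': 20.0, 'total': 70.0}
```

Under this definition the total ratio is the sum of the child and elderly ratios, so a model that predicts the two components also implies a total.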
- We learnt new ML techniques such as Gradient Boosting Machine and Random Forest.
- We learnt how to incorporate and relate different kinds of data to a specific problem
- We learnt how to collaborate using GitHub repositories
- We learnt how to use Dash to visualize the data
- https://www.kaggle.com/datasets/willianoliveiragibin/progenitus
- https://www.kaggle.com/code/alejopaullier/gdp-by-country-1999-2022
- https://www.kaggle.com/datasets/brsdincer/all-natural-disasters-19002021-eosdis
- https://data.worldbank.org/indicator/SE.XPD.TOTL.GD.ZS?view=chart
- https://data.worldbank.org/indicator/SH.XPD.CHEX.GD.ZS?view=chart