TitanicEclair/Demographic-Forecast

Demographic-Forecast

Welcome to the Demographic-Forecast repository

Contributors (FCSF TEAM8)

  • Ong Hong Xun
  • Grover Ekhnoor Kaur
  • Madelyn (Marvyn) Chiaching Cruz Tan

School of Computer Science and Engineering

Nanyang Technological University

Files present in the repository:

  1. Data Files:
    • demographic.csv
    • GDP by Country 1999-2022.csv
    • education.csv
    • healthcare.csv
    • 1900_2021disasters.csv
  2. Notebooks for Exploratory Data Analysis (EDA):
    • Demographic_initial.ipynb
    • EDA_Demographic_GDP.ipynb
    • EDA_Demographic_Government.ipynb
    • EDA_Demographic_Disaster.ipynb
  3. CSV Files Generated by EDA Notebooks:
    • Demographic_GDP.csv
    • Demographic_Government.csv
    • Demographic_Disaster.csv
  4. Data Combination Notebook:
    • Data_Combination.ipynb
  5. CSV File Generated by Data Combination Notebook:
    • Combined_Data.csv
  6. Notebook for Machine Learning (ML) Techniques and Evaluations:
    • ML_Techniques.ipynb

About

This is a mini-project for SC1015 (Introduction to Data Analysis and Artificial Intelligence). You only need to download the five main data files:

  • demographic.csv
  • GDP by Country 1999-2022.csv
  • education.csv
  • healthcare.csv
  • 1900_2021disasters.csv
The rest are generated by the notebooks.
Please run the notebooks in the order they are listed above: the EDA notebooks first, then Data_Combination.ipynb, then ML_Techniques.ipynb.

Requirements

  1. pip install pmdarima (for Demographic_Initial.ipynb)
  2. pip install dash plotly (for Demographic_Initial.ipynb)

Problem Definition

  • In the modern world, population growth is influenced by many factors, ranging from economic conditions to environmental events and governmental policies.
  • Understanding these influences helps in crafting better-informed economic, environmental, and social policies that proactively manage resources and mitigate risks.
  • We check which model best predicts the dependency ratios from factors such as GDP.
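
For readers unfamiliar with the target variables, here is a minimal sketch of how dependency ratios are conventionally computed. It assumes the common age bands (0-14, 15-64, 65 and over) and a per-100 scale; the notebooks may define the bands or scale differently, and the function name and sample figures are made up for illustration.

```python
def dependency_ratios(pop_0_14, pop_15_64, pop_65_plus):
    """Return (child, elderly, total) dependency ratios per 100 working-age people.

    Assumes the conventional age bands; the project's data may use others.
    """
    if pop_15_64 <= 0:
        raise ValueError("working-age population must be positive")
    child = 100.0 * pop_0_14 / pop_15_64
    elderly = 100.0 * pop_65_plus / pop_15_64
    # The total ratio is the sum of the child and elderly ratios.
    return child, elderly, child + elderly


# Made-up population shares (in percent), not taken from demographic.csv.
child, elderly, total = dependency_ratios(20.0, 65.0, 15.0)
print(f"child={child:.1f} elderly={elderly:.1f} total={total:.1f}")
```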

Models Used

  1. Linear Regression
  2. Random Forest Regression (RFR)
  3. Gradient Boosting Machine (GBM)
  4. ARIMA model for demographic prediction (not used in final evaluation)
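
The first three models can be compared along the following lines. This is a sketch only, assuming scikit-learn and NumPy are installed: it uses synthetic data in place of Combined_Data.csv, and the hyperparameters, train/test split, and metrics are illustrative choices rather than the ones used in ML_Techniques.ipynb.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the combined dataset (4 features, 1 target).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + np.sin(X[:, 2]) + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest Regression": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting Machine": GradientBoostingRegressor(random_state=0),
}

# Fit each model on the same split and score it on the held-out data.
scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    scores[name] = {
        "r2": r2_score(y_test, pred),
        "mse": mean_squared_error(y_test, pred),
    }
    print(f"{name}: R^2={scores[name]['r2']:.3f}, MSE={scores[name]['mse']:.3f}")
```

Scoring every model on the same held-out split keeps the comparison fair; results on the real data will differ from this toy example.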

Conclusion

  • Using the Random Forest model's .feature_importances_ attribute, we were able to rank the features by their importance for prediction. This gives us insight into which indicators could potentially affect different demographics.
  • Compared with the other two models (Linear Regression and RFR), GBM has higher predictive accuracy, but it is less robust to outliers than Random Forest.
  • Through these ML techniques, we were able to predict the child, elderly, and total dependency ratios.
  • The forecasted data can help policy makers adapt new strategies in advance, leading to more resilient and sustainable communities.
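
The feature ranking mentioned above can be sketched as follows, assuming scikit-learn and NumPy are installed. The feature names and the synthetic data are hypothetical placeholders, not the columns or values of Combined_Data.csv.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names; the real columns may differ.
feature_names = ["gdp", "education", "healthcare", "disasters"]

# Synthetic data in which the first feature dominates the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 4))
y = 5.0 * X[:, 0] + 1.0 * X[:, 1] + rng.normal(scale=0.1, size=400)

forest = RandomForestRegressor(n_estimators=200, random_state=0)
forest.fit(X, y)

# feature_importances_ holds one impurity-based score per feature.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances describe how much the fitted forest relied on each feature; they indicate association within the model, not causation.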

What did we learn from this project?

  1. We learnt new ML techniques such as Gradient Boosting Machine and Random Forest.
  2. We learnt how to incorporate and relate different kinds of data to a specific problem.
  3. We learnt how to collaborate using GitHub repositories.
  4. We learnt how to use Dash to visualize the data.
