US States Grouping and Assault Rate Prediction

Date: 2024-09-20

Introduction

Brief Overview:
This project analyses socio-economic and crime data from various datasets to group US states based on their characteristics and to identify factors that significantly predict assault rates. It was undertaken as part of a technical assessment for a job application at a Singapore public agency.

The goal was to demonstrate analytical and statistical skills, including clustering and regression analysis, in the context of real-world data challenges. Insights from this project can aid in understanding socio-economic patterns and crime trends across the US.

Objectives:

Group US states based on socio-economic and crime characteristics using clustering analysis.
Identify significant predictors of assault rates through regression models.
Provide data-driven insights to better understand socio-economic disparities and crime rates.

Installation

Prerequisites:

R version 4.0+
RStudio

Dependencies: Install the required R libraries by running the following in R:

install.packages(c("tidyverse", "cluster", "factoextra", "broom", "car", "ggcorrplot", "usmap"))

Usage

Instructions:

Open States-Grouping_Assault-Prediction.md to view the pre-run analysis.

or

Download the States-Grouping_Assault-Prediction folder.
Open States-Grouping_Assault-Prediction.Rmd in RStudio.
Knit the file to generate an HTML report. Alternatively, run the code chunks sequentially to reproduce the analysis.

Data

Data Sources:

USArrest.csv: Contains crime data, including assault rates, for US states.
USstatex77.csv: Includes socio-economic indicators for US states.

Data Description:

USArrest.csv: Columns include Murder, Assault, UrbanPop, and Rape.
USstatex77.csv: Contains indicators like Population, Income, Illiteracy, and Life Expectancy.

Data Processing:

Merged multiple datasets on the State column.
Normalised numeric variables for clustering.
Handled missing data and multicollinearity issues for regression analysis.
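The processing steps above can be sketched in base R as follows. This is a minimal illustration, not the project's own code: it assumes the two CSVs mirror R's built-in `USArrests` and `state.x77` datasets (used here as stand-ins), drops the duplicate `Murder` column from the socio-economic data before merging, handles missing data by simply dropping incomplete rows, and computes variance inflation factors (VIF) by hand. The project lists the `car` package, whose `vif()` does the same job on a fitted model; `vif_manual()` is a hypothetical helper that keeps the sketch dependency-free.

```r
# ASSUMPTION: built-in USArrests and state.x77 stand in for USArrest.csv and USstatex77.csv
arrests <- data.frame(State = rownames(USArrests), USArrests, row.names = NULL)
socio <- data.frame(State = rownames(state.x77), state.x77, row.names = NULL)
socio$Murder <- NULL  # ASSUMPTION: drop the duplicate Murder column before merging

# Merge the two sources on the State column (inner join)
merged <- merge(arrests, socio, by = "State")

# Handle missing data by dropping incomplete rows (one simple strategy)
merged <- merged[complete.cases(merged), ]

# Normalise numeric variables to mean 0 and standard deviation 1 for clustering
num_cols <- names(merged)[sapply(merged, is.numeric)]
scaled <- scale(merged[, num_cols])
rownames(scaled) <- merged$State

# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    fit <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    1 / (1 - summary(fit)$r.squared)
  })
}

# Candidate regression predictors: every numeric column except the outcome
predictors <- merged[, setdiff(num_cols, "Assault")]
round(vif_manual(predictors), 2)
```

A common rule of thumb treats VIF values above roughly 5 to 10 as a sign of problematic multicollinearity.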

Methodology

Techniques Used:

Clustering Analysis: Grouped states using K-means clustering with 4 clusters.
Regression Analysis: Identified significant predictors of assault rates using multiple linear regression.
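The clustering step can be sketched with base R's `kmeans()`. This is an illustration under stated assumptions rather than the project's exact code: the built-in `USArrests` and `state.x77` datasets stand in for the CSVs, all numeric variables are standardised and used as clustering inputs, and the seed (`123`) and number of random starts (`nstart = 25`) are arbitrary choices. Because K-means begins from random centroids, fixing the seed makes the cluster assignment reproducible, and multiple starts reduce the chance of a poor local optimum.

```r
# ASSUMPTION: built-in USArrests and state.x77 stand in for the project CSVs
arrests <- data.frame(State = rownames(USArrests), USArrests, row.names = NULL)
socio <- data.frame(State = rownames(state.x77), state.x77, row.names = NULL)
socio$Murder <- NULL  # avoid a duplicate Murder column
merged <- merge(arrests, socio, by = "State")

# Standardise all numeric variables (ASSUMPTION: all are used for clustering)
num_cols <- names(merged)[sapply(merged, is.numeric)]
scaled <- scale(merged[, num_cols])
rownames(scaled) <- merged$State

set.seed(123)  # arbitrary seed so the random initial centroids are reproducible
km <- kmeans(scaled, centers = 4, nstart = 25)  # 4 clusters, best of 25 random starts

# Cluster profile: mean of each original (unscaled) variable within each cluster
profile <- aggregate(merged[, num_cols], by = list(Cluster = km$cluster), FUN = mean)
print(round(profile, 1))
table(km$cluster)  # number of states in each cluster
```

Profiling on the unscaled variables keeps the cluster means in their original units, which makes the groups easier to describe.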

Tools:

Programming Languages: R
Libraries: tidyverse, cluster, factoextra, broom, ggcorrplot, car, usmap

Results

Findings:

Cluster Profiles: States were grouped into clusters based on crime rates, income, urbanization, and education.
Regression Insights: Higher urban population, higher rape rates, and lower life expectancy were associated with higher assault rates.
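A multiple linear regression of this kind can be sketched with base R's `lm()`. The predictor set below simply mirrors the variables named in the findings and may differ from the model actually fitted in the project's Rmd; the built-in datasets again stand in for the CSVs. The sketch does not reproduce the project's reported results. It only shows how to read the direction and significance of each association from the coefficient table. The project also installs `broom`, whose `tidy()` returns the same table as a data frame.

```r
# ASSUMPTION: built-in USArrests and state.x77 stand in for the project CSVs
arrests <- data.frame(State = rownames(USArrests), USArrests, row.names = NULL)
socio <- data.frame(State = rownames(state.x77), state.x77, row.names = NULL)
socio$Murder <- NULL  # avoid a duplicate Murder column
merged <- merge(arrests, socio, by = "State")

# ASSUMPTION: illustrative predictor set taken from the findings above
fit <- lm(Assault ~ UrbanPop + Rape + Life.Exp, data = merged)

# Coefficient table: estimate, standard error, t value and p-value per term
coefs <- coef(summary(fit))
print(round(coefs, 3))

# Direction of each association (+1: higher predictor value, higher assault rate)
sign(coefs[-1, "Estimate"])
```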

Visualisations:

Cluster Visualization: Principal Component Analysis (PCA) plot showing clusters.
Correlation Matrix: Displays relationships among variables.
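Both visualisations rest on simple computations that can be sketched in base R: `prcomp()` projects the standardised data onto principal components, and the first two components give the 2-D coordinates of a cluster plot, while `cor()` produces the correlation matrix. The built-in datasets stand in for the CSVs, and the K-means settings repeat the arbitrary choices of a seed of `123` and `nstart = 25`. The commented-out calls show how the plots could be drawn with the listed packages (`factoextra::fviz_cluster()` and `ggcorrplot::ggcorrplot()`); they are left as comments so the sketch runs with base R alone.

```r
# ASSUMPTION: built-in USArrests and state.x77 stand in for the project CSVs
arrests <- data.frame(State = rownames(USArrests), USArrests, row.names = NULL)
socio <- data.frame(State = rownames(state.x77), state.x77, row.names = NULL)
socio$Murder <- NULL  # avoid a duplicate Murder column
merged <- merge(arrests, socio, by = "State")

num_cols <- names(merged)[sapply(merged, is.numeric)]
scaled <- scale(merged[, num_cols])
rownames(scaled) <- merged$State

set.seed(123)  # arbitrary seed for reproducible clusters
km <- kmeans(scaled, centers = 4, nstart = 25)

# PCA on the standardised data; PC1 and PC2 are the cluster-plot coordinates
pca <- prcomp(scaled)
pc_scores <- data.frame(State = merged$State, pca$x[, 1:2],
                        Cluster = factor(km$cluster))
summary(pca)$importance[, 1:2]  # variance explained by PC1 and PC2

# Pearson correlation matrix among the numeric variables
corr <- cor(merged[, num_cols])
round(corr, 2)

# Optional plots with packages listed in this README:
# factoextra::fviz_cluster(km, data = scaled)
# ggcorrplot::ggcorrplot(corr)
```

Checking how much variance PC1 and PC2 explain indicates how faithfully a 2-D PCA plot represents the separation between clusters.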

Interpretation:

Policies targeting urban population management and crime reduction may help mitigate assault rates, although the regression identifies associations rather than causal effects.

Project Structure

Directory Tree:

States-Grouping_Assault-Prediction/
│
├── README.md                                              # Project overview and instructions
├── States-Grouping_Assault-Prediction.md                  # Main analysis report
├── States-Grouping_Assault-Prediction.Rmd                 # Main analysis script
├── US_Arrest_Data/                                        # Data files (e.g., USArrest.csv, USstatex77.csv)
└── States-Grouping_Assault-Prediction_files/figure-gfm/   # Visuals used in the report

Key Files:

States-Grouping_Assault-Prediction.md: The main Markdown report with analysis, viewable directly on GitHub.
States-Grouping_Assault-Prediction.Rmd: The main R Markdown file containing the analysis.
US_Arrest_Data/: Folder containing the input datasets.
States-Grouping_Assault-Prediction_files/figure-gfm/: Folder containing the figures generated when the report is knitted and embedded in the Markdown report.

License

This project is intended for submission as part of a technical assessment. The content and code are not intended for public distribution, reproduction, or commercial use without explicit permission from the Author.

Contact Information

Author: Ou Yang Yu