Kaggle-Competition-TitanicSurvival

Predict whether a passenger on the Titanic would survive or not.


Project Motivation

This is an attempt to capture my approach/analysis to the Kaggle competition, Titanic: Machine Learning from Disaster. I wanted to explore the data to understand the factors impacting survival and see whether I could predict survival using machine learning models:

  1. Does port of embarkation impact the survival of the passenger?
  2. Does gender have any bearing on the survival of the passenger?
  3. Does traveling with relatives improve the chance of survival?

The goal of this repository is to provide motivation for those interested in getting into the field of data analytics or using Python for Kaggle's Data Science competitions.

Kaggle Competition | Titanic Machine Learning from Disaster

The sinking of the RMS Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the Titanic sank after colliding with an iceberg, killing 1502 out of 2224 passengers and crew. This sensational tragedy shocked the international community and led to better safety regulations for ships.

One of the reasons that the shipwreck led to such loss of life was that there were not enough lifeboats for the passengers and crew. Although there was some element of luck involved in surviving the sinking, some groups of people were more likely to survive than others, such as women, children, and the upper-class.

In this contest, we ask you to complete the analysis of what sorts of people were likely to survive. In particular, we ask you to apply the tools of machine learning to predict which passengers survived the tragedy.

This Kaggle Getting Started Competition provides an ideal starting place for people who may not have a lot of experience in data science and machine learning.

From the competition homepage.

File Description

There is one notebook available here to showcase work related to the above questions and the additional modelling work. There are also two datasets, one for training and one for testing, taken from the Kaggle competition page. Markdown cells were used to walk through the thought process for the individual steps.

Dependencies:

Exploratory Data Analysis

  • Importing Data with Pandas
  • Exploring Data through Visualizations with Matplotlib
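As a sketch of these two steps, the snippet below loads passenger records with Pandas and computes a survival rate by gender, the kind of aggregate the notebook visualizes. The inline DataFrame is a hypothetical miniature stand-in for Kaggle's train.csv (same column names); in the notebook the data would come from `pd.read_csv("train.csv")`.

```python
import pandas as pd

# Hypothetical sample mirroring a few columns of Kaggle's train.csv
data = pd.DataFrame({
    "Survived": [0, 1, 1, 0, 1],
    "Sex": ["male", "female", "female", "male", "female"],
    "Pclass": [3, 1, 2, 3, 1],
    "Embarked": ["S", "C", "S", "Q", "C"],
})

# Survival rate by gender -- one of the questions explored above
rate_by_sex = data.groupby("Sex")["Survived"].mean()
print(rate_by_sex)

# A Matplotlib bar chart of the same aggregate would be:
# rate_by_sex.plot(kind="bar")
```

The same `groupby`/`mean` pattern answers the port-of-embarkation and relatives questions by grouping on `Embarked` or a family-size column instead.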

Data Preprocessing

  • Cleaning Data
  • Missing value treatment
  • Feature scaling
  • One-hot encoding
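A minimal sketch of the preprocessing steps above, assuming Titanic-style columns (`Age`, `Fare`, `Embarked`): impute missing numeric values with the median and missing categorical values with the mode, standardize a numeric feature, and one-hot encode a categorical one. The small inline DataFrame is illustrative, not the actual dataset.

```python
import pandas as pd

# Illustrative rows with the kinds of gaps found in the real dataset
df = pd.DataFrame({
    "Age": [22.0, None, 38.0, None, 26.0],
    "Fare": [7.25, 71.28, 8.05, 53.10, 7.92],
    "Embarked": ["S", "C", None, "S", "Q"],
})

# Missing value treatment: median for numeric, mode for categorical
df["Age"] = df["Age"].fillna(df["Age"].median())
df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

# Feature scaling: standardize Fare to zero mean, unit variance
df["Fare"] = (df["Fare"] - df["Fare"].mean()) / df["Fare"].std()

# One-hot encoding of the port-of-embarkation column
df = pd.get_dummies(df, columns=["Embarked"], prefix="Emb")
print(df.columns.tolist())
```

`pd.get_dummies` replaces `Embarked` with one indicator column per port (`Emb_C`, `Emb_Q`, `Emb_S`), which is what tree and linear models expect in place of string categories.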

Model Building: Supervised Machine Learning Techniques

  • Logistic Regression Model
  • Random Forest
  • Gradient Boost
  • XGBoost
  • Bagging Classifier
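To illustrate how these classifiers are fit and compared, here is a sketch using scikit-learn on synthetic data (a stand-in for the engineered Titanic features; XGBoost is omitted to keep the example to one dependency). The feature counts and hyperparameters are placeholders, not the values used in the notebook.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (BaggingClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic binary-classification data standing in for the Titanic features
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Bagging Classifier": BaggingClassifier(random_state=0),
}

# Fit each model and report held-out accuracy
scores = {name: m.fit(X_tr, y_tr).score(X_te, y_te)
          for name, m in models.items()}
print(scores)
```

The `XGBClassifier` from the `xgboost` package exposes the same `fit`/`score` interface, so it slots into the same loop.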

Results

The results and findings can be found in the following blog posts:

Competition Website: http://www.kaggle.com/c/titanic-gettingStarted