This project reviews data science theory and data preprocessing using the Titanic data set provided by Kaggle. The goal is to predict which passengers survived the disaster and which perished. This binary classification problem was addressed using two approaches, each with three machine learning algorithms.
The first approach was the classical one, in which the training and testing sets were split manually. The second used the training and testing sets as provided, without further manipulation. In both approaches we applied the Logit model (with and without gradient descent), Random Forests, and Support Vector Machines. Results showed that with the given partitioning, accuracy rates are close to 100%. In contrast, with the classical approach we see an accuracy of approximately 85%.
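To make the classical approach concrete, the following is a minimal sketch of a manual train/test split followed by a Logit model fitted with batch gradient descent. It is not the project's code and does not use the Kaggle files: the data are synthetic, the two features (`is_female` and a centered passenger class), the survival rule, and all function names and hyperparameters are assumptions chosen only for illustration.

```python
import math
import random


def train_test_split(rows, labels, test_fraction=0.2, seed=0):
    """Shuffle the indices once, then cut them into test and train parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(rows) * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, y, lr=0.5, epochs=1000):
    """Fit weights and bias by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(X, w, b):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Synthetic stand-in for the passenger table (NOT the Kaggle data).
rng = random.Random(42)
X, y = [], []
for _ in range(400):
    is_female = rng.randint(0, 1)
    pclass = rng.choice([1, 2, 3])
    # Hypothetical noisy survival rule, invented for this sketch.
    score = 2.0 * is_female - 0.8 * (pclass - 2) - 1.0 + rng.gauss(0, 0.5)
    X.append([float(is_female), float(pclass - 2)])  # passenger class centered at 2
    y.append(1 if score > 0 else 0)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_fraction=0.2, seed=0)
w, b = fit_logit(X_train, y_train)
test_accuracy = accuracy(y_test, predict(X_test, w, b))
print(f"held-out accuracy: {test_accuracy:.3f}")
```

The split is done before any fitting so that the held-out rows never influence the learned weights. The same `X_train`/`X_test` partition could then be reused for Random Forests and Support Vector Machines to compare the three algorithms on equal footing.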