titanic_101

Uploaded as a reminder of where my data science studies started, here is my first crack at Kaggle's Titanic: Machine Learning from Disaster (https://www.kaggle.com/c/titanic).

Positives

It scores in the low eighties (percent accuracy), along with most of the regular Kaggle submissions. However, it isn't Kaggle-compliant because of my favourite feature:
Imputation of missing ages by web scraping a Titanic website (http://www.titanicfacts.net/) and using the Levenshtein distance to match inconsistencies in the names across the two data sources, which removes a manual step. The results aren't perfect, but it's a cute proof of concept of how data science concepts, and a little creativity, can be used to solve the wrangling steps.
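As a rough illustration of the name-matching idea (a minimal sketch, not the notebook's actual code), the snippet below computes the Levenshtein distance with the standard dynamic-programming recurrence and picks the closest scraped name for a given passenger. The helper names `levenshtein` and `closest_name`, the `max_distance` cut-off of 3, and the sample names and ages are all invented for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string on the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or keep if equal)
            ))
        previous = current
    return previous[-1]


def closest_name(name, candidates, max_distance=3):
    """Return the nearest candidate, or None if none is within max_distance."""
    target = name.lower().strip()
    best, best_distance = None, max_distance + 1
    for candidate in candidates:
        distance = levenshtein(target, candidate.lower().strip())
        if distance < best_distance:
            best, best_distance = candidate, distance
    return best


# Hypothetical data: neither the names nor the ages come from the real sources.
kaggle_name = "Smith, Mr. Jon"
scraped = {"Smith, Mr. John": 34, "Smyth, Mrs. Joan": 29}

match = closest_name(kaggle_name, scraped)
age = scraped[match] if match is not None else None
print(match, age)
```

The cut-off matters: without it, every passenger gets matched to somebody, however different the names are, and a wrong age is arguably worse than a missing one.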

Negatives

Re-running the code, the biggest newbie mistake is up for debate, but I'd give it to using brute force to calculate the results of every combination of features. Running tens of thousands of logistic regression models is far, far less elegant than any of the computational, or even manual, feature selection methods available. However, this was a learning exercise!
A consistent cross-validation method really should have been used throughout, rather than a train/test split in some places and cross-validation in others (!?).
I see a distinct lack of random state seeding, which reduces the repeatability of the results.
I could go on :)
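For what it's worth, the three negatives above can be addressed together. The sketch below assumes scikit-learn and uses synthetic data from `make_classification` as a stand-in for the Titanic features; the seed value, the fold count and the choice of recursive feature elimination (`RFECV`) are illustrative picks, not what the notebook does.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

SEED = 42  # arbitrary; any fixed integer makes the run repeatable

# Synthetic stand-in for the engineered features (not the real Titanic data).
X, y = make_classification(
    n_samples=300, n_features=8, n_informative=3, n_redundant=2,
    random_state=SEED,
)

# One seeded splitter, defined once and reused for every evaluation.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)

model = LogisticRegression(max_iter=1000)

# Recursive feature elimination scored on the shared folds, instead of
# fitting a model for every possible combination of features.
selector = RFECV(model, step=1, cv=cv, scoring="accuracy")
selector.fit(X, y)

selected = selector.support_  # boolean mask over the 8 columns
scores = cross_val_score(model, X[:, selected], y, cv=cv, scoring="accuracy")

print(f"kept {selector.n_features_} of {X.shape[1]} features")
print(f"mean CV accuracy: {scores.mean():.3f}")
```

With 8 features, brute force would mean 255 non-empty subsets, each cross-validated; elimination needs at most 8 rounds. The final score here is still optimistic, because the features were selected on the same data they are scored on. Wrapping the selector and model in a single `Pipeline` and cross-validating that would give a fairer estimate.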
