Kaggle competition: Spaceship Titanic

For my second ML/data science project, I'm participating in Kaggle's Spaceship Titanic competition.

Data

We are provided with three CSV files, all located in the data/ directory:

  • train.csv: personal records for about two-thirds (~8700) of the passengers, to be used as training data
  • test.csv: personal records for the remaining one-third (~4300) of the passengers, to be used as test data
  • sample_submission.csv: a submission file in the correct format

The training file, train.csv, looks like this:

PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
(...)

The goal is to predict whether each of the ~4300 passengers in test.csv was transported (the Transported column), based on the ~8700 labelled passenger records in train.csv.
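The final predictions have to be handed in as a file shaped like the sample submission. A minimal, dependency-free sketch of writing such a file, assuming the sample file has exactly two columns, PassengerId and Transported (worth checking against the actual file in data/). The helper `write_submission`, the passenger IDs, and the output name are hypothetical.

```python
import csv
import io


def write_submission(passenger_ids, predictions, out):
    """Hypothetical helper: write one PassengerId,Transported row per test passenger."""
    if len(passenger_ids) != len(predictions):
        raise ValueError("need exactly one prediction per passenger")
    writer = csv.writer(out)
    # Assumed header, matching the two-column sample submission format.
    writer.writerow(["PassengerId", "Transported"])
    for pid, transported in zip(passenger_ids, predictions):
        # bool() turns 0/1 or other truthy values into True/False, as in train.csv.
        writer.writerow([pid, bool(transported)])


# Usage with made-up IDs; for a real file use open("submission.csv", "w", newline="").
buf = io.StringIO()
write_submission(["1234_01", "1234_02"], [True, False], buf)
```

The length check catches the easy mistake of submitting fewer predictions than there are rows in test.csv.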

Approach

Our first task is to load and preprocess the data so that it can be fed into our neural network for training. As the sample shows, much of the data is non-numeric. We are going to encode each of the columns containing non-numeric data, and then do some feature engineering to improve model performance.
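The encoding step could look roughly like the following minimal sketch, assuming pandas. The helper name `preprocess`, the split of Cabin into deck/number/side, the `TotalSpend` column, and the fill values for missing data are illustrative assumptions, not the notebook's actual code. The two sample rows of train.csv are inlined so the snippet runs on its own.

```python
import io

import pandas as pd

# The two sample rows of train.csv shown above, inlined so the sketch runs standalone.
SAMPLE_CSV = """PassengerId,HomePlanet,CryoSleep,Cabin,Destination,Age,VIP,RoomService,FoodCourt,ShoppingMall,Spa,VRDeck,Name,Transported
0001_01,Europa,False,B/0/P,TRAPPIST-1e,39.0,False,0.0,0.0,0.0,0.0,0.0,Maham Ofracculy,False
0002_01,Earth,False,F/0/S,TRAPPIST-1e,24.0,False,109.0,9.0,25.0,549.0,44.0,Juanna Vines,True
"""

SPEND_COLS = ["RoomService", "FoodCourt", "ShoppingMall", "Spa", "VRDeck"]
CATEGORICAL_COLS = ["HomePlanet", "Destination", "Deck", "Side"]


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: turn raw passenger records into all-numeric features."""
    df = df.copy()
    # Assumption: Cabin is "Deck/Num/Side" (e.g. "B/0/P"); split it into three columns.
    parts = df["Cabin"].str.split("/", expand=True)
    df["Deck"] = parts[0]
    df["CabinNum"] = pd.to_numeric(parts[1])
    df["Side"] = parts[2]
    df["CabinNum"] = df["CabinNum"].fillna(df["CabinNum"].median())
    # Assumption: a missing amenity bill means nothing was spent there.
    df[SPEND_COLS] = df[SPEND_COLS].fillna(0.0)
    # Illustrative engineered feature: total spending across all amenities.
    df["TotalSpend"] = df[SPEND_COLS].sum(axis=1)
    df["Age"] = df["Age"].fillna(df["Age"].median())
    # True/False flags become 1.0/0.0; missing flags are treated as False.
    for col in ["CryoSleep", "VIP"]:
        df[col] = df[col].astype(float).fillna(0.0)
    # One-hot encode the categorical columns (rows with a missing category get all zeros).
    df = pd.get_dummies(df, columns=CATEGORICAL_COLS, dtype=float)
    # Identifiers and the raw Cabin string are not used as features in this sketch.
    return df.drop(columns=["PassengerId", "Name", "Cabin"])


# With the real data this would be pd.read_csv("data/train.csv").
train = pd.read_csv(io.StringIO(SAMPLE_CSV))
features = preprocess(train.drop(columns=["Transported"]))
labels = train["Transported"].astype(float)
```

One design caveat: test.csv is encoded separately, so its one-hot columns may not match the training columns exactly; aligning them with something like `test_features.reindex(columns=features.columns, fill_value=0)` keeps the feature order identical for the network.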

After that, I intend to build a multilayer feed-forward neural network using PyTorch to predict the outcome (the Transported column) for each of the passengers in the test.csv file.
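As a rough illustration of that plan, here is a minimal PyTorch sketch of a feed-forward binary classifier. The class and function names, the two hidden layers of 64 units, dropout, Adam, full-batch training, and the 0.5 decision threshold are all assumptions for illustration; the synthetic data at the bottom only stands in for the encoded passenger features.

```python
import torch
from torch import nn


class PassengerMLP(nn.Module):
    """Hypothetical multilayer feed-forward classifier for the Transported target."""

    def __init__(self, n_features: int, hidden: int = 64, dropout: float = 0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_features, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Dropout(dropout),
            nn.Linear(hidden, 1),  # a single logit per passenger
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (batch, 1) -> (batch,)
        return self.net(x).squeeze(-1)


def fit(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
        epochs: int = 200, lr: float = 1e-3) -> float:
    """Full-batch training; returns the loss of the last step."""
    # BCEWithLogitsLoss applies the sigmoid internally, which is numerically stable.
    loss_fn = nn.BCEWithLogitsLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    return loss.item()


@torch.no_grad()
def predict(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Return True/False per passenger (assumed threshold: probability > 0.5)."""
    model.eval()  # disables dropout
    return torch.sigmoid(model(x)) > 0.5


# Synthetic stand-in data: 64 "passengers" with 17 numeric features.
torch.manual_seed(0)
x = torch.randn(64, 17)
y = (x[:, 0] > 0).float()
model = PassengerMLP(n_features=17)
final_loss = fit(model, x, y, epochs=200, lr=1e-2)
preds = predict(model, x)
```

Outputting a raw logit and using `BCEWithLogitsLoss` instead of a final sigmoid layer plus `BCELoss` is the usual choice for binary targets in PyTorch; in practice, a held-out split of train.csv would be needed to judge accuracy, since the sketch only evaluates on its own training data.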

How to run the code

Everything is in the Jupyter notebook. To run it, follow these steps:

  • Clone the repository
  • Create a virtual environment (python3 -m venv .venv)
  • Activate the virtual environment (source .venv/bin/activate)
  • Install the dependencies (pip install -r requirements.txt)

After that, you can open and run the Jupyter notebook in your local IDE. Just make sure that its kernel is running inside the virtual environment (tutorial for VSCode here).