The goal of this project is to practice formulating questions and implementing Machine Learning techniques to answer them, potentially using Natural Language Processing (NLP) as well as supervised and unsupervised ML.
The data comes from OKCupid, a dating app that matches users based on their multiple-choice and short-answer responses. It was provided by Codecademy, in profiles.csv, for the final project in the Machine Learning portion of the Data Scientist career path.
What analysis will we perform to achieve the project goals?
- We can explore and visualize the data: what type of person uses OKCupid, and how are they able to describe themselves within the parameters of the site?
- Can we use Machine Learning to fill in missing data? If we were using a dating app, what information about potential partners would be most important to us personally? I might have allergies, or really love animals, and want to know whether a potential match has pets.
- We can wrap up by evaluating how well the model performed: does it indicate that we can answer the question(s) with any confidence?
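Before deciding which missing values to predict, it helps to see how much is missing in each column. The snippet below is a minimal sketch of that first step, assuming pandas is available. The helper name `missing_summary` and the toy columns are made up for illustration; they are not taken from profiles.csv.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the count and fraction of missing values per column, most missing first."""
    counts = df.isna().sum()
    summary = pd.DataFrame({
        "missing": counts,
        "fraction": counts / len(df),
    })
    return summary.sort_values("missing", ascending=False)


# Toy stand-in for the real data; column names and values are hypothetical.
toy = pd.DataFrame({
    "age": [25, 31, 40, 22],
    "pets": ["likes dogs", None, None, "has cats"],
    "diet": ["vegan", "anything", None, "anything"],
})
summary = missing_summary(toy)
print(summary)

# With the real data, assuming profiles.csv sits in the working directory:
# profiles = pd.read_csv("profiles.csv")
# print(missing_summary(profiles).head(10))
```

Columns with many missing values and a clear personal relevance, such as information about pets, would be natural candidates for a model to predict.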