We have included a dataset and a task to provide a foundation for further discussion. The data was pulled from an online spam/ham corpus; the task is to work towards the best spam/ham classifier you can build in around 2 hours. The goal isn’t to perfect every step but to show your thought process, so spend your time with the evaluation criteria below in mind.
- Use Python in a Jupyter Notebook for all work
- Include any references for code that influenced your answer
- Use markdown & comments to help us understand your thought process
- Show us the code, outputs, and process for creating the classifier
- Try to time-box this exercise to around 2 hours
- Be clear in your code and comments
- Going too in-depth will not win you bonus points
- In the last paragraph, address the following:
  - How do you think this model would do in the wild, and why?
  - Explain where you would take the model-building process if the model had to be released into production
- Evaluation criteria:
  - Quality of code
  - Clarity in reasoning
  - Validity and performance of the model
- HTML output of your Jupyter notebook showing all of your work
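
As a rough illustration of the kind of evidence that can support the "validity and performance" criterion above, the sketch below computes precision, recall, and F1 for the spam class on a held-out set. It is only a sketch under stated assumptions: the helper `binary_report` and the label lists `y_true` and `y_pred` are hypothetical and are not taken from the provided dataset, and any comparable metric or library routine would serve equally well.

```python
from collections import Counter


def binary_report(y_true, y_pred, positive="spam"):
    """Return precision, recall, F1, and confusion counts for `positive`."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            counts["tp"] += 1  # spam correctly flagged
        elif pred == positive:
            counts["fp"] += 1  # ham wrongly flagged as spam
        elif truth == positive:
            counts["fn"] += 1  # spam that slipped through
        else:
            counts["tn"] += 1  # ham correctly passed
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "confusion": dict(counts),
    }


# Hypothetical hold-out labels and predictions, for illustration only.
y_true = ["spam", "ham", "ham", "spam", "ham", "spam"]
y_pred = ["spam", "ham", "spam", "spam", "ham", "ham"]

report = binary_report(y_true, y_pred)
print(report)
```

Reporting per-class metrics alongside the confusion counts, rather than accuracy alone, tends to make the "in the wild" discussion easier, since false positives (ham flagged as spam) and false negatives usually carry different costs.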