I have implemented common ML algorithms with NumPy and verified the correctness of my implementations using scikit-learn.
Inside each algorithm's code, you'll always find this 4-step experiment:
1. Load the relevant dataset.
2. Solve the problem using scikit-learn on the dataset from step 1.
3. Solve the same problem using my own implementation on the dataset from step 1.
4. Assert that the results of steps 2 and 3 are equal, by:
   - comparing the accuracy on the entire dataset in a classification scenario;
   - taking one random sample from the dataset and comparing the predictions in a regression scenario.
- For binary classification, I used the Pima Indians Diabetes dataset.
- For regression, I used the Boston Housing dataset.
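
As a rough illustration of the experiment described above, here is a minimal sketch rather than the actual code from this repository. It assumes synthetic data from `make_regression` and `make_classification` as stand-ins for the Boston Housing and Pima Indians Diabetes datasets, and it picks ordinary least squares and a nearest-centroid classifier purely as example algorithms. All function names are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import NearestCentroid


def fit_linear_regression(X, y):
    """Ordinary least squares with NumPy; returns (weights, intercept)."""
    X_aug = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    return coef[:-1], coef[-1]


def fit_nearest_centroid(X, y):
    """Returns the class labels and the mean feature vector of each class."""
    classes = np.unique(y)
    centroids = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids


def predict_nearest_centroid(X, classes, centroids):
    """Assigns each row of X to the class with the closest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[dists.argmin(axis=1)]


def regression_check(seed=0):
    # Step 1: load the data (assumption: synthetic stand-in dataset).
    X, y = make_regression(n_samples=200, n_features=5, noise=10.0,
                           random_state=seed)
    # Step 2: solve with scikit-learn.
    reference = LinearRegression().fit(X, y)
    # Step 3: solve with the NumPy implementation.
    weights, intercept = fit_linear_regression(X, y)
    # Step 4: compare the predictions for one randomly chosen sample.
    i = np.random.default_rng(seed).integers(X.shape[0])
    expected = reference.predict(X[i:i + 1])[0]
    actual = X[i] @ weights + intercept
    assert np.isclose(actual, expected), (actual, expected)
    return actual, expected


def classification_check(seed=0):
    # Step 1: load the data (assumption: synthetic stand-in dataset).
    X, y = make_classification(n_samples=200, n_features=8, random_state=seed)
    # Step 2: solve with scikit-learn.
    reference_accuracy = NearestCentroid().fit(X, y).score(X, y)
    # Step 3: solve with the NumPy implementation.
    classes, centroids = fit_nearest_centroid(X, y)
    predictions = predict_nearest_centroid(X, classes, centroids)
    own_accuracy = np.mean(predictions == y)
    # Step 4: compare the accuracy on the entire dataset.
    assert np.isclose(own_accuracy, reference_accuracy)
    return own_accuracy, reference_accuracy


if __name__ == "__main__":
    print("regression (own, scikit-learn):", regression_check())
    print("classification (own, scikit-learn):", classification_check())
```

The comparisons use `np.isclose` instead of `==` because two numerically different solvers can agree only up to floating-point rounding.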