igoldshm/Data-Science-London-Scikit-learn

Data-Science-London-Scikit-learn

Objective

This project explores scikit-learn's classification capabilities and accuracy-estimation techniques using a synthetic dataset provided for a Data Science London meetup. The goal is to build a binary classifier that labels 9,000 objects, each described by 40 numerical (floating-point) features.

Dataset

The dataset is synthetic: each sample has 40 features and belongs to one of two classes, labeled 0 or 1. The training set contains 1,000 samples and the test set contains 9,000 samples.
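To make the shapes concrete, here is a minimal sketch that builds a stand-in dataset with the same dimensions. It does not load the competition files; it assumes `make_classification` from scikit-learn is an acceptable substitute for illustration, and the variable names and `random_state` are hypothetical choices.

```python
import numpy as np
from sklearn.datasets import make_classification

# Stand-in data (assumption): NOT the competition files, only a synthetic
# binary problem with the same dimensions as described above.
X, y = make_classification(
    n_samples=10_000, n_features=40, n_classes=2, random_state=0
)

# First 1,000 rows play the role of the labeled training set,
# the remaining 9,000 rows play the role of the test set.
X_train, y_train = X[:1000], y[:1000]
X_test = X[1000:]

print(X_train.shape, X_test.shape)
print(np.unique(y_train))
```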

Model

The model used for this exercise is scikit-learn's Random Forest classifier.
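A Random Forest can be fitted and used for prediction roughly as follows. This is a sketch under stated assumptions: the data is a synthetic stand-in rather than the competition files, and the hyperparameters (`n_estimators=100`, `random_state=0`) are illustrative, not necessarily those used in this repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Stand-in data (assumption): same shape as the exercise, not the real files.
X, y = make_classification(
    n_samples=10_000, n_features=40, n_classes=2, random_state=0
)
X_train, y_train = X[:1000], y[:1000]
X_test = X[1000:]

# Hyperparameters here are illustrative assumptions.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# One predicted label (0 or 1) per test sample.
predictions = clf.predict(X_test)
print(predictions[:10])
```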

Accuracy estimation

Accuracy was estimated with 5-fold cross-validation on the training set.
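Five-fold cross-validated accuracy can be computed with `cross_val_score`. The sketch below again uses a synthetic stand-in for the training set and assumed hyperparameters, so the scores it prints say nothing about the results obtained in this repository.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

# Stand-in training set (assumption): 1,000 samples, 40 features, 2 classes.
X_train, y_train = make_classification(
    n_samples=1000, n_features=40, n_classes=2, random_state=0
)

# Hyperparameters are illustrative assumptions.
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# cv=5 splits the training data into five folds; each fold is held out once
# while the model is trained on the other four.
scores = cross_val_score(clf, X_train, y_train, cv=5, scoring="accuracy")

print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean())
```

With an integer `cv` and a classifier, scikit-learn uses stratified folds, so each fold keeps roughly the same class balance as the full training set.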

Citation

Ben Hamner and Will Cukierski. Data Science London + Scikit-learn. https://kaggle.com/competitions/data-science-london-scikit-learn, 2013. Kaggle.
