Microsoft-Malware-Prediction-Using-CatBoost

Link to Kaggle: https://www.kaggle.com/c/microsoft-malware-prediction

Introduction

The goal is to predict the probability that a Windows machine will be infected by various families of malware, based on properties of that machine. These properties come from telemetry data generated by Windows Defender, and the prediction model is CatBoost.

Model Description

CatBoost is a state-of-the-art open-source library for gradient boosting on decision trees. Developed by Yandex researchers and engineers, it is the successor of the MatrixNet algorithm, which is widely used within the company for ranking tasks, forecasting, and making recommendations. It is general-purpose and can be applied across a wide range of areas and to a variety of problems. Its main advantages are:

Accurate: leads or ties competition on standard benchmarks
Robust: reduces the need for extensive hyperparameter tuning
Easy-to-use: offers Python interfaces integrated with scikit-learn, as well as R and command-line interfaces
Practical: uses categorical features directly and scalably
Extensible: allows specifying custom loss functions
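
To illustrate the "uses categorical features directly" point, here is a minimal sketch of fitting a CatBoost classifier on rows that mix strings and numbers, without any manual encoding. The helper names (`categorical_indices`, `fit_malware_model`), the feature values, and the hyperparameters are assumptions made for this example; they are not taken from the project notebooks.

```python
def categorical_indices(rows):
    """Return the positions of columns that hold strings in the first row."""
    return [i for i, value in enumerate(rows[0]) if isinstance(value, str)]


def fit_malware_model(rows, labels):
    """Fit a small CatBoost classifier, passing string columns as categorical."""
    # Imported here so the helper above can be used without CatBoost installed.
    from catboost import CatBoostClassifier

    # Illustrative hyperparameters only; the notebooks may use different ones.
    model = CatBoostClassifier(iterations=50, learning_rate=0.1, verbose=False)
    model.fit(rows, labels, cat_features=categorical_indices(rows))
    return model


if __name__ == "__main__":
    # Made-up rows: [product name, RAM in MB, architecture].
    rows = [
        ["win8defender", 4096.0, "amd64"],
        ["win8defender", 8192.0, "amd64"],
        ["mse", 2048.0, "x86"],
        ["mse", 4096.0, "x86"],
        ["win8defender", 16384.0, "arm64"],
        ["mse", 2048.0, "amd64"],
    ]
    labels = [1, 0, 1, 1, 0, 0]  # 1 = malware detected (made-up values)

    model = fit_malware_model(rows, labels)
    # Column 1 of predict_proba is the predicted probability of the positive class.
    print(model.predict_proba(rows)[:, 1])
```

The string columns are handed to CatBoost through `cat_features`, so no one-hot or label encoding is needed before fitting.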

Project Workflow

Download the dataset
Clean the dataset
Perform feature engineering on the dataset
Encode the dataset
Fit the model to the training dataset
Compute the accuracy and other evaluation metrics on the test (validation) dataset
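
The cleaning, encoding, and splitting steps above can be sketched in plain Python. This README does not describe how the notebooks implement them, so everything here is an assumption for illustration: the placeholder token, the choice of frequency encoding, the 75/25 split, and all function names are hypothetical.

```python
import random
from collections import Counter

MISSING = "missing"  # placeholder token chosen for this sketch


def clean(rows):
    """Replace None or empty values with a placeholder token."""
    return [
        {key: (MISSING if value in (None, "") else value) for key, value in row.items()}
        for row in rows
    ]


def frequency_encode(rows, column):
    """Map each category in `column` to its relative frequency in `rows`."""
    counts = Counter(row[column] for row in rows)
    total = len(rows)
    return {value: count / total for value, count in counts.items()}


def train_validation_split(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of `rows` and split it into train and validation parts."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Made-up rows with a single categorical column.
    raw = [{"os": "win10"}, {"os": None}, {"os": "win10"}, {"os": "win7"}]
    cleaned = clean(raw)
    encoding = frequency_encode(cleaned, "os")
    train, validation = train_validation_split(cleaned)
    print(encoding, len(train), len(validation))
```

In a real pipeline the encoding would be learned from the training rows only and then applied to the validation and test rows, so that no information leaks from the held-out data.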

Results & Conclusion

Since this is a Kaggle competition, the model's output is evaluated by Kaggle. Kaggle maintains two leaderboards, public and private. For this competition, the public leaderboard is calculated with approximately 63% of the test data, and the private leaderboard is calculated with the remaining 37% or so.

Private Score: 0.64949
Public Score: 0.65380
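
This Kaggle competition scores submissions by area under the ROC curve (AUC), so the two scores above are presumably AUC values, although this README does not name the metric. As a reminder of what the number means, here is a minimal sketch that computes AUC as the fraction of (positive, negative) pairs the model ranks correctly, with ties counted as half. The function name `roc_auc` and the sample values are made up for this example.

```python
def roc_auc(labels, scores):
    """Compute ROC AUC by comparing every positive score with every negative score."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative label")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half
    return wins / (len(positives) * len(negatives))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    scores = [0.1, 0.4, 0.35, 0.8]
    print(roc_auc(labels, scores))  # 0.75: three of the four pairs are ranked correctly
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. The pairwise loop is quadratic, so it is only suitable for small samples; a library routine is the better choice for a full validation set.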

Repository files

MMP_download_dataset.ipynb
MMP_clean_train_data.ipynb
MMP_clean_test_data.ipynb
MMP_Train_Data_Feature_Engg.ipynb
MMP_Test_Data_Feature_Engg.ipynb
MMP_Data_Encoding.ipynb
MMP_Modeling.ipynb
README.md

tsaiaditya/Microsoft-Malware-Prediction-Using-CatBoost
