ramprasathasokan/Microsoft-Malware-Prediction-Using-CatBoost

Microsoft-Malware-Prediction-Using-CatBoost

Link to Kaggle: https://www.kaggle.com/c/microsoft-malware-prediction

Impact

Helps protect more than one billion Windows machines from malware damage before it happens.

About

Predict the probability that a Windows machine will be infected by various families of malware, using CatBoost on machine properties derived from Windows Defender telemetry data.

Tech Stack

Python, Jupyter, Pandas, NumPy, Dask and CatBoost.

Jupyter Notebooks Link

https://nbviewer.jupyter.org/github/ramprasathasokan/Microsoft-Malware-Prediction-Using-CatBoost/tree/master/

Model Description

CatBoost is a state-of-the-art, open-source library for gradient boosting on decision trees. Developed by Yandex researchers and engineers, it is the successor to the MatrixNet algorithm, which is widely used within the company for ranking, forecasting and recommendations. It is general-purpose and can be applied to a wide range of domains and problems.

  • Accurate: leads or ties the competition on standard benchmarks
  • Robust: reduces the need for extensive hyperparameter tuning
  • Easy-to-use: offers Python interfaces integrated with scikit-learn, as well as R and command-line interfaces
  • Practical: uses categorical features directly and scalably
  • Extensible: allows specifying custom loss functions

Project Workflow

  1. Download the dataset
  2. Clean the dataset
  3. Perform feature engineering on the dataset
  4. Encode the dataset
  5. Fit the model to the training dataset
  6. Compute the accuracy and other evaluation metrics on the test (validation) dataset
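
Steps 2 to 4 can be sketched with pandas as below. This is an illustration under stated assumptions, not the repository's notebook code: the missing-value threshold, median/sentinel filling, splitting a version string such as AvSigVersion into numeric parts, and frequency encoding are all hypothetical choices, the helper names clean, split_version and frequency_encode are hypothetical, and the toy rows are made up.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, max_missing: float = 0.9) -> pd.DataFrame:
    """Step 2 (hypothetical): drop mostly-empty columns, then fill remaining gaps."""
    df = df.loc[:, df.isna().mean() <= max_missing].copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            # A sentinel string keeps "missing" as its own category.
            df[col] = df[col].fillna("missing")
    return df


def split_version(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Step 3 (hypothetical): expand an 'a.b.c.d' version string into integer columns."""
    parts = df[col].str.split(".", expand=True).apply(pd.to_numeric, errors="coerce")
    parts.columns = [f"{col}_{i}" for i in range(parts.shape[1])]
    # Non-version values such as the "missing" sentinel become -1.
    return pd.concat([df.drop(columns=col), parts.fillna(-1).astype(int)], axis=1)


def frequency_encode(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Step 4 (hypothetical): replace each category with its occurrence count."""
    df = df.copy()
    for col in cols:
        df[col] = df[col].map(df[col].value_counts()).astype(int)
    return df


# Invented rows using column names from the Kaggle dataset.
raw = pd.DataFrame({
    "SmartScreen": ["RequireAdmin", None, "Off", "RequireAdmin"],
    "AvSigVersion": ["1.273.1735.0", "1.275.1140.0", None, "1.273.1735.0"],
    "Census_TotalPhysicalRAM": [4096.0, np.nan, 8192.0, 2048.0],
    "PuaMode": [None, None, None, "on"],  # 75% missing, so dropped below
    "HasDetections": [0, 1, 1, 0],
})

features = frequency_encode(
    split_version(clean(raw, max_missing=0.5), "AvSigVersion"), ["SmartScreen"]
)
print(features)
```

Frequency encoding is only one option for step 4; because CatBoost accepts string categories directly, columns passed as categorical features could also skip manual encoding.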
