Data mining project

Project developed for the undergraduate elective course "Data mining and Machine Learning" at CEID.

Part A

Dataset: winequality-red.csv

  • Support Vector Machines (SVM)
  • Missing data handling
    1. Drop column
    2. Fill NaN values with column average
    3. Logistic Regression imputation
    4. Imputation based on K-means
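
The first two missing-data strategies can be sketched in a few lines. This is a minimal illustration in plain Python, not the project's code: the helper names `drop_column` and `fill_with_mean`, the toy rows, and the choice of `pH` as the column with gaps are all assumptions made for the example.

```python
from statistics import mean


def drop_column(rows, column):
    """Strategy 1: remove the column that contains missing values."""
    return [{k: v for k, v in row.items() if k != column} for row in rows]


def fill_with_mean(rows, column):
    """Strategy 2: replace missing values (None) with the column average."""
    observed = [row[column] for row in rows if row[column] is not None]
    col_mean = mean(observed)
    return [
        {**row, column: col_mean if row[column] is None else row[column]}
        for row in rows
    ]


# Toy rows with hypothetical values; None marks a missing entry.
rows = [
    {"pH": 3.0, "alcohol": 9.0},
    {"pH": None, "alcohol": 10.0},
    {"pH": 4.0, "alcohol": 11.0},
]

dropped = drop_column(rows, "pH")
filled = fill_with_mean(rows, "pH")
```

Both helpers return new rows and leave the input untouched, so the same data can be fed to each strategy and the resulting classifiers compared.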

Evaluation metrics: F1 score, precision, recall, and accuracy
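
For reference, the four metrics can be computed from the confusion counts as sketched below. The sketch assumes a binary labelling, which the README does not state; with more than two classes the per-class scores would typically be averaged. The function name `binary_metrics` and the label lists are made up for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1 and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if precision + recall
        else 0.0
    )
    accuracy = (tp + tn) / len(pairs)
    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "accuracy": accuracy,
    }


# Hypothetical labels: 3 true positives, 1 false positive,
# 1 false negative, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
scores = binary_metrics(y_true, y_pred)
```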

Part B

Dataset: onion-or-not.csv

  • Data preprocessing (NLTK)
    1. Word tokenizer
    2. Stemming
    3. Stopwords removal
    4. Tf-idf matrix
  • Neural network (TensorFlow Keras)
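
The preprocessing steps can be illustrated with a dependency-free sketch. It is not the project's pipeline: whitespace splitting stands in for the NLTK word tokenizer, the stopword set is a tiny hypothetical list, stemming is left out, and the weighting is the textbook `tf * log(N / df)` form, which may differ from the smoothed variants that libraries apply. The headlines are invented.

```python
import math
from collections import Counter

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "a", "of", "to", "in", "is"}


def preprocess(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    tokens = text.lower().split()
    return [t for t in tokens if t not in STOPWORDS]


def tfidf(docs):
    """Return (vocabulary, matrix) with one row of tf-idf weights per document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vocab = sorted(df)
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid dividing by zero on empty documents
        matrix.append(
            [(counts[t] / total) * math.log(n_docs / df[t]) for t in vocab]
        )
    return vocab, matrix


# Invented headlines, used only to exercise the functions.
docs = [
    "the senate passes the budget",
    "area man passes the salt",
    "senate votes",
]
vocab, matrix = tfidf(docs)
```

Each row of `matrix` is a fixed-length feature vector, which is the shape a dense input layer of a neural network expects.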

Evaluation metrics: F1 score, precision, recall, and accuracy

Authors

  • Zisis Stylianos Tramparis
  • Romanos Kapsalis