Extremesarova/yandex_nlp_course

Contents

  • Week 1 - Embeddings
  • Week 2 - Classification
    • Large scale text analysis with deep learning
    • Prohibited Comment Classification
      Tackle this problem using both classical NLP methods and an embedding-based approach:
      • BOW from scratch implementation
      • TF-IDF from scratch implementation
      • Naive Bayes from scratch
      • TF-IDF + Logistic Regression + Hyperparameter Grid Search
      • FastText embeddings + Logistic Regression + Hyperparameter Grid Search
    • Salary prediction
      The task is to predict salary from text and categorical features:
      • Exploratory Data Analysis
        • Categorical Columns Encoding
        • Target transformation
      • Modeling:
        • Baseline: Custom PyTorch dataset + Custom Transforms + Fusion model (Title Encoder + Description Encoder + Categorical Encoder)
        • Improved model: In progress
      • Explaining model predictions: In progress
  • Week 3 - Language Modeling
  • Week 4 - Seq2Seq
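
As a rough illustration of what the "from scratch" BOW and TF-IDF items under Week 2 involve, here is a minimal sketch in plain Python. It is not the code from the notebooks: the function names (`build_vocab`, `bow_vector`, `idf_weights`, `tfidf_vector`), the whitespace tokenizer, and the smoothed IDF variant `log((1 + n) / (1 + df)) + 1` are all assumptions made for this example.

```python
import math
from collections import Counter


def tokenize(doc):
    # Assumption: lowercase + whitespace split stands in for a real tokenizer.
    return doc.lower().split()


def build_vocab(docs):
    """Map each distinct token to a column index (sorted for determinism)."""
    tokens = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(doc, vocab):
    """Bag-of-words: raw count of every vocabulary token in the document."""
    vec = [0] * len(vocab)
    for tok, count in Counter(tokenize(doc)).items():
        if tok in vocab:  # out-of-vocabulary tokens are ignored
            vec[vocab[tok]] = count
    return vec


def idf_weights(docs, vocab):
    """Smoothed inverse document frequency, one weight per vocabulary column."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(tokenize(doc)))  # count each token once per doc
    # vocab preserves insertion order, which matches the column indices.
    return [math.log((1 + n_docs) / (1 + doc_freq[tok])) + 1 for tok in vocab]


def tfidf_vector(doc, vocab, idf):
    """TF-IDF: term counts scaled by the per-token IDF weights."""
    return [tf * w for tf, w in zip(bow_vector(doc, vocab), idf)]


docs = ["you are great", "you are awful awful"]
vocab = build_vocab(docs)
idf = idf_weights(docs, vocab)
print(bow_vector(docs[1], vocab))
print(tfidf_vector(docs[1], vocab, idf))
```

Tokens that occur in every document (here `you` and `are`) get the smallest IDF weight, so rarer tokens such as `awful` dominate the TF-IDF vector. The resulting vectors can be fed to a linear classifier such as logistic regression.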

To-do

  • Week 2 (Text Classification):
    • Practice:
      • Homework part 2 - In progress
    • Theory:
      • Analysis and Interpretability
      • Research Thinking
      • Related Papers
  • Week 3 (Language Modeling):
    • Practice:
      • Seminar - Fix Kneser-Ney smoothing
        See the bottom of the page linked here for the reference formula
      • Homework - Implement Beam Search + Ultimate LM
  • Week 4 (Seq2Seq)
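
For the Beam Search item in the Week 3 to-do, the general idea can be sketched as follows. This is a generic illustration, not the homework solution: `beam_search`, `toy_next_log_probs`, and the `TOY` transition table are hypothetical names invented here, and the "language model" is a hand-written table of next-token probabilities.

```python
import math


def beam_search(next_log_probs, bos, eos, beam_size=3, max_len=10):
    """Return the (sequence, log-probability) pair with the best total score.

    next_log_probs(seq) is assumed to return a dict {token: log_prob}
    for the next token given the prefix seq.
    """
    beams = [([bos], 0.0)]  # hypotheses still being extended
    finished = []           # hypotheses that already ended with eos
    for _ in range(max_len):
        # Extend every live hypothesis by every possible next token.
        candidates = [
            (seq + [tok], score + lp)
            for seq, score in beams
            for tok, lp in next_log_probs(seq).items()
        ]
        if not candidates:
            break
        # Keep only the beam_size best extensions.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for seq, score in candidates[:beam_size]:
            (finished if seq[-1] == eos else beams).append((seq, score))
        if not beams:
            break
    finished.extend(beams)  # include hypotheses cut off at max_len
    return max(finished, key=lambda c: c[1])


# Toy bigram-style model: the distribution depends only on the last token.
TOY = {
    "<s>": {"a": math.log(0.6), "b": math.log(0.4)},
    "a": {"</s>": math.log(0.5), "a": math.log(0.5)},
    "b": {"</s>": math.log(0.9), "a": math.log(0.1)},
}


def toy_next_log_probs(seq):
    return TOY[seq[-1]]


greedy_seq, greedy_score = beam_search(toy_next_log_probs, "<s>", "</s>", beam_size=1)
beam_seq, beam_score = beam_search(toy_next_log_probs, "<s>", "</s>", beam_size=2)
print(greedy_seq, math.exp(greedy_score))
print(beam_seq, math.exp(beam_score))
```

With `beam_size=1` the search is greedy and commits to `a`, the locally best first token. With `beam_size=2` it also keeps `b` alive and finds the sequence with the higher total probability. This sketch compares raw log-probabilities, so it favors shorter sequences; length normalization is a common addition.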