This repository contains a series of Jupyter notebooks for different NLP projects, focusing on text regression, text generation using transformers, and fine-tuning a BERT model.
This project involves scraping text data from Arabic websites on a specific topic, preprocessing the text, and training various RNN-based models to predict a relevance score for each text.
- Data Collection: Using Scrapy or BeautifulSoup to scrape text data from Arabic websites.
- Dataset Preparation: Creating a dataset with two columns: `Text` (Arabic text) and `Score` (a relevance score between 0 and 10).
- NLP Pipeline: Preprocessing the text data (tokenization, stemming, lemmatization, stop-word removal, discretization).
- Model Training: Training RNN, Bidirectional RNN, GRU, and LSTM models with hyperparameter tuning (see the sketch after this list).
- Evaluation: Evaluating the models using standard regression metrics (e.g., MSE, MAE) and the BLEU score.
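A minimal sketch of this pipeline is shown below, assuming the scraped data has been saved to a CSV with `Text` and `Score` columns (the file name, column names, and hyperparameters are illustrative, not the notebook's exact values). It uses NLTK's Arabic stop-word list and a Keras Bidirectional LSTM regressor; the LSTM layer can be swapped for `SimpleRNN` or `GRU` to compare the other architectures.

```python
# Illustrative text-regression pipeline sketch (not the exact notebook code).
# Assumes a CSV with "Text" (Arabic text) and "Score" (0-10) columns.
import pandas as pd
import nltk
from nltk.corpus import stopwords
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

nltk.download("stopwords")
arabic_stopwords = set(stopwords.words("arabic"))

df = pd.read_csv("dataset.csv")  # hypothetical file name
# Remove Arabic stop words from each document.
df["Text"] = df["Text"].apply(
    lambda t: " ".join(w for w in t.split() if w not in arabic_stopwords)
)

# Tokenize and pad the cleaned text.
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(df["Text"])
X = pad_sequences(tokenizer.texts_to_sequences(df["Text"]), maxlen=200)
y = df["Score"].values

# Bidirectional LSTM regressor; replace LSTM with SimpleRNN or GRU to compare models.
model = Sequential([
    Embedding(input_dim=20000, output_dim=128),
    Bidirectional(LSTM(64)),
    Dense(32, activation="relu"),
    Dense(1),  # single continuous output: the relevance score
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```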
This project focuses on fine-tuning a pre-trained GPT-2 model using a custom dataset and generating new text based on a given sentence.
- Installation: Installing the `pytorch-transformers` library.
- Model Loading: Loading the pre-trained GPT-2 model.
- Fine-Tuning: Fine-tuning the GPT-2 model on a custom dataset.
- Text Generation: Generating new paragraphs from a given input sentence (see the sketch below).
Follow the tutorial here for detailed steps on fine-tuning GPT-2.
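For reference, here is a minimal generation sketch. It uses the Hugging Face `transformers` package (the successor to `pytorch-transformers`), which is an assumption on our part; the notebook itself may use the older API. Loading `"gpt2"` stands in for a fine-tuned checkpoint path.

```python
# Minimal GPT-2 text-generation sketch (illustrative).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # replace with a fine-tuned checkpoint path

prompt = "Artificial intelligence will"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation of the input sentence.
output_ids = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```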
This project uses the pre-trained `bert-base-uncased` model for text classification on a dataset of Amazon reviews.
- Data Preparation: Downloading and preparing the dataset from Amazon Reviews.
- Model Setup: Setting up the BERT embedding layer.
- Fine-Tuning: Fine-tuning the BERT model with appropriate hyperparameters (a minimal sketch follows this list).
- Evaluation: Evaluating the model using metrics like Accuracy, Loss, F1 score, BLEU score, and BERT-specific metrics.
- Conclusion: Summarizing the performance and insights from using the pre-trained BERT model.
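The sketch below shows the core fine-tuning loop with `transformers` and `bert-base-uncased`. The toy review texts, binary labels, and hyperparameters are placeholders, not the values used in the notebook.

```python
# Minimal BERT fine-tuning sketch for sequence classification (illustrative).
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy stand-in for the Amazon reviews data: review texts with binary sentiment labels.
texts = ["Great product, works perfectly.", "Terrible quality, broke after a day."]
labels = torch.tensor([1, 0])

encodings = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    optimizer.zero_grad()
    outputs = model(**encodings, labels=labels)  # returns loss and logits
    outputs.loss.backward()
    optimizer.step()

# Quick accuracy check on the (toy) training batch.
model.eval()
with torch.no_grad():
    preds = model(**encodings).logits.argmax(dim=-1)
print(f"accuracy: {(preds == labels).float().mean().item():.2f}")
```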
- Text Regression: Arabic text scraped from various websites using Scrapy or BeautifulSoup.
- Text Generation: Custom dataset used for fine-tuning GPT-2.
- BERT Model: Amazon Reviews Dataset.
- Project supervised by Pr. Elaachak Lotfi at Université Abdelmalek Essaadi, Faculté des Sciences et Techniques de Tanger, Département Génie Informatique.
- Inspired by various tutorials and open-source projects in the NLP community.