This repository contains a series of Jupyter notebooks for different NLP projects, focusing on text regression, text generation using transformers, and fine-tuning a BERT model.
This project involves scraping text data from Arabic websites on a specific topic, preprocessing the text, and training various RNN-based models to predict a relevance score for each text.
- Data Collection: Using Scrapy or BeautifulSoup to scrape text data from Arabic websites.
- Dataset Preparation: Creating a dataset with two columns: `Text` (Arabic text) and `Score` (a relevance score between 0 and 10).
- NLP Pipeline: Preprocessing the text data (tokenization, stemming, lemmatization, stop-word removal, discretization).
- Model Training: Training RNN, Bidirectional RNN, GRU, and LSTM models with hyperparameter tuning (see the sketch after this list).
- Evaluation: Evaluating the models using standard regression metrics (e.g., MSE, MAE) and the BLEU score.
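A minimal sketch of this pipeline is shown below, assuming the scraped data has been saved to a CSV with `Text` and `Score` columns (the file name, column names, and hyperparameters are illustrative, not the notebook's exact values). It uses NLTK's Arabic stop-word list and a Keras Bidirectional LSTM regressor; the LSTM layer can be swapped for `SimpleRNN` or `GRU` to compare the other architectures.

```python
# Illustrative text-regression pipeline sketch (not the exact notebook code).
# Assumes a CSV with "Text" (Arabic text) and "Score" (0-10) columns.
import pandas as pd
import nltk
from nltk.corpus import stopwords
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

nltk.download("stopwords")
arabic_stopwords = set(stopwords.words("arabic"))

df = pd.read_csv("dataset.csv")  # hypothetical file name
# Remove Arabic stop words from each document.
df["Text"] = df["Text"].apply(
    lambda t: " ".join(w for w in t.split() if w not in arabic_stopwords)
)

# Tokenize and pad the cleaned text.
tokenizer = Tokenizer(num_words=20000)
tokenizer.fit_on_texts(df["Text"])
X = pad_sequences(tokenizer.texts_to_sequences(df["Text"]), maxlen=200)
y = df["Score"].values

# Bidirectional LSTM regressor; replace LSTM with SimpleRNN or GRU to compare models.
model = Sequential([
    Embedding(input_dim=20000, output_dim=128),
    Bidirectional(LSTM(64)),
    Dense(32, activation="relu"),
    Dense(1),  # single continuous output: the relevance score
])
model.compile(optimizer="adam", loss="mse", metrics=["mae"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```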
This project focuses on fine-tuning a pre-trained GPT-2 model using a custom dataset and generating new text based on a given sentence.
- Installation: Installing the `pytorch-transformers` library.
- Model Loading: Loading the pre-trained GPT-2 model.
- Fine-Tuning: Fine-tuning the GPT-2 model on a custom dataset.
- Text Generation: Generating new paragraphs from a given input sentence (see the sketch below).
Follow the tutorial here for detailed steps on fine-tuning GPT-2.
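For reference, here is a minimal generation sketch. It uses the Hugging Face `transformers` package (the successor to `pytorch-transformers`), which is an assumption on our part; the notebook itself may use the older API. Loading `"gpt2"` stands in for a fine-tuned checkpoint path.

```python
# Minimal GPT-2 text-generation sketch (illustrative).
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")  # replace with a fine-tuned checkpoint path

prompt = "Artificial intelligence will"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

# Sample a continuation of the input sentence.
output_ids = model.generate(
    input_ids,
    max_length=100,
    do_sample=True,
    top_k=50,
    top_p=0.95,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```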
This project uses the pre-trained `bert-base-uncased` model for text classification on a dataset of Amazon reviews.
- Data Preparation: Downloading and preparing the dataset from Amazon Reviews.
- Model Setup: Setting up the BERT embedding layer.
- Fine-Tuning: Fine-tuning the BERT model with appropriate hyperparameters (a minimal sketch follows this list).
- Evaluation: Evaluating the model using metrics like Accuracy, Loss, F1 score, BLEU score, and BERT-specific metrics.
- Conclusion: Summarizing the performance and insights from using the pre-trained BERT model.
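The sketch below shows the core fine-tuning loop with `transformers` and `bert-base-uncased`. The toy review texts, binary labels, and hyperparameters are placeholders, not the values used in the notebook.

```python
# Minimal BERT fine-tuning sketch for sequence classification (illustrative).
import torch
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy stand-in for the Amazon reviews data: review texts with binary sentiment labels.
texts = ["Great product, works perfectly.", "Terrible quality, broke after a day."]
labels = torch.tensor([1, 0])

encodings = tokenizer(texts, padding=True, truncation=True, max_length=128, return_tensors="pt")

optimizer = AdamW(model.parameters(), lr=2e-5)
model.train()
for epoch in range(3):
    optimizer.zero_grad()
    outputs = model(**encodings, labels=labels)  # returns loss and logits
    outputs.loss.backward()
    optimizer.step()

# Quick accuracy check on the (toy) training batch.
model.eval()
with torch.no_grad():
    preds = model(**encodings).logits.argmax(dim=-1)
print(f"accuracy: {(preds == labels).float().mean().item():.2f}")
```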
- Text Regression: Arabic text scraped from various websites using Scrapy or BeautifulSoup.
- Text Generation: Custom dataset used for fine-tuning GPT-2.
- BERT Model: Amazon Reviews Dataset.
- Project supervised by Pr. Elaachak Lotfi at Université Abdelmalek Essaadi, Faculté des Sciences et Techniques de Tanger, Département Génie Informatique.
- Inspired by various tutorials and open-source projects in the NLP community.