This project implements a text generation model using the GPT-2 architecture. It loads text data from multiple folders, preprocesses the data, trains a custom model, and allows for generating responses based on user input.
- Load and preprocess text data from multiple directories.
- Train a GPT-2 model on the custom dataset.
- Generate text responses based on user prompts.
- Save the trained model and tokenizer for future use.
- Python 3.6 or newer, but below 3.13
- PyTorch
- Transformers library from Hugging Face
To set up the environment, you can use the following commands:
pip install torch
pip install transformers
The script loads text data from a specified root folder containing subfolders. Update the root_folder
variable in config.py to point to your data directory.
The loaded text data is preprocessed by joining all texts into a single string, separated by newlines.
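A minimal sketch of the loading and preprocessing steps is shown below. The function name `load_texts` and the `.txt` extension filter are illustrative assumptions; the actual script may organize this differently.

```python
import os

def load_texts(root_folder):
    """Walk every subfolder of root_folder and read each .txt file it contains."""
    texts = []
    for dirpath, _, filenames in os.walk(root_folder):
        for name in filenames:
            if name.endswith(".txt"):
                with open(os.path.join(dirpath, name), encoding="utf-8") as f:
                    texts.append(f.read())
    # Preprocessing: join all documents into one training string, separated by newlines
    return "\n".join(texts)
```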
The script initializes the GPT-2 tokenizer and model, setting the padding token to be the same as the end-of-sequence token.
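In Hugging Face Transformers terms, this setup looks roughly like the following; the `"gpt2"` checkpoint name is an assumption, and the script may load a different model size.

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# GPT-2 has no dedicated padding token, so reuse the end-of-sequence token
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id
```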
The model is trained on the preprocessed text data. You can specify the number of epochs and batch size in the train_model
function.
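A hypothetical call might look like this; the exact signature of `train_model` is defined in the script, so the parameter names and values here are assumptions.

```python
# Hypothetical call; check the actual signature of train_model in Q&A.py
train_model(model, tokenizer, text_data, epochs=3, batch_size=4)
```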
After training, the model and tokenizer are saved to a specified directory.
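With Hugging Face Transformers, saving typically amounts to the following; the output directory name is illustrative.

```python
save_dir = "trained_model"  # assumed output directory; the script may use a different path
model.save_pretrained(save_dir)
tokenizer.save_pretrained(save_dir)
```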
The script allows for interactive text generation. You can input prompts, and the model will generate responses until you type 'end' to exit.
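Below is a minimal sketch of such a loop using the standard Transformers `generate` API; the sampling settings and `max_new_tokens` value are illustrative, not the script's exact configuration.

```python
import torch

# Assumes `model` and `tokenizer` are already loaded as shown above
while True:
    prompt = input("You: ")
    if prompt.strip().lower() == "end":
        break
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        output_ids = model.generate(
            **inputs,
            max_new_tokens=100,
            do_sample=True,
            pad_token_id=tokenizer.eos_token_id,
        )
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```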
To run the script, execute the following command in your terminal:
python MultipleFiles/Q&A.py
Once training completes, you can chat with the model. Keep in mind that this is a homemade AI, far less powerful than ChatGPT or Mistral, and the quality of its responses depends heavily on the training data.
Contributions are welcome! Please feel free to submit a pull request or open an issue for any suggestions or improvements.
This project is licensed under the MIT License. See the LICENSE file for more details.
- Hugging Face Transformers for providing the pre-trained models and tokenizers.
- PyTorch for the deep learning framework.