ShareGPT Builder is a versatile Gradio application that provides two key functionalities for training Language Learning Models (LLMs).
The application is designed to run locally, and submitted examples will be stored locally in the applications directory, but can also be served as a web application to anyone.
Firstly, it allows you to manually construct and store ShareGPT Formatted (SFT) conversations involving a system, a human, and GPT role or the Standard Formatted conversations involving a system, a user and an assistant. These conversations are automatically uploaded to huggingface.
For datasets using this format, refer to the Hermes 2.5 Dataset here.
Secondly, the application also includes a DPO Sample Builder. This feature enables the creation of sample comparison conversation responses, for Reinforcement Learning from Human Feedback (RLHF). This data gets automatically uploaded to the hub, and is in the Intel NeuralChat DPO format.
In this tab you can check all of your uploaded datasets, since the datasets are not uploaded in real time and there's an interval between the commits you will have to wait a little bit until the upload finishes as well as huggingface dataset viewer finished processing the newly commited data.
- Clone the repository:
git clone https://github.com/teknium1/sharegpt-builder.git
- Navigate to the project directory:
cd sharegpt-builder
- Install the required Python packages:
pip install -r requirements.txt
- login with your HuggingFace token with write access if you aren't already:
huggingface-cli login
- Run the Gradio application:
python app.py
-
Open your web browser and navigate to
http://127.0.0.1:7860/
. -
You will find 2 tabs, one for SFT and one for DPO, navigate to the one you want to contribute to and click there.
-
To add more turns to the conversation, fill the text field and press ↳ enter
-
After adding all the turns, click
save chat
to upload the conversation. -
The uploaded conversations can be viewed directly on the hub.
Contributions are welcome and greatly appreciated! Every little bit helps, and credit will always be given.
12/17/2024
: Thanks to not-lain for fixing sharegpt template as well as adding the dataset viewer tab12/13/2024
: Thanks to aldryss for updating the UI 🔥12/12/2024
: Thanks to not-lain for the help switching from flask to gradio and supporting automatic dataset upload 🔥
Here are ways to contribute:
- Check for open issues or open a fresh issue to start a discussion around a feature idea or a bug.
- Fork the repository on GitHub and start making your changes to a new branch.
- Write a test which shows that the bug was fixed or that the feature works as expected.
- Send a pull request and bug the maintainer until it gets merged and published.
Alternatively, you can contribute via submission of bugs or feature requests to the issues tab.
The application is set to run in debug mode. For production use, make sure to turn off debug mode in app.py
.
This project is licensed under the terms of the MIT license.