Dataset Creator

A user-friendly tool for creating conversation datasets to fine-tune AI/LLM models, with support for OpenAI's fine-tuning format.

🎯 Overview

Dataset Creator streamlines the process of preparing training datasets for language models. It provides an intuitive interface for creating, managing, and exporting conversation data in JSONL format - the standard format required for fine-tuning OpenAI models and other LLMs.

Why Dataset Creator?

Fine-tuning language models requires datasets in specific formats, which can be time-consuming to create manually. This app eliminates the formatting hassle, allowing you to focus on crafting high-quality training conversations.

✨ Features

🔄 Multi-turn Conversations: Create single or multi-turn dialogues with system messages
⚖️ Weight Control: Assign importance weights to assistant responses for training
✏️ Edit & Manage: Modify or delete existing conversations easily
💾 Persistent Storage: Auto-saves locally so you never lose your work
📤 Export to JSONL: One-click export in the format required for fine-tuning
🌐 Dual Mode: Run as a web app or build a standalone desktop application
🎨 Modern UI: Clean, intuitive interface with Bootstrap styling

🚀 Quick Start

Prerequisites

Python 3.7 or higher
Git (for cloning the repository)

Installation & Running

Clone the repository:

git clone https://github.com/rimom/DatasetCreator.git
cd dataset_maker

Run the application:

python run.py

That's it! The script will automatically:

✅ Set up a virtual environment
✅ Install all required dependencies
✅ Ask how you want to run the app

Choose Your Mode

When you run python run.py, you'll see:

============================================================
       Dataset Creator - AI Training Data Generator
============================================================

How would you like to run Dataset Creator?
1. Web Version (runs in browser)
2. Desktop App (build standalone application)
3. Exit

Option 1: Web Version

Opens in your default browser at http://localhost:5000
Perfect for quick use and development
No build process required
Press Ctrl+C to stop the server

Option 2: Desktop App

Builds a standalone application for your OS
Creates a native app (.app for macOS, .exe for Windows)
Takes a few minutes to build the first time
The app location will be displayed after building

📖 Usage Guide

Creating Conversations

System Message: Define the AI's role or behavior (e.g., "You are a helpful assistant")
User Message: Enter the user's input
Assistant Message: Enter the expected AI response
Weight: Check to include this response in training (unchecked = weight 0)

Multi-turn Conversations

Click "Add Message Pair" to create back-and-forth dialogues
Each pair represents one exchange in the conversation
Use the "Persist" checkbox to keep the system message for new conversations

Managing Data

Edit: Click the edit button next to any conversation to modify it
Delete: Remove individual conversations
Clear All: Reset the entire dataset
Export: Download your dataset as a .jsonl file

Keyboard Shortcuts

Enter in User Message field → Jump to Assistant Message
Enter in Assistant Message field → Save conversation
Shift+Enter in any field → New line

📸 Screenshots

Main interface showing conversation list

Adding a new conversation with message pairs

Exporting dataset to JSONL format

📁 Project Structure

dataset_maker/
├── run.py              # Main launcher script
├── app.py              # Flask application backend
├── requirements.txt    # Python dependencies
├── app.spec           # PyInstaller build configuration
├── templates/          # HTML templates
│   ├── base.html      # Base template
│   ├── index.html     # Main page
│   └── edit.html      # Edit conversation page
├── static/            # Frontend assets
│   ├── css/          # Stylesheets
│   └── js/           # JavaScript files
└── README.md          # This file

🔧 Troubleshooting

Common Issues

Python Version Error

python --version  # Should be 3.7 or higher

Virtual Environment Issues

rm -rf venv
python run.py  # Will recreate the environment

Manual Installation (if automatic setup fails)

python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install -r requirements.txt
python app.py

macOS Security Warning
- Right-click the app and select "Open"
- Click "Open" again in the security dialog

Data Storage

Web Mode: Data saved in the project directory as conversations.jsonl
Desktop App: Data saved in:
- macOS: ~/Library/Application Support/DatasetCreator/
- Windows: %APPDATA%\DatasetCreator\
- Linux: ~/.DatasetCreator/

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

Fork the repository
Create your feature branch (git checkout -b feature/AmazingFeature)
Commit your changes (git commit -m 'Add some AmazingFeature')
Push to the branch (git push origin feature/AmazingFeature)
Open a Pull Request

📚 Resources

📄 License

MIT License with Contribution Clause

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

Any person who forks, modifies, or creates derivative works based on this Software must submit a pull request to the original repository with their modifications, enhancements, or improvements, unless explicitly exempted in writing by the original author(s).
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES, OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT, OR OTHERWISE, ARISING FROM, OUT OF, OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

Disclaimer: This is an independent project and is not officially affiliated with OpenAI.

Created with ❤️ for the AI community

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Dataset Creator

🎯 Overview

Why Dataset Creator?

✨ Features

🚀 Quick Start

Prerequisites

Installation & Running

Choose Your Mode

Option 1: Web Version

Option 2: Desktop App

📖 Usage Guide

Creating Conversations

Multi-turn Conversations

Managing Data

Keyboard Shortcuts

📸 Screenshots

📁 Project Structure

🔧 Troubleshooting

Common Issues

Data Storage

🤝 Contributing

Development Setup

📚 Resources

📄 License

About

Uh oh!

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
static		static
templates		templates
.gitignore		.gitignore
README.md		README.md
app.py		app.py
app.spec		app.spec
image1.jpg		image1.jpg
image2.jpg		image2.jpg
image3.jpg		image3.jpg
requirements.txt		requirements.txt
run.py		run.py

rimomcosta/Dataset-Creator

Folders and files

Latest commit

History

Repository files navigation

Dataset Creator

🎯 Overview

Why Dataset Creator?

✨ Features

🚀 Quick Start

Prerequisites

Installation & Running

Choose Your Mode

Option 1: Web Version

Option 2: Desktop App

📖 Usage Guide

Creating Conversations

Multi-turn Conversations

Managing Data

Keyboard Shortcuts

📸 Screenshots

📁 Project Structure

🔧 Troubleshooting

Common Issues

Data Storage

🤝 Contributing

Development Setup

📚 Resources

📄 License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages