This is an LLM fine-tuning pipeline for Apple Silicon GPUs using MLX. The project enables:
Data Collection & Preparation:
- Web scraping with scripts/web_scraper.py to extract content from websites (such as Wikipedia)
- Data preprocessing with scripts/prepare_jsonl_data.py to convert scraped CSV data into JSONL training format (example records below)
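mlx-lm's LoRA trainer accepts JSONL records in three shapes: plain text, prompt/completion, and chat messages. The records below are illustrative only; which shape prepare_jsonl_data.py actually emits is an assumption here:

```jsonl
{"text": "Q: Who was the president of Yugoslavia?\nA: Josip Broz Tito, from 1953 until his death in 1980."}
{"prompt": "Who was the president of Yugoslavia?", "completion": "Josip Broz Tito, from 1953 until his death in 1980."}
{"messages": [{"role": "user", "content": "Who was the president of Yugoslavia?"}, {"role": "assistant", "content": "Josip Broz Tito, from 1953 until his death in 1980."}]}
```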
Model Training:
- Fine-tuning the mlx-community/Ministral-8B-Instruct-2410-4bit model using LoRA (Low-Rank Adaptation)
- Training script scripts/train.sh with configurable parameters (batch size, iterations, learning rate)
- Testing via scripts/test.sh (a sketch of both scripts follows below)
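For orientation, here is a minimal sketch of what scripts/train.sh and scripts/test.sh likely wrap, based on mlx-lm's mlx_lm.lora entry point. The flag values and the ./adapters output path are assumptions, not the scripts' actual contents:

```bash
# Hypothetical sketch of scripts/train.sh; batch size, iterations, and
# learning rate are placeholders. --data expects a directory containing
# train.jsonl and valid.jsonl; --adapter-path is where LoRA weights are saved.
python -m mlx_lm.lora \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --train \
  --data ./dataset \
  --batch-size 4 \
  --iters 1000 \
  --learning-rate 1e-5 \
  --adapter-path ./adapters

# Hypothetical sketch of scripts/test.sh: report loss on dataset/test.jsonl
# using the trained adapter.
python -m mlx_lm.lora \
  --model mlx-community/Ministral-8B-Instruct-2410-4bit \
  --data ./dataset \
  --adapter-path ./adapters \
  --test
```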
Key Features:
- Optimized for Apple Silicon via the MLX framework
- LoRA fine-tuning for efficient training with limited resources
- Multiple supported data formats (Q&A, instruction-following, chat)
- Automated pipeline from web scraping to model inference
Workflow:
- Scrape web content → CSV
- Convert CSV → JSONL training data
- Fine-tune model with LoRA
- Generate responses with the adapted model
The project is designed for creating domain-specific AI assistants by training on custom web content.
Setup:
```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
You need a Hugging Face account and an access token to download the model:
```bash
hf auth login
hf download mlx-community/Ministral-8B-Instruct-2410-4bit
```
Run the pipeline:
```bash
python ./scripts/web_scraper.py https://en.wikipedia.org/wiki/Yugoslavia -p 20 -o dataset/data.csv
python ./scripts/prepare_jsonl_data.py dataset/data.csv
./scripts/train.sh
./scripts/test.sh
```
Example inference:
./scripts/run.sh "Explain the history of the Balkans"or
./scripts/run.sh "Who was the president of Yugoslavia?"