Skip to content

LLM fine-tuning pipeline for Apple Silicon with MLX, web scraping, and LoRA training

Notifications You must be signed in to change notification settings

jkuri/mlx-llm-finetuning-example

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

LLM Fine-tuning Pipeline on Apple Silicon GPUs with MLX

This is an LLM fine-tuning pipeline for Apple Silicon GPUs using MLX. The project enables:

Data Collection & Preparation:

Web scraping with scripts/web_scraper.py to extract content from websites (like Wikipedia) Data preprocessing with scripts/prepare_jsonl_data.py to convert scraped CSV data into JSONL training format Model Training:

Fine-tuning the mlx-community/Ministral-8B-Instruct-2410-4bit model using LoRA (Low-Rank Adaptation) Training script scripts/train.sh with configurable parameters (batch size, iterations, learning rate) Testing capabilities via scripts/test.sh Key Features:

Optimized for Apple Silicon using MLX framework LoRA fine-tuning for efficient training with limited resources Multiple data formats supported (Q&A, instruction-following, chat) Automated pipeline from web scraping to model inference

Workflow:

  1. Scrape web content → CSV
  2. Convert CSV → JSONL training data
  3. Fine-tune model with LoRA
  4. Generate responses with the adapted model

The project is designed for creating domain-specific AI assistants by training on custom web content.

Usage

python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

You need huggingface account and token to download the model.

hf auth login
hf download mlx-community/Ministral-8B-Instruct-2410-4bit
python ./scripts/web_scraper.py https://en.wikipedia.org/wiki/Yugoslavia -p 20 -o dataset/data.csv
python ./scripts/prepare_jsonl_data.py dataset/data.csv
./scripts/train.sh
./scripts/test.sh

Example inference:

./scripts/run.sh "Explain the history of the Balkans"

or

./scripts/run.sh "Who was the president of Yugoslavia?"

Sample

mlx-demo.mp4

About

LLM fine-tuning pipeline for Apple Silicon with MLX, web scraping, and LoRA training

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published