Zichen Tian, Antoine Ledent, Qianru Sun
Singapore Management University
Official implementation of mtLoRA (multi-task LoRA) from the paper "Scalable Multi-Task Low-Rank Model Adaptation" (ICLR 2026). Scaling multi-task LoRA to many tasks (15–25+) causes catastrophic performance collapse (e.g., 88.2% → 2.0% accuracy). We identify two root causes (uniform regularization disrupts shared knowledge; component-level adaptation amplifies gradient conflicts) and propose three novel designs:
- Spectral-Aware Regularization – selectively orthogonalizes low-SV noise while preserving high-SV shared knowledge
- Fine-Grained Routing – dimension-specific routing weights instead of scalar weights per LoRA expert
- Block-Level Adaptation – applies LoRA as a parallel path at the block level, bypassing conflict-amplifying non-linearities
(A) Block-Level Adaptation bypasses internal non-linearities to mitigate gradient conflict.
(B) Fine-Grained Routing assigns dimension-specific weights for superior expressive power.
One-click experiment reproduction powered by Claude Code. Open this project in Cursor or install the Claude Code CLI – the agent reads CLAUDE.md and handles environment setup, data download, and experiment execution automatically.
💬 "Help me reproduce Table 2 on my 2× L40 setup"
💬 "Set up the environment for my RTX 4090"
💬 "Run the BBH evaluation with spectral regularization λ=0.5"
+2.3% over SOTA across four large-scale benchmarks (15–27 tasks each) while using 47% fewer parameters and 24% less training time.
NLP results on LLaMA-2-7B (reproduced by this codebase):
| Method | Dolly-15k → MMLU | Flan-v2 → BBH | Params |
|---|---|---|---|
| LoRAHub | 42.0 | 34.9 | 75.5M (1.11%) |
| MMoELoRA | 42.1 | 35.4 | 75.5M (1.11%) |
| HydraLoRA | 42.4 | 36.9 | 75.5M (1.11%) |
| mtLoRA (Ours) | 44.5 | 38.5 | 39.8M (0.59%) |
Each design contributes meaningfully: block-level adaptation alone provides +2.1% with 50% fewer parameters:
| Block-Level | Spectral Reg. | Fine-Grained Routing | Params | Dolly-15k | BBH |
|---|---|---|---|---|---|
| | | | 75.5M (1.11%) | 41.6 | 35.5 |
| ✓ | | | 37.7M (0.56%) | 43.7 | 37.9 |
| ✓ | ✓ | | 37.7M (0.56%) | 43.6 | 38.4 |
| ✓ | | ✓ | 39.8M (0.59%) | 44.1 | 38.2 |
| ✓ | ✓ | ✓ | 39.8M (0.59%) | 44.5 | 38.5 |
- Python 3.10+ | PyTorch 2.1+ | CUDA 11.8+
- 1–2 GPUs with ≥16 GB VRAM (for LLaMA-2-7B with DDP)
# Create environment
conda env create -f environment.yml
conda activate mtlora
# Install our custom PEFT library
pip install -e ./peft
Blackwell GPUs (CUDA 12.4+)
conda env create -f environment_cu124.yml
conda activate mtlora
pip install torch==2.5.1 --index-url https://download.pytorch.org/whl/cu124
pip install -e ./peft
Base Model – Symlink LLaMA-2-7B (required for all experiments):
ln -s /path/to/llama-2-7b ./data/llama-2-7b
ln -s /path/to/llama-2-13b ./data/llama-2-13b # Only needed for Table S7
Training Data – Download from Hugging Face:
| Setup | Training Data | Evaluation | HF Source |
|---|---|---|---|
| BBH | Flan-v2 subset (30k examples) | BBH 3-shot (27 tasks) | Muennighoff/flan |
| MMLU | Dolly-15K (instruction tuning) | MMLU 5-shot (57 subjects) | databricks/databricks-dolly-15k |
Evaluation datasets (data/bbh/ and data/mmlu_dataset/) are already included.
| Script | Paper Reference | Description |
|---|---|---|
| `bash tables/0_main_ablation.sh` | Table 2 | Contribution of each key design |
| `bash tables/1_routing_granularity.sh` | Table 3 | Routing granularity ablation |
| `bash tables/2_block_level.sh` | Table 4 | Block-level adaptation ablation |
| `bash tables/3_llama13b.sh` | Table S7 | LLaMA-2-13B scalability |
Each script runs both BBH and MMLU experiments end-to-end (training + evaluation).
BBH Setup – Train on Flan-v2, evaluate on BBH (3-shot)
# Train
python train.py \
--method mtlora \
--model_name_or_path ./data/llama-2-7b \
--dataset_dir ./data/flan_v2_subset \
--output_dir ./output/custom_bbh \
--lora_rank 16 --lora_nums 16 --enable_blc \
--enable_block_adapter --block_adapter_type ffn \
--enable_spectral_reg --spectral_reg_lambda 1.0 \
--enable_fine_grained_routing --routing_group_size 2048 \
--bf16 --num_train_epochs 1
# Evaluate
python eval_bbh.py \
--model_name_or_path ./data/llama-2-7b \
--lora_checkpoint ./output/custom_bbh/sft_lora_model \
--output_dir ./output/custom_bbh/bbh_eval \
--num_few_shot 3
MMLU Setup – Train on Dolly-15K, evaluate on MMLU (5-shot)
# Train
python train.py \
--method mtlora \
--model_name_or_path ./data/llama-2-7b \
--dataset_dir ./data/dolly-15k-converted \
--output_dir ./output/custom_mmlu \
--lora_rank 16 --lora_nums 16 --enable_blc \
--enable_block_adapter --block_adapter_type ffn \
--enable_spectral_reg --spectral_reg_lambda 0.5 \
--enable_fine_grained_routing --routing_group_size 2048 \
--bf16 --num_train_epochs 1
# Evaluate
python eval_mmlu.py \
--model_name_or_path ./data/llama-2-7b \
--lora_checkpoint ./output/custom_mmlu/sft_lora_model \
--output_dir ./output/custom_mmlu/mmlu_5shot \
--num_few_shot 5 \
--mmlu_data_dir ./data/mmlu_dataset
Scripts for reproducing paper figures are in tables/analysis/:
| Script | Paper Figure | Content |
|---|---|---|
| `fig1a_routing_entropy.ipynb` | Figure 1(A) | Regularization–routing trade-off |
| `fig1b_spectral_conflict.ipynb` | Figure 1(B) | Spectral conflict analysis |
| `figS2_sv_spectrum.py` | Figure S2 | SV spectrum visualization |
| `figS3_gradient_perlayer.py` | Figure S3 | Per-layer gradient correlation |
| `figS4_routing_pattern.py` | Figure S4 | Routing weight patterns |
Multi-task LoRA suffers from a fundamental regularization–routing trade-off: strengthening regularization to reduce inter-task conflict inadvertently suppresses routing effectiveness. We trace this to two root causes and propose targeted solutions:
(A) Regularization-routing trade-off. (B) Shared knowledge concentrates in high-SV components. (C) Block-level adaptation reduces gradient conflict by 76%.
| Design | Root Cause Addressed | Key Idea |
|---|---|---|
| 🎯 Spectral-Aware Reg. | Uniform regularization disrupts shared knowledge | Weight by w(σ) = exp(−σ/σ̄): orthogonalize low-SV noise, preserve high-SV signal |
| 🔀 Fine-Grained Routing | Scalar routing ignores dimension heterogeneity | Router MLP outputs per-dimension weights Δᵢ ∈ ℝᵈ instead of scalars ωᵢ ∈ ℝ |
| 🧱 Block-Level Adaptation | Component-level LoRA amplifies gradient conflicts | Parallel adapter path bypasses Softmax: x′ = x + F(LN(x)) + Δ(LN(x)) |
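The spectral weighting w(σ) = exp(−σ/σ̄) can be illustrated with a short sketch. The function name, loss form, and shapes below are illustrative assumptions, not the exact penalty implemented in this codebase:

```python
import torch

def spectral_reg_loss(delta_w: torch.Tensor) -> torch.Tensor:
    """Hypothetical sketch of spectral-aware regularization.

    Scales a spectral penalty by w(sigma) = exp(-sigma / sigma_bar), so
    low-SV directions (task-specific noise) are penalized strongly while
    high-SV directions (shared knowledge) are exponentially damped.
    """
    # Singular values of the accumulated LoRA update Delta W = B @ A
    sigma = torch.linalg.svdvals(delta_w)
    w = torch.exp(-sigma / sigma.mean())   # ~1 for small sigma, -> 0 for large sigma
    # Weighted spectral energy: high-SV (shared) components contribute little
    return (w * sigma.pow(2)).sum()

delta_w = torch.randn(64, 64) * 0.01       # stand-in for a LoRA update matrix
loss = spectral_reg_loss(delta_w)
```

The key property is that the penalty vanishes on the dominant singular directions, so shared knowledge is preserved while near-noise directions are pushed toward zero.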
Overall architecture of mtLoRA. The mtLoRA module (right) is attached as a parallel path after each LayerNorm. A router MLP generates dimension-specific weights to dynamically compose task experts.
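A minimal sketch of the parallel-path design described above, combining block-level adaptation with fine-grained (group-wise) routing. Module names, shapes, and the softmax gating are assumptions for illustration; the official implementation lives in the bundled `peft` fork:

```python
import torch
import torch.nn as nn

class BlockAdapter(nn.Module):
    """Illustrative mtLoRA-style block adapter (not the official code).

    Each expert is a low-rank pair; a router MLP emits one weight per
    (expert, dimension group) rather than one scalar per expert,
    mirroring fine-grained routing with a --routing_group_size knob.
    """
    def __init__(self, d_model=128, rank=4, n_experts=4, group_size=32):
        super().__init__()
        assert d_model % group_size == 0
        self.g = d_model // group_size           # number of routing groups
        self.group_size = group_size
        self.n_experts = n_experts
        self.down = nn.ModuleList(nn.Linear(d_model, rank, bias=False) for _ in range(n_experts))
        self.up = nn.ModuleList(nn.Linear(rank, d_model, bias=False) for _ in range(n_experts))
        self.router = nn.Linear(d_model, n_experts * self.g)  # per-group gate logits

    def forward(self, x):                        # x: (batch, d_model), post-LayerNorm
        b = x.shape[0]
        gates = self.router(x).view(b, self.n_experts, self.g).softmax(dim=1)
        gates = gates.repeat_interleave(self.group_size, dim=2)  # expand to per-dimension
        out = torch.zeros_like(x)
        for i in range(self.n_experts):
            # Dimension-specific mixture of low-rank expert outputs
            out = out + gates[:, i, :] * self.up[i](self.down[i](x))
        return out

# Parallel path around an FFN block: x' = x + F(LN(x)) + Delta(LN(x))
d = 128
ln = nn.LayerNorm(d)
ffn = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
adapter = BlockAdapter(d)
x = torch.randn(2, d)
h = ln(x)
x_out = x + ffn(h) + adapter(h)
```

Because the adapter runs in parallel to the whole block rather than inside attention/FFN sub-components, its gradients never pass through the block's internal non-linearities, which is the mechanism the paper credits for reduced gradient conflict.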
Method Selection
--method lora # Standard single LoRA
--method hydralora # HydraLoRA baseline (multi-expert, no mtLoRA extensions)
--method mtlora # Full mtLoRA (block adapter + spectral reg + FGR)
mtLoRA Components
# Block-Level Adaptation
--enable_block_adapter # Enable block-level instead of component-level
--block_adapter_type ffn # Options: attention, ffn, both
--block_adapter_style lowrank
# Spectral-Aware Regularization
--enable_spectral_reg # Enable spectral regularization
--spectral_reg_lambda 1.0 # Regularization strength
--spectral_reg_frequency 1 # SVD frequency (per epoch)
# Fine-Grained Routing
--enable_fine_grained_routing
--routing_group_size 2048 # Smaller = finer granularity (g = d/group_size)
Common Hyperparameters
--lora_rank 16 # LoRA rank
--lora_alpha 64 # LoRA alpha scaling
--learning_rate 0.0002
--per_device_train_batch_size 16
--num_train_epochs 1
--max_seq_length 512
Hardware Requirements
| Experiment | GPU Memory | Recommended |
|---|---|---|
| LLaMA-7B (single GPU) | ~24 GB | RTX PRO 6000 |
| LLaMA-7B (DDP, 2 GPU) | ~16 GB each | 2Γ L40 |
| LLaMA-13B | ~48 GB | A100-80GB |
For memory-constrained setups, reduce --per_device_train_batch_size and increase --gradient_accumulation_steps.
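For example (flag values illustrative), halving the per-device batch while doubling accumulation keeps the effective batch at 16 on a single GPU, at roughly half the activation memory:

```shell
# Effective batch = per_device_train_batch_size x gradient_accumulation_steps x num_GPUs
#                 = 8 x 2 x 1 = 16
python train.py \
  --per_device_train_batch_size 8 \
  --gradient_accumulation_steps 2 \
  ...  # remaining flags as in the training commands above
```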
If you find this work useful, please consider citing our paper:
@inproceedings{tian2026mtlora,
title = {Scalable Multi-Task Low-Rank Model Adaptation},
author = {Tian, Zichen and Ledent, Antoine and Sun, Qianru},
booktitle = {International Conference on Learning Representations (ICLR)},
year = {2026}
}
We gratefully acknowledge the support from the DSO research grant awarded by DSO National Laboratories, Singapore. This project is also partially supported by the Ministry of Education, Singapore, under its Tier-1 Academic Research Fund (No. 24-SIS-SMU-040). We thank the authors of HydraLoRA, MMoELoRA, and LoRAHub for their open-source implementations.
This project is licensed under the Apache License 2.0.
Keywords: mtLoRA, multi-task LoRA, scalable multi-task LoRA, multi-task low-rank adaptation, parameter-efficient fine-tuning (PEFT), LoRA, low-rank adaptation, mixture of LoRA experts, LLaMA, LLM fine-tuning, spectral regularization, block-level adaptation, fine-grained routing, ICLR 2026
