SLDAgent is an evolution-based AI agent that autonomously discovers scaling laws for large language models. The agent co-optimizes both the symbolic formula of a scaling law and the parameter-fitting algorithm, enabling it to explore complex relationships and achieve superhuman accuracy when predicting model behavior at scale.
This work also introduces SLDBench, the first comprehensive benchmark for this scientific discovery task, curated from over 5,000 LLM training experiments in the existing literature. On SLDBench, SLDAgent uncovers laws that are more accurate and conceptually sounder than their human-derived counterparts.
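To make the task concrete, the sketch below shows the kind of candidate program the agent evolves: a symbolic formula plus a parameter-fitting routine. The Chinchilla-style functional form, the data, and every name here are illustrative assumptions, not the actual `init_program.py` template.

```python
import numpy as np
from scipy.optimize import minimize

def scaling_law(params, N, D):
    """Illustrative Chinchilla-style form: L(N, D) = E + A/N^alpha + B/D^beta.

    SLDAgent co-evolves both this symbolic formula and the fitting
    routine below; this particular form is only a hypothetical example.
    """
    E, A, alpha, B, beta = params
    return E + A / N**alpha + B / D**beta

def fit_scaling_law(N, D, loss):
    """Fit the five parameters by minimizing mean squared error."""
    def objective(p):
        return np.mean((scaling_law(p, N, D) - loss) ** 2)
    x0 = np.array([1.0, 400.0, 0.3, 400.0, 0.3])  # rough starting point
    return minimize(objective, x0, method="Nelder-Mead",
                    options={"maxiter": 50_000, "fatol": 1e-12}).x

# Synthetic demo data: parameter count N, token count D, observed loss.
N = np.array([1e8, 4e8, 1e9, 4e9, 1e10])
D = np.array([2e9, 8e9, 2e10, 8e10, 2e11])
loss = np.array([3.40, 3.05, 2.80, 2.52, 2.33])
params = fit_scaling_law(N, D, loss)
print(scaling_law(params, 7e10, 1.4e12))  # extrapolate to a larger run
```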
| Task key | Config file |
|---|---|
| `parallel_scaling_law` | `configs/parallel_scaling_law.yaml` |
| `vocab_scaling_law` | `configs/vocab_scaling_law.yaml` |
| `sft_scaling_law` | `configs/sft_scaling_law.yaml` |
| `domain_mixture_scaling_law` | `configs/domain_mixture_scaling_law.yaml` |
| `moe_scaling_law` | `configs/moe_scaling_law.yaml` |
| `data_constrained_scaling_law` | `configs/data_constrained_scaling_law.yaml` |
| `lr_bsz_scaling_law` | `configs/lr_bsz_scaling_law.yaml` |
Data is centrally hosted on Hugging Face Hub at `pkuHaowei/sldbench`.
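If you want to inspect the raw benchmark data directly (rather than going through `data_loader.py`), a minimal sketch with the standard `datasets` library looks like this; the subset and split names are assumptions, so check the Hub page for the actual layout:

```python
from datasets import load_dataset

# Hypothetical direct load of one task's data; in the repo,
# data_loader.py resolves the task name and schema for you.
ds = load_dataset("pkuHaowei/sldbench", name="data_constrained_scaling_law")
print(ds)  # available splits and row counts
# Column names should match the feature/target schema in TASK_SCHEMA_MAP.
```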
- Python 3.13+
- `uv` package manager (recommended)
- An OpenAI-compatible API key (set `OPENAI_API_KEY`)
- macOS/Linux/Windows

Note: `uv run` guarantees commands execute inside a synchronized project environment. If you prefer plain `pip`, you can adapt the commands accordingly.
# 1) Clone the repo
git clone <repository-url>
cd SLD
# 2) Install dependencies
uv sync
# 3) Provide your LLM API key
export OPENAI_API_KEY=your_key
# Optional: if using a non-default endpoint
# export OPENAI_BASE_URL=https://your.openai.compatible.endpoint/v1
On Windows (PowerShell):
$env:OPENAI_API_KEY="your_key"
# $env:OPENAI_BASE_URL="https://your.openai.compatible.endpoint/v1"
# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
# Install dependencies
pip install -U pip
pip install -r requirements.txt # Or use pyproject.toml
# Set your API key
export OPENAI_API_KEY=your_key
# export OPENAI_BASE_URL=https://your.openai.compatible.endpoint/v1
Run a single discovery task (e.g., Data-Constrained):
EVAL_TASK_NAME="data_constrained_scaling_law" \
uv run openevolve-run.py \
--config configs/data_constrained_scaling_law.yaml \
init_program.py evaluator.py \
--output results/data_constrained_scaling_law/run_1
Or run all tasks in batch:
# If the script is executable:
uv run scripts/run.sh
# Otherwise:
bash scripts/run.sh
SLD/
├── configs/                 # YAML configs (one per scaling law)
│   ├── data_constrained_scaling_law.yaml
│   ├── domain_mix_scaling_law.yaml
│   ├── lr_and_bsz_scaling_law.yaml
│   ├── moe_scaling_law.yaml
│   ├── parallel_scaling_law.yaml
│   ├── sft_scaling_law.yaml
│   └── vocab_size_scaling_law.yaml
├── data_loader.py           # Unified data loading interface (from Hugging Face)
├── evaluator.py             # Unified evaluation system
├── init_program.py          # Initial scaling-law template
├── results/                 # Outputs & checkpoints (created automatically)
└── scripts/
    └── run.sh               # Batch execution helper
export EVAL_TASK_NAME="data_constrained_scaling_law"
uv run python openevolve-run.py \
--config configs/data_constrained_scaling_law.yaml \
init_program.py evaluator.py \
--output results/data_constrained_scaling_law/run_1
bash scripts/run.sh
This will:

- Run each task 3 times with different random seeds.
- Write outputs to `results/{task_name}/run_{1,2,3}/`.
- Save intermediate checkpoints.
- Evaluate and save the best program from each run.
EVAL_TASK_NAME="data_constrained_scaling_law" \
uv run python evaluator.py \
results/data_constrained_scaling_law/run_1/best/best_program.py
Create `configs/your_law_name.yaml` and customize the settings (see the full template in the original README). Key sections include `llm`, `prompt`, `database`, and `evaluator`.
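A minimal skeleton of such a config might look like the following. The four section names come from this README; the individual keys and their placement are assumptions based on OpenEvolve-style configs, so consult the full template for the authoritative layout:

```yaml
# configs/your_law_name.yaml — illustrative skeleton, not the full template
max_iterations: 100        # overall evolutionary search budget
random_seed: 42

llm:
  # api_base: https://your.openai.compatible.endpoint/v1
  timeout: 120
  retries: 3

prompt:
  system_message: "Evolve a scaling-law program for your_law_name."  # hypothetical key

database:
  population_size: 50

evaluator:
  parallel_evaluations: 4
```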
Upload your data to Hugging Face Hub. The data should be structured with appropriate feature and target columns following the existing schema patterns.
Add your task schema to the `TASK_SCHEMA_MAP` dictionary in `data_loader.py`:
TASK_SCHEMA_MAP = {
# ... existing tasks ...
"your_law_name": {
"feature_names": ["feature1", "feature2"],
"target_name": "target_variable",
},
}
Add your task to the `SUPPORTED_TASKS` set in `evaluator.py`:
SUPPORTED_TASKS = {
# ... existing tasks ...
"your_law_name",
}
Add "your_law_name"
to the tasks
array in scripts/run.sh
to include it in batch runs.
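For example, assuming `run.sh` keeps its task list in a shell array (the surrounding script is not shown here, so treat this excerpt as hypothetical):

```bash
# scripts/run.sh (hypothetical excerpt)
tasks=(
  "data_constrained_scaling_law"
  # ... other existing tasks ...
  "your_law_name"   # <-- your new task
)
```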
Key knobs to tune in your `.yaml` files (a concrete sketch follows this list):

- Search Budget: Increase `max_iterations` and `population_size` for more thorough exploration.
- Exploration vs. Exploitation: Adjust `exploration_ratio` and `exploitation_ratio`.
- Parallelism: Raise `parallel_evaluations` to speed things up.
- Reproducibility: Set a fixed `random_seed` for consistent results.
- API Resilience: Bump `llm.timeout` and `llm.retries` for flaky networks.
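Concretely, a tuned config might override these knobs like so; the values are illustrative, and the section placement of each key follows OpenEvolve's convention rather than anything verified against the shipped configs:

```yaml
max_iterations: 300        # search budget
random_seed: 7             # reproducibility

database:
  population_size: 100     # search budget
  exploration_ratio: 0.3   # exploration vs. exploitation
  exploitation_ratio: 0.6

evaluator:
  parallel_evaluations: 8  # parallelism

llm:
  timeout: 180             # API resilience on flaky networks
  retries: 5
```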
- Data is centrally hosted on Hugging Face Hub at `pkuHaowei/sldbench`.
- The unified `data_loader.py` automatically loads data based on the task name and predefined schema.
- Import Errors: Run `uv sync` to ensure your environment is up to date.
- Task Not Found: Check that `EVAL_TASK_NAME` matches a task key in `SUPPORTED_TASKS` in `evaluator.py`.
- Data Loading Issues: Verify your internet connection and access to the Hugging Face Hub repository `pkuHaowei/sldbench`.
- API Timeouts: Increase `llm.timeout` and `llm.retries` in your config, or check your `OPENAI_BASE_URL`.
- Script Not Executable: Run `chmod +x scripts/run.sh` or execute it with `bash scripts/run.sh`.
Do I have to use OpenAI?
No. Any OpenAI-compatible endpoint works. Just set the `api_base` in your YAML config or the `OPENAI_BASE_URL` environment variable.
Can I use `pip` instead of `uv`?
Yes. Create a virtual environment, activate it, and install dependencies from `requirements.txt`. Then run the Python commands directly.
Where are the results stored?
Under `results/{task_name}/{run_id}/`. You'll find checkpoints, logs, and the final `best/best_program.py`.
If you use SLDAgent or SLDBench in your academic work, please cite the paper:
@article{lin2025sld,
  title  = {Can Language Models Discover Scaling Laws?},
  author = {Lin, Haowei and others},
  year   = {2025}
}
This project is built on the excellent OpenEvolve.