Compare LLM architectures without downloading weights.
modelsig extracts a multi-layer structural fingerprint from any HuggingFace model and tells you whether two models are architecturally equivalent — so the smaller one can act as a valid proxy for testing the larger one.
Testing inference engines (vLLM, TensorRT-LLM, SGLang, llama.cpp, ONNX Runtime, etc.) against every large model is prohibitively expensive. modelsig answers:
"Can I test Qwen3-72B correctness using Qwen3-7B instead?" "Is Nemotron-120B-FP4 architecturally equivalent to the BF16 variant?" "Does this ONNX export match the original safetensors model?"
It compares structural fingerprints — shape ratios, operator sets, KV cache patterns, layer topology — without ever downloading a single weight tensor.
- Zero weight download — safetensors header via HTTP Range (~20 bytes), ONNX graph-only (no `.onnx_data`), or config-only fast mode
- 5-layer fingerprint — static weights, arch config, op types, KV cache pattern, layer-level I/O signatures
- 3-phase isomorphism comparison — key overlap, substructure, algebraic scaling
- Substitution verdicts — `FULL_SUBSTITUTE` / `PARTIAL_SUBSTITUTE` / `NO_SUBSTITUTE`
- 4-level multi-fidelity test plan — maps models to test coverage levels L1–L4
- Wide model support — dense decoder, GQA, MoE, vision-language, speech, ONNX classification
- Both HF and local models — supports `local:/path/to/model`
- JSON / table / markdown output — CI-friendly JSON, human-readable table, shareable markdown
```shell
uv add modelsig              # add to a uv project
# or
uv tool install modelsig     # install as a standalone CLI tool
```

From source:

```shell
git clone https://github.com/joe0731/modelsig
cd modelsig
uv sync                      # install all deps + editable package
uv run modelsig --help
```

Or with pip:

```shell
pip install modelsig
```

Dependencies (all installed by default):
| Package | Purpose |
|---|---|
| `requests` | HTTP Range fetching for safetensors headers |
| `huggingface_hub` | Model file listing, downloads, auth |
| `onnx` | ONNX graph parsing (falls back to built-in protobuf if unavailable) |
| `transformers` | AutoConfig normalization, layer signature capture |
| `torch` | Meta-device forward pass for layer I/O shape collection |
| `safetensors` | Local safetensors file parsing |
```shell
# Analyze a single model (-m / --model flag)
modelsig -m Qwen/Qwen3-7B --output table

# Compare two models (proxy-test decision)
modelsig -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B --compare --output table

# Fast mode for large models (config only, no download)
modelsig -m nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 --fast --output table

# ONNX model
modelsig -m onnx-community/Qwen3.5-0.8B-ONNX --output json

# Skip layer-level I/O signature capture (faster, no torch needed)
modelsig -m Qwen/Qwen3-7B --no-layer-sig --output json

# Private/gated model
modelsig -m org/private-model --token hf_xxx
# or: export HF_TOKEN=hf_xxx
```

For safetensors models, only the file header is fetched via HTTP Range requests (~20 bytes per shard). No weights are transferred.
For ONNX models, only the .onnx graph file is downloaded (typically 1–5 MB). The paired .onnx_data weight file (which can be GBs) is never touched.
For fast mode (--fast), only config.json is fetched (a few KB). No tensors at all.
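The safetensors side of this relies on the file format itself: the first 8 bytes of a `.safetensors` file are a little-endian u64 giving the length of a JSON header that maps every tensor name to its dtype, shape, and byte offsets. A minimal sketch of header-only parsing (a format illustration, not modelsig's actual code):

```python
import json
import struct

def parse_safetensors_header(blob: bytes) -> dict:
    """Read only the JSON header of a safetensors file; never touch weight bytes."""
    # First 8 bytes: little-endian u64 length of the JSON header.
    (header_len,) = struct.unpack("<Q", blob[:8])
    return json.loads(blob[8:8 + header_len])

# Build a minimal in-memory safetensors-style file: header + (unread) weight bytes.
header = {"model.layers.0.mlp.gate_proj.weight":
          {"dtype": "BF16", "shape": [18944, 3584], "data_offsets": [0, 2]}}
payload = json.dumps(header).encode()
blob = struct.pack("<Q", len(payload)) + payload + b"\x00\x00"

parsed = parse_safetensors_header(blob)
print(parsed["model.layers.0.mlp.gate_proj.weight"]["shape"])  # [18944, 3584]
```

Over HTTP, the same idea becomes two Range requests: one for the 8-byte length prefix, one for the header itself.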
| Layer | What it captures | Source |
|---|---|---|
| L1 Static weight signature | Per-tensor `{abstract_key → shape, dtype, layer_type}` — layer indices normalized to `.N.` | safetensors header / ONNX initializers |
| L2 Architecture fingerprint | `hidden_size`, `num_hidden_layers`, `num_attention_heads`, `num_key_value_heads`, `intermediate_size`, `head_dim`, MoE config | `config.json` via AutoConfig |
| L3 Op type set | Canonical operator vocabulary: `aten/mm`, `attention`, `rms_norm`, `rope`, `silu`, `topk/router`, … | tensor key patterns / ONNX opset |
| L4 KV cache shape pattern | `[batch, num_kv_heads, seq_len, head_dim]` | derived from L2 |
| L5 Layer I/O signatures | Per-module `{input: [{dtype, shape}], output: [{dtype, shape}]}` on meta device | torch forward hooks (`--no-layer-sig` to skip) |
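The `.N.` normalization in L1 is what lets models with different layer counts map onto the same abstract key set. A minimal sketch of the idea (the repo's `signature/static.py` exposes a `norm_key`, but the exact rules shown here are an assumption):

```python
import re

def norm_key(key: str) -> str:
    """Collapse per-layer indices (e.g. .27.) to a symbolic .N. marker."""
    return re.sub(r"\.\d+\.", ".N.", key)

keys = {"model.layers.0.self_attn.q_proj.weight",
        "model.layers.27.self_attn.q_proj.weight"}
abstract = {norm_key(k) for k in keys}
print(abstract)  # {'model.layers.N.self_attn.q_proj.weight'}
```

After normalization, a 28-layer and an 80-layer variant of the same architecture yield identical abstract keys, so their key sets can be compared directly.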
- Phase 1 — Key coverage: normalized key set overlap ≥ 80%
- Phase 2 — Substructure: attention / FFN / norm submodules match
- Phase 3 — Algebraic scale: hidden_size / intermediate_size / head_dim ratios uniform within 20%

Result: `ISOMORPHIC` / `SCALE_ONLY` / `DIFFERENT_ARCH`
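Phases 1 and 3 can be pictured with a toy sketch. The thresholds (80% overlap, 20% ratio spread) come from the list above; the function names and exact rules are illustrative assumptions, not modelsig's internals:

```python
def key_coverage(keys_a: set, keys_b: set) -> float:
    """Phase 1 (sketch): Jaccard overlap of normalized abstract key sets."""
    return len(keys_a & keys_b) / max(len(keys_a | keys_b), 1)

def uniform_within(ratios: dict, tol: float = 0.20) -> bool:
    """Phase 3 (sketch): per-dimension scale factors must agree within tol."""
    vals = list(ratios.values())
    return (max(vals) - min(vals)) / max(vals) <= tol

dense = {"embed", "attn.q", "attn.k", "attn.v", "ffn.gate"}
moe = dense | {"router.topk"}
print(round(key_coverage(dense, moe), 3))  # 0.833 -> clears the 80% bar

# Dims expected to scale together between a small and a large family member
# (illustrative numbers, not taken from any real config).
scale = {"hidden_size": 2048 / 1024, "intermediate_size": 8192 / 4096}
print(uniform_within(scale))  # True -> uniform scaling
```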
| Verdict | Meaning |
|---|---|
| `FULL_SUBSTITUTE` | All 3 phases pass + shape ratios uniform + layer_type_coverage ≥ 95% |
| `PARTIAL_SUBSTITUTE` | Phase 1+2 pass or op coverage ≥ 80% |
| `NO_SUBSTITUTE` | Different arch, MoE vs Dense mismatch, or key divergence |
When comparing two models, modelsig also computes a structural quantization transferability score:
- `struct_sim_score` — 1.0 (ISOMORPHIC) / 0.80 (SCALE_ONLY) / 0.20 (DIFFERENT_ARCH)
- `op_hist_sim` — cosine similarity of operator frequency vectors
- `layer_type_hist` — Jaccard similarity of layer type sets
- `shape_uniform` — whether common weight shapes scale uniformly
- `moe_correction` — ~5% penalty for mixed MoE/Dense pairs
- `arch_risk_factors` — hidden_size ratio, GQA mismatch, FFN expansion, RoPE theta diff
Output: an `estimated_transferability` score (0–1) with `confidence` (HIGH/MEDIUM/LOW), `recommended_methods` (GPTQ/AWQ/mixed-precision/expert-aware), and `caveats`.
This is a structural pre-filter only. SensCorr and RepAlign require actual calibration data and are the strongest transfer predictors. Use this score to decide whether to attempt transfer at all, not as a final guarantee.
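The op-histogram term above is plain cosine similarity over operator frequency vectors. A self-contained sketch, using the canonical op names from the L3 table (illustrative only, not modelsig's internals):

```python
import math
from collections import Counter

def op_hist_sim(ops_a, ops_b) -> float:
    """Cosine similarity of two operator frequency vectors."""
    ha, hb = Counter(ops_a), Counter(ops_b)
    dot = sum(ha[op] * hb[op] for op in set(ha) | set(hb))
    norm = math.sqrt(sum(v * v for v in ha.values())) * \
           math.sqrt(sum(v * v for v in hb.values()))
    return dot / norm if norm else 0.0

dense = ["aten/mm"] * 4 + ["attention", "rms_norm", "silu"]
moe   = ["aten/mm"] * 4 + ["attention", "rms_norm", "silu", "topk/router"]
print(round(op_hist_sim(dense, dense), 3))  # 1.0
```

An extra MoE-only operator like `topk/router` nudges the score below 1.0 rather than zeroing it, which is why a separate MoE/Dense mismatch correction is still needed.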
- L1 Structure — cheapest: model loading, tensor shapes, dtype validation
- L2 Numerical — cosine similarity, perplexity on calibration set
- L3 Runtime — prefill latency, decode throughput, KV cache eviction
- L4 Canary — large/MoE model: peak memory, TP/PP correctness
```shell
modelsig -m Qwen/Qwen3-7B --output table
```

```
==============================================================================
modelsig v2.0 | 2026-03-17T10:00:00Z
==============================================================================
Model: Qwen/Qwen3-7B
  type                  qwen3
  hidden_size           3584
  num_hidden_layers     28
  num_attention_heads   28 (kv: 8)
  intermediate_size     18944
  head_dim              128
  is_moe                False
  ffn_expansion         5.285714
  gqa_ratio             3.5
  kv_cache_pattern      [batch, 8, seq_len, 128]
  op_types              aten/mm, attention, embedding, rms_norm, rope, silu, swiglu
  layer_types           AttentionLayer, EmbeddingLayer, FFN_SwiGLU, LMHead, RMSNorm
  abstract_keys         14
  source                safetensors
```
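The derived fields in that output fall straight out of the L2 config values. Assuming only the numbers shown above, `ffn_expansion`, `gqa_ratio`, and the L4 KV cache pattern are simple ratios:

```python
# Config values as shown in the Qwen/Qwen3-7B output above.
cfg = {
    "hidden_size": 3584,
    "intermediate_size": 18944,
    "num_attention_heads": 28,
    "num_key_value_heads": 8,
    "head_dim": 128,
}

ffn_expansion = cfg["intermediate_size"] / cfg["hidden_size"]
gqa_ratio = cfg["num_attention_heads"] / cfg["num_key_value_heads"]
# L4 KV cache shape pattern, derived from the same config.
kv_cache_pattern = ["batch", cfg["num_key_value_heads"], "seq_len", cfg["head_dim"]]

print(round(ffn_expansion, 6), gqa_ratio)  # 5.285714 3.5
```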
```shell
# Compare two models
modelsig -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B --compare --output table

# Multi-model comparison with a multi-fidelity test plan
modelsig \
  -m Qwen/Qwen3-7B -m Qwen/Qwen3-30B-A3B -m Qwen/Qwen3-235B-A22B \
  --compare --multi-fidelity --output markdown --save report.md

# ONNX model
modelsig -m onnx-community/Qwen3-4B-ONNX --output json

# Fast mode (config only)
modelsig -m Qwen/Qwen3-235B-A22B --fast --output table

# Local models
modelsig local:/path/to/model --output json
modelsig local:/path/to/7b local:/path/to/72b --compare

# Private/gated model
modelsig -m org/private-model --token hf_xxx
# or: export HF_TOKEN=hf_xxx

# Markdown report
modelsig -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B \
  --compare --output markdown --save report.md

# Only use --trust-remote-code for models you trust.
# This allows execution of arbitrary Python code from the model repository.
modelsig -m org/custom-model --trust-remote-code
```

Problem: You want to validate a new vLLM kernel for Qwen3-72B, but CI is limited to A10G GPUs (24 GB VRAM).

```shell
modelsig -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B --compare --output table
```

Expected result: ISOMORPHIC / FULL_SUBSTITUTE — same GQA pattern, same op set, uniform scaling. You can run full functional tests on 7B and gate the 72B behind a nightly canary run.
Problem: Does Qwen3-30B-A3B (MoE) behave like a drop-in proxy for Qwen3-235B-A22B?

```shell
modelsig -m Qwen/Qwen3-30B-A3B -m Qwen/Qwen3-235B-A22B \
  --compare --multi-fidelity --output markdown
```

Both are MoE models from the same family → ISOMORPHIC. The multi-fidelity plan shows:
- L1: use 30B-A3B for structure/conversion tests
- L2: numerical validation on 30B
- L4: 235B-A22B as canary for routing correctness and peak memory
Problem: Can Llama-3.1-8B proxy-test Mistral-7B?

```shell
modelsig -m meta-llama/Llama-3.1-8B-Instruct -m mistralai/Mistral-7B-v0.1 \
  --compare --output json
```

Both are dense GQA decoders with the same op set → ISOMORPHIC / FULL_SUBSTITUTE. Despite different model_type labels, the structural fingerprint matches.
Problem: You converted GPT-2 to ONNX and want to verify that the ONNX version matches the torch version structurally.

```shell
modelsig -m openai-community/gpt2 -m onnx-community/gpt2 --compare --output table
```

The ONNX version is parsed from the .onnx graph file; the safetensors version is parsed from the header. Both share the same abstract key set → ISOMORPHIC.
Problem: Will nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 (quantized to FP4) behave the same as the BF16 variant?

```shell
modelsig \
  -m nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 \
  -m nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
  --compare --fast --output table
```

Both share the same architecture (120B MoE). `--fast` uses config-only mode to avoid downloading large safetensors headers. Result: ISOMORPHIC — same layer topology, only the dtype differs.
Problem: You quantized Qwen3-7B with AWQ. Can that config transfer to Qwen3-72B?

```shell
modelsig \
  -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B \
  --compare --output json --save qwen3_quant_transfer.json
```

The `quant_transfer` block in `coverage_matrix` gives:

- `estimated_transferability` — composite score (0–1) based on structural similarity
- `confidence` — HIGH/MEDIUM/LOW
- `recommended_methods` — e.g. `GPTQ (W4A16)`, `AWQ (W4A16)`, `Mixed-precision`
- `arch_risk_factors` — e.g. large hidden_size ratio, RoPE theta mismatch
- `caveats` — whether activation-aware recalibration is needed
```
modelsig [-m MODEL_ID ...] [MODEL_ID ...] [OPTIONS]

Arguments:
  -m / --model MODEL_ID   HF model ID or local:PATH (repeatable, preferred)
  MODEL_ID                positional alternative — same as -m

Options:
  --output json | table | markdown   (default: json)
  --compare             Compute pairwise coverage for all model pairs
  --save FILE           Save output to file
  --fast                Config-only mode — no safetensors/ONNX download
  --multi-fidelity      Include 4-level multi-fidelity test plan
  --no-layer-sig        Skip per-module I/O dtype+shape capture (faster)
  --token TOKEN         HF Hub token for private/gated models
  --timeout SEC         HTTP timeout (default: 30)
  --no-color            Disable ANSI colors in table output
  --trust-remote-code   Allow trust_remote_code=True for custom model code
                        ⚠ enables arbitrary code execution — use only for trusted models
```
```
modelsig/
├── analyze.py           CLI entry point (~190 lines)
├── constants.py         Shared constants: TOOL_NAME, _OP_RULES, _LAYER_TYPE_RULES, …
│
├── hf/
│   └── client.py        HF Hub client: token management, HTTP GET + backoff,
│                        model_info().siblings, hf_hub_download
│
├── parsers/
│   ├── safetensors.py   HTTP Range header fetch + local shard discovery
│   └── config.py        AutoConfig.from_pretrained() + _flatten_config() aliases
│
├── onnx/
│   ├── ops.py           _ONNX_DTYPE map, _ONNX_OP_MAP, canonical op mapping
│   ├── parser.py        onnx.load(load_external_data=False) + protobuf fallback
│   ├── selector.py      Primary .onnx file selection heuristics
│   └── collector.py     Orchestrates HF download → parse pipeline
│
├── torch/
│   └── layer_sig.py     L5: per-module input/output dtype+shape via forward hooks
│
├── signature/
│   ├── static.py        L1: build_static_weight_signature, norm_key, norm_dtype
│   ├── arch.py          L2: build_arch_fingerprint, KV cache pattern, dim ratios
│   ├── template.py      Per-layer canonical submodule template (for phase-2)
│   └── fingerprint.py   ModelFingerprint dataclass + build_fingerprint orchestrator
│
├── comparison/
│   ├── phases.py        Phase 1/2/3 isomorphism tests
│   ├── ratios.py        Shape ratio uniformity analysis
│   ├── quant_transfer.py  Structural quantization transferability estimator
│   ├── coverage.py      Unified compute_coverage + test strategy + quant_transfer
│   └── multifidelity.py 4-level multi-fidelity test plan builder
│
└── output/
    ├── colors.py        ANSI color helpers
    ├── json_fmt.py      JSON formatter + fp_to_dict
    ├── table_fmt.py     ANSI table formatter
    └── markdown_fmt.py  Markdown report formatter
```
- No arbitrary code execution by default. `trust_remote_code` is `False` unless explicitly set via `--trust-remote-code`.
- Token safety. The HF token is passed via HTTP headers only — never embedded in URLs or logged to stderr.
- No weight download. Only metadata (safetensors header, ONNX graph, config.json) is fetched.
| Principle | Implementation |
|---|---|
| Zero weight download | HTTP Range (safetensors), graph-only .onnx, config-only fast path |
| Framework-driven parsing | AutoConfig.from_pretrained() for config normalization; onnx.load() for graph parsing |
| Graceful degradation | Every heavy dependency is optional — falls back to built-in parsers |
| Architecture-agnostic | Works on dense decoders, GQA models, MoE, vision-language, speech, classification |
| Single CLI, composable API | Import any module independently or use the unified CLI |
| Safe by default | trust_remote_code=False; token in headers not URLs |
Validated weekly against 57 models (29 safetensors + 28 ONNX):
Safetensors (full header fetch): Qwen3.5-{0.8B,4B,9B,27B,35B-A3B,397B-A17B}, Qwen2.5-7B-Instruct, Qwen3-Coder-Next, DeepSeek-V3.2, Kimi-K2.5, MiniMax-M2.5, GLM-5, Nemotron-3-{Nano-4B, Super-120B}-{BF16,NVFP4,FP8}, Granite-4.0-1b-speech, BitNet-b1.58-2B-4T, MiroThinker-{1.7,1.7-mini}, Sarvam-{30b,105b}, Reka-edge-2603, LocoTrainer-4B, OmniCoder-9B, Nanbeige4.1-3B, Param2-17B-A2.4B, gpt-oss-20b, all-MiniLM-L6-v2
ONNX (graph-only, no weight download): Qwen3.5-{0.8B,2B,4B}-ONNX, Qwen3-{4B-VL,VL-2B,Reranker-0.6B}-ONNX, Qwen2.5-{0.5B,VL-3B}-ONNX, LFM2-24B-A2B, Olmo-Hybrid-{SFT,DPO,Think}-7B, Voxtral-Mini-4B, Granite-4.0-1b-speech, Nemotron-Nano-4B, BERT-multilingual-NER, chinese-RoBERTa, multilingual-MiniLMv2, CodeT5, Jan-code-4b, Josiefied-Qwen3.5-0.8B, IndoBERT-news-classification, ai-image-detection × 4, vehicle-classification, tmr-text-detector
All logic is in the modelsig/ package. Each subdirectory has a single responsibility. Tests live in tests/ and cover 130+ unit + integration scenarios.
```shell
git clone https://github.com/joe0731/modelsig
cd modelsig
uv sync --extra dev   # installs all deps + dev tools
uv run pytest tests/ -v
```

Weekly validation against the full model zoo runs via GitHub Actions (.github/workflows/weekly-validation.yml).
- huggingface_hub — HF Hub Python client
- safetensors — safe, zero-copy tensor serialization
- vLLM — high-throughput LLM inference
- ONNX Runtime — cross-platform inference accelerator
Apache 2.0 — see LICENSE.