modelsig

Compare LLM architectures without downloading weights.

modelsig extracts a multi-layer structural fingerprint from any HuggingFace model and tells you whether two models are architecturally equivalent — so the smaller one can act as a valid proxy for testing the larger one.

Weekly Validation · Python 3.9+ · License: Apache 2.0


What problem does it solve?

Testing inference engines (vLLM, TensorRT-LLM, SGLang, llama.cpp, ONNX Runtime, etc.) against every large model is prohibitively expensive. modelsig answers:

  • "Can I test Qwen3-72B correctness using Qwen3-7B instead?"
  • "Is Nemotron-120B-FP4 architecturally equivalent to the BF16 variant?"
  • "Does this ONNX export match the original safetensors model?"

It compares structural fingerprints — shape ratios, operator sets, KV cache patterns, layer topology — without ever downloading a single weight tensor.


Key Features

  • Zero weight download — safetensors header via HTTP Range (~20 bytes), ONNX graph-only (no .onnx_data), or config-only fast mode
  • 5-layer fingerprint — static weights, arch config, op types, KV cache pattern, layer-level I/O signatures
  • 3-phase isomorphism comparison — key overlap, substructure, algebraic scaling
  • Substitution verdicts — FULL_SUBSTITUTE / PARTIAL_SUBSTITUTE / NO_SUBSTITUTE
  • 4-level multi-fidelity test plan — maps models to test coverage levels L1–L4
  • Wide model support — dense decoder, GQA, MoE, vision-language, speech, ONNX classification
  • Both HF and local models — supports local:/path/to/model
  • JSON / table / markdown output — CI-friendly JSON, human-readable table, shareable markdown

Installation

From PyPI (recommended)

uv add modelsig           # add to a uv project
# or
uv tool install modelsig  # install as a standalone CLI tool

From source

git clone https://github.com/joe0731/modelsig
cd modelsig
uv sync                   # install all deps + editable package
uv run modelsig --help

Still using pip?

pip install modelsig

Dependencies (all installed by default):

Package          Purpose
requests         HTTP Range fetching for safetensors headers
huggingface_hub  Model file listing, downloads, auth
onnx             ONNX graph parsing (falls back to built-in protobuf if unavailable)
transformers     AutoConfig normalization, layer signature capture
torch            Meta-device forward pass for layer I/O shape collection
safetensors      Local safetensors file parsing

Quick Start

# Analyze a single model (-m / --model flag)
modelsig -m Qwen/Qwen3-7B --output table

# Compare two models (proxy-test decision)
modelsig -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B --compare --output table

# Fast mode for large models (config only, no download)
modelsig -m nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 --fast --output table

# ONNX model
modelsig -m onnx-community/Qwen3.5-0.8B-ONNX --output json

# Skip layer-level I/O signature capture (faster, no torch needed)
modelsig -m Qwen/Qwen3-7B --no-layer-sig --output json

# Private/gated model
modelsig -m org/private-model --token hf_xxx
# or: export HF_TOKEN=hf_xxx

How It Works

Zero-Weight-Download

For safetensors models, only the file header is fetched via HTTP Range requests (~20 bytes per shard). No weights are transferred.

For ONNX models, only the .onnx graph file is downloaded (typically 1–5 MB). The paired .onnx_data weight file (which can be GBs) is never touched.

For fast mode (--fast), only config.json is fetched (a few KB). No tensors at all.
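
The mechanism behind the header-only fetch can be sketched in a few lines. A safetensors file begins with an 8-byte little-endian length prefix followed by a JSON header describing every tensor; modelsig's actual implementation lives in parsers/safetensors.py, so treat this as an illustrative sketch, not the shipped code:

```python
import json
import struct

def header_length(first8: bytes) -> int:
    # A safetensors file starts with a little-endian u64 giving the byte
    # length of the JSON header that follows it.
    (n,) = struct.unpack("<Q", first8)
    return n

def fetch_safetensors_header(url: str, timeout: int = 30) -> dict:
    # Two small Range requests recover every tensor's name, dtype, and
    # shape; no weight bytes are ever transferred.
    import requests  # listed under Dependencies above

    r = requests.get(url, headers={"Range": "bytes=0-7"}, timeout=timeout)
    r.raise_for_status()
    n = header_length(r.content[:8])

    r = requests.get(url, headers={"Range": f"bytes=8-{7 + n}"}, timeout=timeout)
    r.raise_for_status()
    return json.loads(r.content[:n])
```

The returned dict maps tensor names to their dtype, shape, and data offsets — everything the L1 signature needs.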

5-Layer Signature System

  • L1 — Static weight signature: per-tensor {abstract_key → shape, dtype, layer_type}, layer indices normalized to .N  (from safetensors header / ONNX initializers)
  • L2 — Architecture fingerprint: hidden_size, num_hidden_layers, num_attention_heads, num_key_value_heads, intermediate_size, head_dim, MoE config  (from config.json via AutoConfig)
  • L3 — Op type set: canonical operator vocabulary — aten/mm, attention, rms_norm, rope, silu, topk/router  (from tensor key patterns / ONNX opset)
  • L4 — KV cache shape pattern: [batch, num_kv_heads, seq_len, head_dim]  (derived from L2)
  • L5 — Layer I/O signatures: per-module {input: [{dtype, shape}], output: [{dtype, shape}]} on the meta device  (from torch forward hooks; --no-layer-sig to skip)
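
The .N layer-index normalization used by the L1 static signature can be sketched as a single regex pass. This is an illustration of the idea behind norm_key in signature/static.py; the shipped rules may differ:

```python
import re

def norm_key(key: str) -> str:
    # Collapse numeric path segments (layer indices, MoE expert indices)
    # to "N", so models of different depth or expert count share one
    # abstract key set and become directly comparable.
    return re.sub(r"\.\d+(?=\.)", ".N", key)
```

With this, a 28-layer and an 80-layer model of the same family produce identical abstract key sets, which is what makes the phase-1 overlap check meaningful.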

3-Phase Isomorphism Comparison

Phase 1 — Key coverage    : normalized key set overlap ≥ 80%
Phase 2 — Substructure    : attention / FFN / norm submodules match
Phase 3 — Algebraic scale : hidden_size / intermediate_size / head_dim ratios uniform within 20%

Result: ISOMORPHIC / SCALE_ONLY / DIFFERENT_ARCH
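
Phases 1 and 3 reduce to simple set and ratio arithmetic. A minimal sketch (the real logic lives in comparison/phases.py and comparison/ratios.py and may weigh things differently):

```python
def key_coverage(keys_a: set, keys_b: set) -> float:
    # Phase 1: overlap of the normalized abstract key sets, measured
    # against the larger set.
    if not keys_a or not keys_b:
        return 0.0
    return len(keys_a & keys_b) / max(len(keys_a), len(keys_b))

def phase1_pass(keys_a: set, keys_b: set, threshold: float = 0.80) -> bool:
    return key_coverage(keys_a, keys_b) >= threshold

def ratios_uniform(dims_a: dict, dims_b: dict, tol: float = 0.20) -> bool:
    # Phase 3: the shared dimensions (hidden_size, intermediate_size, ...)
    # must scale by roughly one common factor between the two models.
    ratios = [dims_b[k] / dims_a[k] for k in dims_a if k in dims_b]
    if not ratios:
        return False
    mean = sum(ratios) / len(ratios)
    return all(abs(r - mean) / mean <= tol for r in ratios)
```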

Substitution Verdict

Verdict             Meaning
FULL_SUBSTITUTE     All 3 phases pass + shape ratios uniform + layer_type_coverage ≥ 95%
PARTIAL_SUBSTITUTE  Phase 1+2 pass or op coverage ≥ 80%
NO_SUBSTITUTE       Different arch, MoE vs Dense mismatch, or key divergence
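
As a rough sketch, the verdict rules above might translate into code like this (assumption: the real decision logic in comparison/coverage.py may combine these signals differently):

```python
def substitution_verdict(phase1: bool, phase2: bool, phase3: bool,
                         shape_uniform: bool, layer_type_coverage: float,
                         op_coverage: float, moe_mismatch: bool) -> str:
    # MoE vs Dense is disqualifying regardless of any other signal.
    if moe_mismatch:
        return "NO_SUBSTITUTE"
    # All three phases plus uniform scaling plus near-total layer-type
    # coverage -> the small model can stand in for the large one.
    if phase1 and phase2 and phase3 and shape_uniform and layer_type_coverage >= 0.95:
        return "FULL_SUBSTITUTE"
    # Partial structural agreement still buys partial test coverage.
    if (phase1 and phase2) or op_coverage >= 0.80:
        return "PARTIAL_SUBSTITUTE"
    return "NO_SUBSTITUTE"
```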

Quantization Transferability Estimate

When comparing two models, modelsig also computes a structural quantization transferability score:

struct_sim_score   — 1.0 (ISOMORPHIC) / 0.80 (SCALE_ONLY) / 0.20 (DIFFERENT_ARCH)
op_hist_sim        — cosine similarity of operator frequency vectors
layer_type_hist    — Jaccard similarity of layer type sets
shape_uniform      — whether common weight shapes scale uniformly
moe_correction     — ~5% penalty for mixed MoE/Dense pairs
arch_risk_factors  — hidden_size ratio, GQA mismatch, FFN expansion, RoPE theta diff

Output: estimated_transferability score (0–1) with confidence (HIGH/MEDIUM/LOW), recommended_methods (GPTQ/AWQ/mixed-precision/expert-aware), and caveats.

This is a structural pre-filter only. SensCorr and RepAlign require actual calibration data and are the strongest transfer predictors. Use this score to decide whether to attempt transfer at all, not as a final guarantee.
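
The op_hist_sim component above is ordinary cosine similarity over operator counts. A minimal sketch, assuming comparison/quant_transfer.py applies its own weighting on top:

```python
import math
from collections import Counter

def op_hist_sim(ops_a: Counter, ops_b: Counter) -> float:
    # Cosine similarity of the two operator-frequency vectors; 1.0 means
    # the same ops in the same proportions (scale-invariant, so a 7B and
    # a 72B of one family can still score 1.0).
    vocab = set(ops_a) | set(ops_b)
    dot = sum(ops_a[op] * ops_b[op] for op in vocab)
    na = math.sqrt(sum(v * v for v in ops_a.values()))
    nb = math.sqrt(sum(v * v for v in ops_b.values()))
    return dot / (na * nb) if na and nb else 0.0
```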

Multi-Fidelity Test Plan (4 levels)

L1 Structure    — cheapest: model loading, tensor shapes, dtype validation
L2 Numerical    — cosine similarity, perplexity on calibration set
L3 Runtime      — prefill latency, decode throughput, KV cache eviction
L4 Canary       — large/MoE model: peak memory, TP/PP correctness

Usage

Basic — analyze a single model

modelsig -m Qwen/Qwen3-7B --output table
==============================================================================
  modelsig v2.0  |  2026-03-17T10:00:00Z
==============================================================================

   Model: Qwen/Qwen3-7B
  type                   qwen3
  hidden_size            3584
  num_hidden_layers      28
  num_attention_heads    28  (kv: 8)
  intermediate_size      18944
  head_dim               128
  is_moe                 False
  ffn_expansion          5.285714
  gqa_ratio              3.5
  kv_cache_pattern       [batch, 8, seq_len, 128]
  op_types               aten/mm, attention, embedding, rms_norm, rope, silu, swiglu
  layer_types            AttentionLayer, EmbeddingLayer, FFN_SwiGLU, LMHead, RMSNorm
  abstract_keys          14
  source                 safetensors
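
The derived fields in that table (ffn_expansion, gqa_ratio, kv_cache_pattern) follow directly from config.json. A sketch using standard HF config field names — not modelsig's actual code:

```python
def derived_metrics(cfg: dict) -> dict:
    # Reproduce the derived rows of the table above from raw config values.
    heads = cfg["num_attention_heads"]
    # GQA models list fewer KV heads than attention heads; dense MHA
    # configs may omit the field entirely, in which case kv == heads.
    kv_heads = cfg.get("num_key_value_heads", heads)
    head_dim = cfg.get("head_dim", cfg["hidden_size"] // heads)
    return {
        "ffn_expansion": cfg["intermediate_size"] / cfg["hidden_size"],
        "gqa_ratio": heads / kv_heads,
        "kv_cache_pattern": ["batch", kv_heads, "seq_len", head_dim],
    }
```

Feeding in the Qwen3-7B values shown above (hidden_size 3584, 28 heads, 8 KV heads, intermediate_size 18944, head_dim 128) reproduces the table's ffn_expansion, gqa_ratio, and KV cache pattern.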

Compare models (proxy-testing decision)

modelsig -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B --compare --output table

Full analysis with multi-fidelity plan

modelsig \
    -m Qwen/Qwen3-7B -m Qwen/Qwen3-30B-A3B -m Qwen/Qwen3-235B-A22B \
    --compare --multi-fidelity --output markdown --save report.md

ONNX model

modelsig -m onnx-community/Qwen3-4B-ONNX --output json

Config-only fast mode (no safetensors/ONNX fetch, instantaneous)

modelsig -m Qwen/Qwen3-235B-A22B --fast --output table

Local model directory

modelsig local:/path/to/model --output json
modelsig local:/path/to/7b local:/path/to/72b --compare

Private / gated models

modelsig -m org/private-model --token hf_xxx
# or: export HF_TOKEN=hf_xxx

Save report

modelsig -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B \
    --compare --output markdown --save report.md

Models with custom code

# Only use --trust-remote-code for models you trust.
# This allows execution of arbitrary Python code from the model repository.
modelsig -m org/custom-model --trust-remote-code

Scenario Examples

Scenario 1 — Inference Engine Regression Testing

Problem: You want to validate a new vLLM kernel for Qwen3-72B but CI is limited to A10G GPUs (24 GB VRAM).

modelsig -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B --compare --output table

Expected result: ISOMORPHIC / FULL_SUBSTITUTE — same GQA pattern, same op set, uniform scaling. You can run full functional tests on 7B and gate the 72B behind a nightly canary run.


Scenario 2 — MoE vs Dense Compatibility Check

Problem: Does Qwen3-30B-A3B (MoE) behave like a drop-in proxy for Qwen3-235B-A22B?

modelsig -m Qwen/Qwen3-30B-A3B -m Qwen/Qwen3-235B-A22B \
    --compare --multi-fidelity --output markdown

Both are MoE models from the same family → ISOMORPHIC. The multi-fidelity plan shows:

  • L1: use 30B-A3B for structure/conversion tests
  • L2: numerical validation on 30B
  • L4: 235B-A22B as canary for routing correctness and peak memory

Scenario 3 — Cross-Family Sanity Check

Problem: Can Llama-3.1-8B proxy-test a Mistral-7B?

modelsig -m meta-llama/Llama-3.1-8B-Instruct -m mistralai/Mistral-7B-v0.1 \
    --compare --output json

Both are dense GQA decoders with the same op set → ISOMORPHIC / FULL_SUBSTITUTE. Despite different model_type labels, the structural fingerprint matches.


Scenario 4 — ONNX Runtime Compatibility

Problem: You converted GPT-2 to ONNX and want to verify the ONNX version matches the torch version structurally.

modelsig -m openai-community/gpt2 -m onnx-community/gpt2 --compare --output table

The ONNX version is parsed from the .onnx graph file. The safetensors version is parsed from the header. Both share the same abstract key set → ISOMORPHIC.


Scenario 5 — Quantized Model Compatibility

Problem: Will nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 (quantized to FP4) behave the same as the BF16 variant?

modelsig \
    -m nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16 \
    -m nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-NVFP4 \
    --compare --fast --output table

Both share the same architecture (120B MoE). --fast uses config-only mode, so nothing beyond config.json is fetched. Result: ISOMORPHIC — same layer topology, only dtype differs.


Scenario 6 — Quantization Method Transfer

Problem: You quantized Qwen3-7B with AWQ. Can that config transfer to Qwen3-72B?

modelsig \
    -m Qwen/Qwen3-7B -m Qwen/Qwen3-72B \
    --compare --output json --save qwen3_quant_transfer.json

The quant_transfer block in coverage_matrix gives:

  • estimated_transferability — composite score (0–1) based on structural similarity
  • confidence — HIGH/MEDIUM/LOW
  • recommended_methods — e.g. GPTQ (W4A16), AWQ (W4A16), Mixed-precision
  • arch_risk_factors — e.g. large hidden_size ratio, RoPE theta mismatch
  • caveats — whether activation-aware recalibration is needed

CLI Reference

modelsig [-m MODEL_ID ...] [MODEL_ID ...] [OPTIONS]

Arguments:
  -m / --model MODEL_ID  HF model ID or local:PATH (repeatable, preferred)
  MODEL_ID               positional alternative — same as -m

Options:
  --output              json | table | markdown  (default: json)
  --compare             Compute pairwise coverage for all model pairs
  --save FILE           Save output to file
  --fast                Config-only mode — no safetensors/ONNX download
  --multi-fidelity      Include 4-level multi-fidelity test plan
  --no-layer-sig        Skip per-module I/O dtype+shape capture (faster)
  --token TOKEN         HF Hub token for private/gated models
  --timeout SEC         HTTP timeout (default: 30)
  --no-color            Disable ANSI colors in table output
  --trust-remote-code   Allow trust_remote_code=True for custom model code
                        ⚠ enables arbitrary code execution — use only for trusted models

Module Structure

modelsig/
├── analyze.py              CLI entry point (~190 lines)
├── constants.py            Shared constants: TOOL_NAME, _OP_RULES, _LAYER_TYPE_RULES, …
│
├── hf/
│   └── client.py           HF Hub client: token management, HTTP GET + backoff,
│                           model_info().siblings, hf_hub_download
│
├── parsers/
│   ├── safetensors.py      HTTP Range header fetch + local shard discovery
│   └── config.py           AutoConfig.from_pretrained() + _flatten_config() aliases
│
├── onnx/
│   ├── ops.py              _ONNX_DTYPE map, _ONNX_OP_MAP, canonical op mapping
│   ├── parser.py           onnx.load(load_external_data=False) + protobuf fallback
│   ├── selector.py         Primary .onnx file selection heuristics
│   └── collector.py        Orchestrates HF download → parse pipeline
│
├── torch/
│   └── layer_sig.py        L5: per-module input/output dtype+shape via forward hooks
│
├── signature/
│   ├── static.py           L1: build_static_weight_signature, norm_key, norm_dtype
│   ├── arch.py             L2: build_arch_fingerprint, KV cache pattern, dim ratios
│   ├── template.py         Per-layer canonical submodule template (for phase-2)
│   └── fingerprint.py      ModelFingerprint dataclass + build_fingerprint orchestrator
│
├── comparison/
│   ├── phases.py           Phase 1/2/3 isomorphism tests
│   ├── ratios.py           Shape ratio uniformity analysis
│   ├── quant_transfer.py   Structural quantization transferability estimator
│   ├── coverage.py         Unified compute_coverage + test strategy + quant_transfer
│   └── multifidelity.py    4-level multi-fidelity test plan builder
│
└── output/
    ├── colors.py           ANSI color helpers
    ├── json_fmt.py         JSON formatter + fp_to_dict
    ├── table_fmt.py        ANSI table formatter
    └── markdown_fmt.py     Markdown report formatter

Security

  • No arbitrary code execution by default. trust_remote_code is False unless explicitly set via --trust-remote-code.
  • Token safety. The HF token is passed via HTTP headers only — never embedded in URLs or logged to stderr.
  • No weight download. Only metadata (safetensors header, ONNX graph, config.json) is fetched.

Design Principles

Principle                    Implementation
Zero weight download         HTTP Range (safetensors), graph-only .onnx, config-only fast path
Framework-driven parsing     AutoConfig.from_pretrained() for config normalization; onnx.load() for graph parsing
Graceful degradation         Every heavy dependency is optional — falls back to built-in parsers
Architecture-agnostic        Works on dense decoders, GQA models, MoE, vision-language, speech, classification
Single CLI, composable API   Import any module independently or use the unified CLI
Safe by default              trust_remote_code=False; token in headers, not URLs

Supported Model Families

Validated weekly against 57 models (29 safetensors + 28 ONNX):

Safetensors (full header fetch): Qwen3.5-{0.8B,4B,9B,27B,35B-A3B,397B-A17B}, Qwen2.5-7B-Instruct, Qwen3-Coder-Next, DeepSeek-V3.2, Kimi-K2.5, MiniMax-M2.5, GLM-5, Nemotron-3-{Nano-4B, Super-120B}-{BF16,NVFP4,FP8}, Granite-4.0-1b-speech, BitNet-b1.58-2B-4T, MiroThinker-{1.7,1.7-mini}, Sarvam-{30b,105b}, Reka-edge-2603, LocoTrainer-4B, OmniCoder-9B, Nanbeige4.1-3B, Param2-17B-A2.4B, gpt-oss-20b, all-MiniLM-L6-v2

ONNX (graph-only, no weight download): Qwen3.5-{0.8B,2B,4B}-ONNX, Qwen3-{4B-VL,VL-2B,Reranker-0.6B}-ONNX, Qwen2.5-{0.5B,VL-3B}-ONNX, LFM2-24B-A2B, Olmo-Hybrid-{SFT,DPO,Think}-7B, Voxtral-Mini-4B, Granite-4.0-1b-speech, Nemotron-Nano-4B, BERT-multilingual-NER, chinese-RoBERTa, multilingual-MiniLMv2, CodeT5, Jan-code-4b, Josiefied-Qwen3.5-0.8B, IndoBERT-news-classification, ai-image-detection × 4, vehicle-classification, tmr-text-detector


Contributing

All logic is in the modelsig/ package. Each subdirectory has a single responsibility. Tests live in tests/ and cover 130+ unit and integration scenarios.

git clone https://github.com/joe0731/modelsig
cd modelsig
uv sync --extra dev       # installs all deps + dev tools
uv run pytest tests/ -v

Weekly validation against the full model zoo runs via GitHub Actions (.github/workflows/weekly-validation.yml).


License

Apache 2.0 — see LICENSE.
