psychedelicious (Collaborator) commented Oct 1, 2025

Summary

The model manager's probing/identification code has been in a state of partial migration for many months now. Most models used the old API, which was difficult to extend, and the mixing of old and new APIs made it hard to understand what was happening.

This PR refines the new API and ports all supported models to use it.

Design goals

  • Make it easy to understand what happens when we identify a model. There should be as little indirection as possible.
  • Make it easy to add support for a new model, be it a totally new architecture or auxiliary model for an existing architecture.
  • Use consistent, clear patterns throughout. Document when it is not immediately obvious what is happening.
  • Model configs should only contain information relevant to themselves. No optional attrs that are only used for a subset of models, no "god classes" that contain bazillions of fields for every possible case.
  • Model configs should not hard-code information that can be trivially derived at runtime.
  • Reduce touch points required to add support for new models by colocating model identification and loading logic.

Specifics

Single method to identify models

The new API required model config classes to implement two methods:

  • matches() -> does the candidate model look like this kind of model?
  • parse() -> build the config fields from the candidate

The intention was for matches() to be a quick, lightweight check, and for parse() to do any heavy lifting (like loading the model to inspect it). We find the first match, then parse it.

In practice, however, we need to do almost all of the heavy lifting in matches() anyways. parse() ended up doing very little, and sometimes even duplicated work done in matches().

This API is needlessly complex and doesn't add any practical benefits.
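For reference, the old two-method protocol looked roughly like this (a minimal sketch; ModelOnDisk is the real type from the codebase, while the base class and loop below are illustrative):

from typing import Any

# ModelOnDisk comes from the model manager codebase; everything else
# here is a sketch of the old protocol, not the literal implementation.
class LegacyConfigBase:
    @classmethod
    def matches(cls, mod: ModelOnDisk) -> bool:
        """Quick check: could the candidate be this kind of model?"""
        raise NotImplementedError

    @classmethod
    def parse(cls, mod: ModelOnDisk) -> dict[str, Any]:
        """Heavy lifting: derive config fields from the candidate."""
        raise NotImplementedError

def identify(mod: ModelOnDisk, candidates: list[type[LegacyConfigBase]]):
    for config_cls in candidates:
        if config_cls.matches(mod):  # stop at the first match...
            return config_cls(**config_cls.parse(mod))  # ...then parse it
    raise RuntimeError(f"Could not identify model: {mod}")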

I've consolidated these responsibilities into a from_model_on_disk() method which each config must implement. The method gets a reference to the candidate model files and attempts to construct a config from it. If it fails, a NotAMatch exception is raised, which includes a reason.

Here's an example from one of the IP Adapter config classes:

@classmethod
def from_model_on_disk(cls, mod: ModelOnDisk, fields: dict[str, Any]) -> Self:
    # Each validator either returns or raises NotAMatch with a reason.
    _validate_is_dir(cls, mod)
    _validate_override_fields(cls, fields)
    cls._validate_has_weights_file(mod)
    cls._validate_has_image_encoder_metadata_file(mod)
    cls._validate_base(mod)

    return cls(**fields)

Each of the validate functions either returns or raises.
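For illustration, here's a minimal sketch of the exception and one validator (NotAMatch is the real exception name from this PR; its constructor and the helper body are assumptions):

class NotAMatch(Exception):
    """The candidate files cannot be identified as this config class."""

    def __init__(self, config_cls: type, reason: str) -> None:
        # Carry the reason so the probe can report why each class rejected
        # the candidate.
        super().__init__(f"{config_cls.__name__}: {reason}")
        self.reason = reason

def _validate_is_dir(config_cls: type, mod: ModelOnDisk) -> None:
    # Diffusers-style models are directories; a single file cannot match.
    if not mod.path.is_dir():
        raise NotAMatch(config_cls, "model path is not a directory")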

Narrow config classes

Previously, we had some "god" classes like MainModelConfig. This class accommodated every possible main/pipeline model and was worse off for it. Config classes have been split up to be rather narrow, and the class names reflect this.

Each config must represent a single combination of model type, base and format. As before, the base of "any" serves as a fallback for models without an associated pipeline architecture, like text encoders.

Here are all main/pipeline classes:

# Main (Pipeline) - diffusers format
- Main_Diffusers_SD1_Config
- Main_Diffusers_SD2_Config
- Main_Diffusers_SDXL_Config
- Main_Diffusers_SDXLRefiner_Config
- Main_Diffusers_SD3_Config
- Main_Diffusers_CogView4_Config
# Main (Pipeline) - checkpoint format
- Main_Checkpoint_SD1_Config
- Main_Checkpoint_SD2_Config
- Main_Checkpoint_SDXL_Config
- Main_Checkpoint_SDXLRefiner_Config
- Main_Checkpoint_FLUX_Config
# Main (Pipeline) - quantized formats
- Main_BnBNF4_FLUX_Config
- Main_GGUF_FLUX_Config

All classes are named with the format <type>_<format>_<base>. It's immediately obvious which kind of model you are working with.

Because the classes are narrow, identification logic can also be more focused. Adding a new model architecture doesn't require you to figure out where to slot in the checks amongst checks for all the existing architectures - you make a new class and implement logic to positively identify this specific kind of model.
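To make this concrete, a narrow config class might look something like this (a sketch: the Literal-pinned discriminators and enum names mirror the existing taxonomy, but the validators here are hypothetical):

from typing import Any, Literal, Self

# ModelConfigBase, ModelType, ModelFormat and BaseModelType are the
# existing model manager types; the validator helpers are illustrative.
class Main_Checkpoint_SD1_Config(ModelConfigBase):
    # Exactly one combination of type, format and base per class.
    type: Literal[ModelType.Main] = ModelType.Main
    format: Literal[ModelFormat.Checkpoint] = ModelFormat.Checkpoint
    base: Literal[BaseModelType.StableDiffusion1] = BaseModelType.StableDiffusion1

    @classmethod
    def from_model_on_disk(cls, mod: ModelOnDisk, fields: dict[str, Any]) -> Self:
        _validate_is_file(cls, mod)  # checkpoints are single files
        cls._validate_looks_like_sd1(mod)  # e.g. state dict keys unique to SD1
        return cls(**fields)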

Organize the identification codebase (planned)

Currently we have two big honking files that have all the model identification code in a big jumble. Two main reasons for this:

  • We had "god classes" that were responsible for many permutations of models.
  • The structure of classes would result in circular import issues if we were to split the files up.

With the revised structure for config classes, we can more easily split the code by, say, model type. I haven't gotten to this yet, but the goal is to organize things to make it hard to get lost, even for a new contributor.

Configs are the loaders (planned)

As I worked through this, I found the split between model loaders and identifiers to feel overly complicated. There's a separate model loader registry that lets a model loader declare which specific combinations of type/format/base it can load.

We can only load known models, and all known models are represented by a config class. The loaders essentially map directly to config classes, but instead of living on the config classes, we have to manually associate them. Then, every loader has defensive code that checks whether the config it is trying to load is the right class before loading. The loaders also duplicate some amount of the model identification code (like finding the weights file).

I plan to refactor this so that the config classes are also responsible for loading models. With the other changes in this PR and #8577, we will have reduced the touch points required to add support for new models substantially:

Old
  • Complicated new model identification API tangled with even more complicated old API
  • Loader and registration
  • Inference code
  • Nodes
  • Crapload of frontend boilerplate
New
  • Colocated and simplified model identification and loading (i.e. everything that deals with model files)
  • Inference code
  • Nodes
  • Minimal (if any) frontend boilerplate
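A sketch of the planned direction (hypothetical API, nothing here is merged yet):

class Main_GGUF_FLUX_Config(ModelConfigBase):
    @classmethod
    def from_model_on_disk(cls, mod: ModelOnDisk, fields: dict[str, Any]) -> Self:
        # Identification as in this PR. The config can record the weights
        # file it found so loading doesn't have to re-discover it.
        ...

    def load(self) -> AnyModel:
        # Loading lives on the config class itself: no separate loader
        # registry entry, and no defensive "is this the right config
        # class?" check before loading.
        ...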

Remove model hashing (planned)

While I still believe model hashing was a good idea in isolation, in practice, it is pure downside.

Models take longer to install, sometimes taking very long on certain filesystems or storage setups. Because we let users opt out of hashing and change the hashing algorithm, the hashes end up not serving the intended purpose - a cross-platform, stable model identifier.

Even if we required hashing and used the same algo everywhere, we don't actually use the stored hashes anywhere.

I plan to remove model hashing from the model manager as part of this PR.

Related Issues / Discussions

Offline discussions

QA Instructions

TBD

Merge Plan

TBD

Checklist

  • The PR has a short but descriptive title, suitable for a changelog
  • Tests added / updated (if applicable)
  • ❗Changes to a redux slice have a corresponding migration
  • Documentation added / updated (if applicable)
  • Updated What's New copy (if doing a release after this PR)

- Add concept of match certainty to new probe
- Port CLIP Embed models to new API
- Fiddle with stuff
Previously, we had a multi-phase strategy to identify models from their
files on disk:
1. Run each model config class's `matches()` method on the files. It
checks whether the model could possibly be identified as that config's
model type. This was intended to be a quick check. Break on the first
match.
2. If we have a match, run the config class's `parse()` method. It
derives some additional model config attrs from the model files. This
was intended to encapsulate heavier operations that may require loading
the model into memory.
3. Derive the common model config attrs: name, description, hash, etc.
Some of these are also heavier operations.

This strategy has some issues:
- It is not clear how the pieces fit together. There is some
back-and-forth between different methods and the config base class. It
is hard to trace the flow of logic until you fully wrap your head around
the system and therefore difficult to add a model architecture to the
probe.
- The assumption that we could do quick, lightweight checks before
heavier checks is incorrect. We often _must_ load the model state dict
in the `matches()` method. So there is no practical perf benefit to
splitting up the responsibility of `matches()` and `parse()`.
- Sometimes we need to do the same checks in `matches()` and `parse()`.
In these cases, splitting the logic has a negative perf impact because
we are doing the same work twice.
- As we introduce the concept of an "unknown" model config (i.e. a model
that we cannot identify, but still record in the db; see #8582), we will
_always_ run _all_ the checks for every model. Therefore we need not try
to defer heavier checks or resource-intensive ops like hashing. We are
going to do them anyways.
- There are situations where a model may match multiple configs. One
known case is SD pipeline models with merged LoRAs. In the old probe
API, we relied on the implicit order of checks to know that if a model
matched for pipeline _and_ LoRA, we prefer the pipeline match. But, in
the new API, we do not have this implicit ordering of checks. To resolve
this in a resilient way, we need to get all matches up front, then use
tie-breaker logic to figure out which should win (or add "differential
diagnosis" logic to the matchers).
- Field overrides weren't handled well by this strategy. They were only
applied at the very end, if a model matched successfully. This means we
cannot tell the system "Hey, this model is type X with base Y. Trust me
bro." We cannot override the match logic. As we move towards letting
users correct mis-identified models (see #8582), this is a requirement.

We can simplify the process significantly and better support "unknown"
models.

Firstly, model config classes now have a single `from_model_on_disk()`
method that attempts to construct an instance of the class from the
model files. This replaces the `matches()` and `parse()` methods.

If we fail to create the config instance, a special exception is raised
that indicates why we think the files cannot be identified as the given
model config class.

Next, the flow for model identification is a bit simpler:
- Derive all the common fields up-front (name, desc, hash, etc).
- Merge in overrides.
- Call `from_model_on_disk()` for every config class, passing in the
fields. Overrides are handled in this method.
- Record the results for each config class and choose the best one.

The identification logic is a bit more verbose, with the special
exceptions and handling of overrides, but it is very clear what is
happening.
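
A sketch of that flow in code (`from_model_on_disk()` and NotAMatch are
real; the helper names and result-collection details are illustrative):

def identify_model(mod: ModelOnDisk, overrides: dict[str, Any]) -> ModelConfigBase:
    fields = derive_common_fields(mod)  # name, description, hash, etc.
    fields.update(overrides)  # user-supplied overrides win

    matches: list[ModelConfigBase] = []
    for config_cls in ALL_CONFIG_CLASSES:  # try every class; no implicit ordering
        try:
            matches.append(config_cls.from_model_on_disk(mod, fields))
        except NotAMatch:
            continue  # the real code records the reason for diagnostics

    if not matches:
        return UnknownModelConfig(**fields)  # see #8582
    return pick_best_match(matches)  # tie-breaker when several configs match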

The one downside I can think of for this strategy is we do need to check
every model type, instead of stopping at the first match. It's a bit
less efficient. In practice, however, this isn't a hot code path, and
the improved clarity is worth far more than perf optimizations that the
end user will likely never notice.
psychedelicious (Collaborator, Author) commented:
As of this writing:

  • All legacy probe code is still in the PR for reference as I work on the refactor.
  • All starter models probe correctly from disk (i.e. without any field overrides/hints), except these:
    • /home/bat/models/any/clip_vision/clip-vit-large-patch14
    • /home/bat/models/sdxl/ip_adapter/Standard Reference (IP Adapter ViT-H).safetensors
    • /home/bat/models/any/t5_encoder/t5_base_encoder/
    • /home/bat/models/any/siglip/SigLIP - google/siglip-so400m-patch14-384
    • /home/bat/models/any/t5_encoder/t5_bnb_int8_quantized_encoder/
