
fix: add support for gemma 4 #287

Merged
p-e-w merged 1 commit into p-e-w:master from MoonRide303:gemma-4
Apr 12, 2026

Conversation

@MoonRide303
Contributor

fix proposal for #278

tested locally on gemma-4-E2B-it and Llama-3.2-3B-Instruct

Contributor

@gemini-code-assist bot left a comment


Code Review

This pull request updates the LoRA adapter initialization to use full module names for target identification and improves the logging of target types. A high-severity issue was identified where the sorted list of target modules is immediately overwritten by an unsorted list, which should be corrected to ensure deterministic behavior.
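The overwrite the review describes can be sketched as follows. The function and variable names here are hypothetical stand-ins, not the actual heretic code; the point is only that a sorted result is wasted if the next statement replaces it with an unsorted one:

```python
def dedupe_targets(names):
    """Return a de-duplicated, deterministically ordered list of target names.

    Illustrative sketch of the fix the review asks for (names are assumptions,
    not the real heretic identifiers).
    """
    unique = set(names)
    targets = sorted(unique)  # deterministic, reproducible order
    # Buggy variant flagged by the review:
    # targets = list(unique)  # overwrites the sorted list; order not guaranteed
    return targets

print(dedupe_targets(["mlp.down_proj", "attn.o_proj", "attn.o_proj"]))
# -> ['attn.o_proj', 'mlp.down_proj']
```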

@p-e-w
Owner

p-e-w commented Apr 8, 2026

What is the difference between this PR and #285? Which approach do you think is better?

@MoonRide303
Contributor Author

MoonRide303 commented Apr 8, 2026

@p-e-w It's about the same idea; I didn't notice that #285 existed (it wasn't linked in the issue). That said, I'm not sure the change to try_add in #285 is necessary (it doesn't look necessary to me, and #287 worked fine without it), while adjusting the "LoRA adapters initialized" output in #287 is a good thing to have (otherwise you get a dump of a lot of layers), so I'd lean towards this variant.

@p-e-w
Owner

p-e-w commented Apr 8, 2026

But how can this PR work if it doesn't extract the .linear submodule from the Gemma4ClippableLinear module?

@MoonRide303
Contributor Author

MoonRide303 commented Apr 8, 2026

But the selected modules are already Linear. After adding the debug line print(f"{full_name} -> {type(module).__name__}") just after full_name = module_id_to_full_name.get(id(module)), I get this:

> heretic --n-trials 2 --n-startup-trials 1 --row-normalization PRE --batch-size 128 --dtypes bfloat16 --model .\gemma-4-E2B-it
W0408 18:11:59.357000 32456 site-packages\torch\distributed\elastic\multiprocessing\redirects.py:29] NOTE: Redirects are currently not supported in Windows or MacOs.
█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀  v1.2.0
█▀█░█▀▀░█▀▄░█▀▀░░█░░█░█░░
▀░▀░▀▀▀░▀░▀░▀▀▀░░▀░░▀░▀▀▀  https://github.com/p-e-w/heretic

Detected 1 CUDA device(s) (15.99 GB total VRAM):
* GPU 0: NVIDIA GeForce RTX 4080 (15.99 GB)

You have already processed this model. You can show the results from the previous run, allowing you to export models or to run additional trials. Alternatively, you can ignore the previous run and start from
scratch. This will delete the checkpoint file and all results from the previous run.

? How would you like to proceed? Ignore the previous run and start from scratch

Loading model .\gemma-4-E2B-it...
* Trying dtype bfloat16...
model.language_model.layers.0.self_attn.o_proj -> Linear
model.language_model.layers.0.mlp.down_proj -> Linear
model.language_model.layers.1.self_attn.o_proj -> Linear
model.language_model.layers.1.mlp.down_proj -> Linear
model.language_model.layers.2.self_attn.o_proj -> Linear
model.language_model.layers.2.mlp.down_proj -> Linear
model.language_model.layers.3.self_attn.o_proj -> Linear
model.language_model.layers.3.mlp.down_proj -> Linear
model.language_model.layers.4.self_attn.o_proj -> Linear
model.language_model.layers.4.mlp.down_proj -> Linear
model.language_model.layers.5.self_attn.o_proj -> Linear
model.language_model.layers.5.mlp.down_proj -> Linear
model.language_model.layers.6.self_attn.o_proj -> Linear
model.language_model.layers.6.mlp.down_proj -> Linear
model.language_model.layers.7.self_attn.o_proj -> Linear
model.language_model.layers.7.mlp.down_proj -> Linear
model.language_model.layers.8.self_attn.o_proj -> Linear
(...)

Model made with heretic --batch-size 128 --dtypes bfloat16 --model google/gemma-4-E2B-it using heretic-llm from my gemma-4 branch: https://huggingface.co/MoonRide/gemma-4-E2B-it-heretic
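The module_id_to_full_name lookup mentioned above can be sketched like this. The Module class below is a minimal stand-in so the example is self-contained; in the real code the tree would come from model.named_modules() in PyTorch, and the exact heretic implementation may differ:

```python
class Module:
    """Tiny stand-in for torch.nn.Module, just enough to walk a module tree."""

    def __init__(self, **children):
        for name, child in children.items():
            setattr(self, name, child)

    def named_modules(self, prefix=""):
        # Yield (full_dotted_path, module) pairs, mirroring PyTorch's API shape.
        yield prefix, self
        for name, child in vars(self).items():
            if isinstance(child, Module):
                full = f"{prefix}.{name}" if prefix else name
                yield from child.named_modules(full)


o_proj = Module()
model = Module(self_attn=Module(o_proj=o_proj))

# Map each module's identity to its full dotted path, so a module object
# picked up elsewhere can be resolved back to its unambiguous full name.
module_id_to_full_name = {id(m): name for name, m in model.named_modules()}
print(module_id_to_full_name[id(o_proj)])  # -> self_attn.o_proj
```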

@p-e-w
Owner

p-e-w commented Apr 9, 2026

Doesn't that directly contradict the findings in #278? Their output contained things like

model.audio_tower.layers.{0...11}.self_attn.q_proj.linear.weight

Sorry for not digging deeper into this myself, I don't have time for this right now and I'm really confused by the different approaches between this PR and #285, both of which are claiming to be the better solution.

@MoonRide303
Contributor Author

@p-e-w The old (current master / ara) code was using leaf module names, which PEFT then matched across the entire model, also picking up Gemma4ClippableLinear wrappers from the vision sub-model. When we use full module paths, that no longer happens.

@p-e-w
Owner

p-e-w commented Apr 11, 2026

But where is the code that actually registers the inner Linear module under self_attn.o_proj?

For traditional models, we have this:

# Standard self-attention out-projection (most models).
with suppress(Exception):
    try_add("attn.o_proj", layer.self_attn.o_proj)  # ty:ignore[possibly-missing-attribute]

This code is required for abliteration to happen, because only modules returned by get_layer_modules will actually be targeted (see abliterate, which uses module.weight, which Gemma4ClippableLinear doesn't have).

For Gemma 4, the relevant module is not under self_attn.o_proj, but under self_attn.o_proj.linear. So how does this PR actually work?

@MoonRide303
Contributor Author

Unwrapping via .linear would only be needed if we wanted to process Gemma4ClippableLinear modules, which live inside the vision and audio towers. In the text modules, o_proj and down_proj are plain nn.Linear (see Gemma4TextAttention.o_proj and Gemma4TextMLP.down_proj). get_layers() resolves multimodal models to model.language_model.layers (no audio/vision towers), so get_layer_modules() returns only nn.Linear modules there. So for the modules heretic actually targets, the relevant module is under self_attn.o_proj / mlp.down_proj, NOT under .linear.
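The argument above can be sketched in miniature. The class names and paths here are illustrative stand-ins (not the actual heretic or transformers code): once module discovery is restricted to the language-model layers, only plain Linear projections are seen, so no .linear unwrapping is ever needed:

```python
class Linear:
    """Stand-in for torch.nn.Linear."""


class ClippableLinear:
    """Stand-in for Gemma4ClippableLinear: wraps a Linear under .linear."""

    def __init__(self):
        self.linear = Linear()


# Hypothetical flat view of a multimodal model's modules.
modules = {
    "model.language_model.layers.0.self_attn.o_proj": Linear(),
    "model.language_model.layers.0.mlp.down_proj": Linear(),
    "model.vision_tower.layers.0.self_attn.q_proj": ClippableLinear(),
}

# Restricting discovery to the language-model layers (as get_layers() does for
# multimodal models, per the comment above) leaves only plain Linear modules.
targets = {
    name: m
    for name, m in modules.items()
    if name.startswith("model.language_model.") and isinstance(m, Linear)
}
print(sorted(targets))
# -> ['model.language_model.layers.0.mlp.down_proj',
#     'model.language_model.layers.0.self_attn.o_proj']
```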

@p-e-w
Owner

p-e-w commented Apr 12, 2026

Ah, indeed. We definitely don't want to touch the vision and audio towers.

You have convinced me. Please fix CI so this can be merged.

@MoonRide303
Contributor Author

> Please fix CI so this can be merged.

Done.

@p-e-w p-e-w merged commit e2c74bf into p-e-w:master Apr 12, 2026
4 checks passed
@p-e-w
Owner

p-e-w commented Apr 12, 2026

Splendid, merged! Thanks for the PR, this is a solid improvement that I imagine will make Heretic more likely to support other, future architectures out of the box.

@MoonRide303 MoonRide303 deleted the gemma-4 branch April 12, 2026 07:28
MoonRide303 added a commit to MoonRide303/heretic that referenced this pull request Apr 12, 2026
@MoonRide303
Contributor Author

@p-e-w One more thing worth doing for full gemma 4 support out of the box: bumping the transformers dependency to 5.5.0 or newer (I tested on 5.5.1 and 5.5.3).

Side note: models saved with transformers 5.5.2 or newer might require updating the model loader code (5.5.2 no longer saves shared weights). I'd start with 5.5.1 in the dependencies, and once more downstream apps update their dependencies (llama.cpp b8751+, etc.) it will be fine to move to newer releases.

0xA50C1A1 pushed a commit to 0xA50C1A1/heretic that referenced this pull request Apr 12, 2026
