Conversation
Code Review
This pull request updates the LoRA adapter initialization to use full module names for target identification and improves the logging of target types. A high-severity issue was identified where the sorted list of target modules is immediately overwritten by an unsorted list, which should be corrected to ensure deterministic behavior.
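The overwrite bug the review describes typically looks like the following (a minimal sketch; the variable names are hypothetical and not taken from the actual diff):

```python
# Hypothetical illustration of the issue flagged by the review: a sorted
# list of target-module names that is immediately clobbered by the
# unsorted original, making adapter initialization order-dependent.
modules = {"model.layers.1.mlp", "model.layers.0.mlp"}

target_modules = sorted(modules)   # deterministic order
target_modules = list(modules)     # bug: overwrites the sorted list

# Fix: keep only the sorted assignment so initialization is
# deterministic across runs.
target_modules = sorted(modules)
print(target_modules)  # ['model.layers.0.mlp', 'model.layers.1.mlp']
```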
What is the difference between this PR and #285? Which approach do you think is better?
@p-e-w Same idea, roughly; I didn't notice that #285 existed (it wasn't linked in the issue). That said, I'm not sure the change to try_add in #285 is necessary (it doesn't look necessary to me; #287 worked fine without it), and adjusting the "LoRA adapters initialized" output as in #287 is a good thing to have (otherwise you get a dump of a lot of layers), so I'd lean towards this variant.
But how can this PR work if it doesn't extract the …
But the selected modules are already Linear. After adding … Model made with …
Doesn't that directly contradict the findings in #278? Their output contained things like … Sorry for not digging deeper into this myself; I don't have time for this right now, and I'm really confused by the different approaches between this PR and #285, both of which claim to be the better solution.
@p-e-w The old (current master / ara) code was using leaf module names, which PEFT then matched across the entire model, also picking up Gemma4ClippableLinear wrappers from the vision sub-model. When we use full module paths, this no longer happens.
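The difference can be sketched with a toy module tree (the module names below are illustrative, not taken from the actual model; the matching rule mimics PEFT's suffix-style matching of `target_modules` against full module names):

```python
# Toy illustration: a leaf-name target matches every module ending in
# that name, including wrappers in the vision tower, whereas a full
# module path matches exactly one module.
all_modules = [
    "language_model.layers.0.self_attn.q_proj",  # plain Linear (assumed)
    "vision_tower.blocks.0.attn.q_proj",         # wrapper (assumed)
]

def matches(target: str, name: str) -> bool:
    # PEFT-style matching: exact name, or suffix preceded by a dot.
    return name == target or name.endswith("." + target)

leaf_targets = ["q_proj"]
full_targets = ["language_model.layers.0.self_attn.q_proj"]

leaf_hits = [n for n in all_modules for t in leaf_targets if matches(t, n)]
full_hits = [n for n in all_modules for t in full_targets if matches(t, n)]

print(leaf_hits)  # both modules, including the vision-tower wrapper
print(full_hits)  # only the language-model projection
```

With full paths, the vision and audio towers are never touched, which is the behavior this PR is after.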
But where is the code that actually registers the inner …? For traditional models, we have this (Lines 355 to 357 in 077e31f). This code is required for abliteration to happen, because only modules returned by … For Gemma 4, the relevant module is not under …
Unwrapping via …
Ah, indeed. We definitely don't want to touch the vision and audio towers. You have convinced me. Please fix CI so this can be merged. |
Done. |
Splendid, merged! Thanks for the PR; this is a solid improvement that I imagine will make Heretic more likely to support other future architectures out of the box.
@p-e-w One more thing worth doing for full Gemma 4 support out of the box: bumping transformers in the dependencies to 5.5.0 or newer (I tested on 5.5.1 and 5.5.3). Side note: models saved with transformers 5.5.2 or newer might require updating the model loader code (5.5.2 no longer saves shared weights). I'd start with 5.5.1 in the deps, and once more downstream apps update their dependencies (llama.cpp b8751+, etc.) it will be okay to move to newer releases.
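The suggested bump would look roughly like this (a sketch only; the actual pyproject layout of the Heretic repo is assumed, not verified):

```toml
[project]
dependencies = [
    # Gemma 4 support requires transformers >= 5.5.0; pin to 5.5.1
    # as the earliest version tested in this thread.
    "transformers>=5.5.1",
]
```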
Fix proposal for #278.
Tested locally on gemma-4-E2B-it and Llama-3.2-3B-Instruct.