Advice on style training #10787
-
Can you post an image generated with your LoRA? It's probably doable with any model, but it depends on your definition of 'isn't great'. For example, there's a LoRA for IKEA instructions trained on the base SDXL model which is fine as long as you don't consider the text.

About your decision to use Flux: I haven't tried to train a LoRA on it yet, but from what I've seen and tested, Flux loses a lot of quality when you fine-tune it. The LoRAs I tested work fine, but they degrade everything else a lot, and the model also isn't that great for lines (for example, anime) and is really overtrained on professional photos with bokeh. For example, take this LoRA for SDXL and this one for Flux, which are both for 3D wireframes: if you ask me, based on the demo images, the SDXL one is better.

Also, how big is your dataset? Depending on the size, you will probably need to fine-tune a model that's already really overtrained on lines, like the anime models. This probably needs a bigger dataset than a normal character LoRA.

I really can't help with the hyperparameters for Flux, but one good starting point would be to train the whole model and then extract LoRAs at different ranks to test which one works best for your use case (see the sketch below). Also, as a good default, use the same LR the base model was trained with.
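For the extraction step, a minimal sketch of what rank-r LoRA extraction from a full fine-tune can look like: take the weight delta between the fine-tuned and base checkpoints and SVD-truncate it. The function and key names here are illustrative, not a specific trainer's format, and only plain 2-D linear weights are handled:

```python
# Minimal sketch of rank-r LoRA extraction from a full fine-tune.
# Assumes matching base/fine-tuned state dicts are already in memory;
# the key naming below is illustrative, not a specific library's format.
import torch

def extract_lora(base_sd, tuned_sd, rank):
    """SVD-truncate each 2-D weight delta into LoRA A/B factors."""
    lora_sd = {}
    for name, w_base in base_sd.items():
        w_tuned = tuned_sd[name]
        if w_base.ndim != 2:  # skip convs, norms, biases in this sketch
            continue
        delta = (w_tuned - w_base).float()
        U, S, Vh = torch.linalg.svd(delta, full_matrices=False)
        U, S, Vh = U[:, :rank], S[:rank], Vh[:rank, :]
        # split the singular values across both factors so B @ A == delta_r
        lora_sd[f"{name}.lora_B"] = (U * S.sqrt()).contiguous()           # (out, r)
        lora_sd[f"{name}.lora_A"] = (S.sqrt()[:, None] * Vh).contiguous() # (r, in)
    return lora_sd

# extract at several ranks and A/B-test the results, as suggested above:
# for r in (4, 16, 64):
#     torch.save(extract_lora(base_sd, tuned_sd, r), f"lora_r{r}.pt")
```

Once the full fine-tune exists, extracting at a few ranks this way is cheap, so comparing them on your own prompts is the fastest way to pick one.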
-
hey @asomoza, I wanted to update you in case you're interested: I've come up with a way to remove one constraint in my training (the orthographic representation). So now I am trying to teach the model the style shown in the image in my first message above. I have a balanced mix of all views in my dataset (diverse objects, shown in front, side, or top view).

(I will then load these LoRA weights and use a ControlNet to get this style AND follow the structure of a depth map, but that's another story; see the sketch below.)

I couldn't make Flux learn it, so I'm now trying with SDXL, using this script with default hyperparameters for now. As you can see, the training is happening to some degree: validation inferences show a drawing style and dark, thick edges, but their style is far from the wireframe, minimalist drawing style of the dataset.
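For the LoRA-plus-depth-ControlNet combination mentioned above, the diffusers wiring would look roughly like this (a sketch; the LoRA path, depth map path, and prompt are placeholders, and the depth checkpoint is one common choice, not necessarily the one used here):

```python
import torch
from diffusers import ControlNetModel, StableDiffusionXLControlNetPipeline
from diffusers.utils import load_image

# depth ControlNet for SDXL; swap in whichever depth checkpoint you use
controlnet = ControlNetModel.from_pretrained(
    "diffusers/controlnet-depth-sdxl-1.0", torch_dtype=torch.float16
)
pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

pipe.load_lora_weights("path/to/wireframe_lora")  # the style LoRA trained above

depth_map = load_image("path/to/depth_map.png")  # placeholder conditioning image
image = pipe(
    "black and white wireframe drawing of a desk, top view",
    image=depth_map,
    controlnet_conditioning_scale=0.7,
    num_inference_steps=30,
).images[0]
image.save("styled_and_depth_guided.png")
```

Lowering `controlnet_conditioning_scale` gives the style LoRA more room; raising it makes the output stick closer to the depth map.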
-
hey @asomoza, a few months later, this issue is still current :) I see several possible reasons for the model's inability to generate accurate top views.

See the example images below of a washing machine: the front and side views are fine, but the top view is incorrect. It's like the model desperately wants to show it's a washing machine, when a mere rectangle would be fine instead.

[images: washing machine generated in front, side, and top views]

When the prompted object is well known from the top view (well known to us and to the training sets), like, say, a pool table, the model does fine. But when it's an object infrequently depicted from the top (e.g. a fridge), the model struggles. Sometimes it even generates a front view instead of a top view, as if it were completely giving up on respecting the prompt.

In light of the past months of research, do you see any way forward to tackle this issue? I was thinking of fine-tuning Flux Kontext, wdyt? Thank you for your answer!
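One way to make this failure measurable before committing to another fine-tune is a small sweep over objects and views with a fixed seed, so the only thing changing is the prompt. A sketch, assuming an SDXL pipeline with the style LoRA loaded (the checkpoint, LoRA path, and object list are placeholders):

```python
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")
pipe.load_lora_weights("path/to/wireframe_lora")  # placeholder path

# objects commonly vs. rarely depicted from above in typical training data
objects = ["pool table", "desk", "washing machine", "fridge"]
views = ["front view", "side view", "top view"]

for obj in objects:
    for view in views:
        prompt = f"black and white wireframe drawing of a {obj}, {view}, orthographic"
        # same seed for every image so only the prompt changes
        generator = torch.Generator("cuda").manual_seed(0)
        image = pipe(prompt, num_inference_steps=30, generator=generator).images[0]
        image.save(f"{obj.replace(' ', '_')}_{view.replace(' ', '_')}.png")
```

Counting how often the top-view prompt silently falls back to a front view, per object, would give a concrete baseline to compare any Flux Kontext fine-tune against.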
-
Hi all,
I'm looking for advice on how to best train a LoRA on the following task: the LoRA should learn to represent black and white wireframe drawings seen from the top (orthographic view, i.e. objects seen from above, with no perspective).
I attach an example image from my dataset, depicting a desk in the above-mentioned style.
So far I've mostly tried fine-tuning Flux dev, varying the learning rate, rank, etc. to see what I can get; the LoRA partially learns the style but definitely isn't great.
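For reference, the kind of setup being varied here looks roughly like the following peft configuration on the Flux transformer (a sketch; the rank, alpha, and target modules are illustrative defaults, not the exact config used):

```python
import torch
from diffusers import FluxPipeline
from peft import LoraConfig

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)

# LoRA on the attention projections of the Flux transformer;
# rank / alpha / target modules are the knobs being swept
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    init_lora_weights="gaussian",
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
)
pipe.transformer.add_adapter(lora_config)
# ...training loop over the wireframe dataset goes here...
```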
Do you have any recommendations for model/hyperparameters (if this task is feasible at all)?
Thanks vm!

Happy to provide more details if needed
(maybe @asomoza ?)