This project was completed as part of CSC413 at the University of Toronto.
The training-scripts directory contains the code to train the fine-tuned models, with a subdirectory for each fine-tuning method.
The fine-tuned-models directory contains the final outputs of the fine-tuning methods for textual inversion and LoRA. The DreamBooth outputs were omitted due to their size.
The target-images directory contains the original target images that the fine-tuned models were trained with.
The generated-images directory contains the images generated by each fine-tuned model. They are grouped by fine-tuning method and by prompt type, where "basic" is the default prompt "a photo of <*>" while the other groups, like "forest", are the more complex prompts like "a photo of <*> in a forest".
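The prompt groups described above follow a simple template pattern. The sketch below is hypothetical (the helper name `make_prompts` and the context list are not part of the repository); it only illustrates how the "basic" and contextual prompt types relate, assuming the placeholder token is written `<*>`:

```python
# Hypothetical sketch of how the prompt groups could be built.
# "<*>" is the placeholder token for the learned concept, as in the repo.
PLACEHOLDER = "<*>"

def make_prompts(contexts):
    """Map each prompt-type name to its full prompt string."""
    prompts = {"basic": f"a photo of {PLACEHOLDER}"}
    for context in contexts:  # e.g. "forest" -> "a photo of <*> in a forest"
        prompts[context] = f"a photo of {PLACEHOLDER} in a {context}"
    return prompts

print(make_prompts(["forest"]))
```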
The target-complex-images directory contains images generated by each fine-tuned model for the more complex prompts with the placeholder token removed (e.g., "a photo of a in a forest"). These images were used to evaluate how well the fine-tuned models incorporated prompt features other than the learned concept itself.
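The placeholder-free control prompts can be derived from the same templates by swapping the token for a bare article. This is a minimal sketch, assuming the placeholder is written `<*>` (the function name is hypothetical, not from the repository):

```python
PLACEHOLDER = "<*>"

def strip_placeholder(prompt):
    """Replace the learned-concept token with a bare article,
    turning "a photo of <*> in a forest" into "a photo of a in a forest"."""
    return prompt.replace(PLACEHOLDER, "a")

print(strip_placeholder("a photo of <*> in a forest"))
```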
The evaluation-scripts directory contains several scripts for running our evaluation process: run_inference generates the output images, clip_distance performs the CLIP similarity comparisons, and fid computes the Fréchet Inception Distance (FID).
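The core of a CLIP similarity comparison is a cosine similarity between embedding vectors. The snippet below sketches only that metric in plain Python; the actual clip_distance script presumably obtains the embeddings from a CLIP model, which is omitted here:

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Vectors pointing the same way score 1.0; orthogonal vectors score 0.0.
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```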
Finally, evaluation-results contains the actual results from our evaluations.
All of these scripts can be run once the dependencies have been installed with pip install -r requirements.txt. Python 3.10 is required due to the use of PyTorch 2.0.