- Technically, a heavily modified fork of:
- http://github.com/stanislavfort/Direct_Ascent_Synthesis
- With emphasis on *heavily* modified. So if you run into a bug, please open an Issue in this repo.
- Adds the ability to skip layers in the Text & Vision Encoders when generating images
- Layers are counted from the back of the transformer: 1 = last, 2 = penultimate, etc.
- Examples:
Use CLIP-L's penultimate (second-to-last) text encoder layer instead of the final one (like in SDXL!):
python clip-generate.py --deterministic --make_anti --manu_vit --manu_txt --model_name "OpenAI-ViT-L/14" --set_vit 1 --set_txt 2
- Enable layer skipping:
--manu_vit and --manu_txt
- Choose which layer to use (does nothing without the enable flags above):
--set_vit and --set_txt
- To also skip the final layer normalization before the projection:
--skip_ln_vit and --skip_ln_txt
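What the skip flags do, in a nutshell: take the hidden state from an earlier transformer block, optionally apply (or skip) the final LayerNorm, then project into CLIP's joint embedding space. Below is a minimal sketch of the text side, assuming HuggingFace transformers rather than this repo's own model loading (so attribute names differ from the actual implementation):

```python
# Sketch only: illustrates what --set_txt / --skip_ln_txt mean, not this repo's code path.
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-large-patch14").eval()
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")

def text_features_from_layer(prompt, n_from_back=1, skip_final_ln=False):
    """n_from_back=1 -> last block, 2 -> penultimate (like --set_txt 2)."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model.text_model(**inputs, output_hidden_states=True)
        # hidden_states[0] is the embedding output, [-1] the last block's output
        hidden = out.hidden_states[-n_from_back]
        if not skip_final_ln:                      # --skip_ln_txt would skip this step
            hidden = model.text_model.final_layer_norm(hidden)
        # pool at the EOS token, then project into the joint CLIP space
        eos_pos = inputs.input_ids.argmax(dim=-1)  # EOS has the highest token id
        pooled = hidden[torch.arange(hidden.shape[0]), eos_pos]
        return model.text_projection(pooled)

emb = text_features_from_layer("a photo of a cat", n_from_back=2)  # penultimate, SDXL-style
```

The vision tower (--set_vit / --skip_ln_vit) works analogously, pooling the CLS token and using the post-LayerNorm and visual projection instead.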
- To reduce batch_size (lower VRAM use) and augs_cp (quality vs. speed trade-off), e.g.:
--batch_size 16 and --augs_cp 32
- Works with all models. Default is OpenAI-ViT-B/32:
python clip-generate.py --deterministic --make_anti --manu_vit --manu_txt --set_vit 1 --set_txt 2
🤖 Also recommended: vision layer 20 (of 0-23), text layer 11 (of 0-11):
python clip-generate.py --deterministic --batch_size 16 --augs_cp 32 --make_anti --manu_vit --manu_txt --model_name "OpenAI-ViT-L/14" --set_vit 4 --set_txt 1
The original author's code offers:
- Text to image generation
- "Style" transfer
- Image reconstruction from its CLIP embedding
This repo adds:
- Gradient Ascent on the Text Embeddings (use CLIP's own opinion about an image as the text prompt)
- Minimize cosine similarity (get an "anti-cat" opinion / antonym for a cat image OR a cat text prompt)
- Use the two features above to generate images via Direct Ascent Synthesis (see the sketch after this list)
- For a given input image, visualize the Neuron (MLP Feature) with the highest activation value.
- Add one (or all) layer's features / "neurons" to the image stack for processing
- In essence, this can become a self-sustained loop of making EVERYTHING out of CLIP. No human input.
- ...And many more options & features!
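To make the "opinion" / "anti" ideas concrete, here is a minimal, hypothetical sketch of the optimization direction behind --make_anti: optimize pixels to maximize cosine similarity with a target CLIP embedding, or flip the sign to minimize it. It uses HuggingFace transformers and plain pixel-space optimization (the repo itself does Direct Ascent Synthesis with multi-scale images and augmentations); the text-side "CLIP opinion" works analogously, ascending on text embeddings instead of pixels. All names below are made up for illustration:

```python
# Hypothetical sketch of the --make_anti sign flip; not this repo's actual loop.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
tok = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

# Target embedding: here a text prompt, but it could equally be an image embedding.
with torch.no_grad():
    ids = tok(["a photo of a cat"], return_tensors="pt", padding=True).to(device)
    target = F.normalize(model.get_text_features(**ids), dim=-1)

make_anti = True                       # True = "anti-cat": minimize similarity
img = torch.rand(1, 3, 224, 224, device=device, requires_grad=True)
opt = torch.optim.Adam([img], lr=0.05)
mean = torch.tensor([0.4815, 0.4578, 0.4082], device=device).view(1, 3, 1, 1)
std = torch.tensor([0.2686, 0.2613, 0.2758], device=device).view(1, 3, 1, 1)

for step in range(300):
    x = (img.clamp(0, 1) - mean) / std                      # CLIP preprocessing
    feat = F.normalize(model.get_image_features(pixel_values=x), dim=-1)
    sim = (feat * target).sum()                              # cosine similarity
    loss = sim if make_anti else -sim                        # flip sign for "anti"
    opt.zero_grad()
    loss.backward()
    opt.step()
```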
Quick-start fun, using human text prompts:
python clip-generate.py --use_neuron --make_anti
Uses the same default (cat) image as --img0, but gets a CLIP opinion about it (no human text input):
python clip-generate.py --use_neuron --make_anti --use_image images/cat.png
Adds ALL features ('neurons') as images, changes primary image, changes text prompt 1:
python clip-generate.py --all_neurons --img0 images/eatcat.jpg --txt1 "backyardspaghetti lifestyledissertation"
Loads a second CLIP model (open_clip), makes plots, gets a CLIP opinion, adds a second image --img1:
python clip-generate.py --custom_model2 'ViT-B-32' 'laion2b_s34b_b79k' --make_plots --make_lossplots --img1 dogface.png
Loads a fine-tuned OpenAI/CLIP model as the primary model and sets deterministic backends. OpenAI model names must start with "OpenAI-".
python clip-generate.py --model_name "OpenAI-ViT-L/14" "mymodels/finetune.pt" --batch_size 16 --deterministic
- There's a lot. But I left you lots of comments, too!
python clip-generate.py --help
for a quick review.
Skip Text Encoder layers until only the first layer feeds the projection - a 1-layer #CLIP text encoder!
- ViT-B/32: fails
- ViT-L/14: Relentlessly just makes something else.🦾🤖
- "Banana Cat" becomes incomprehensible; it makes: M + 🍟🤡 and 🕑💥🚶🎑🏡⚽️🧦🌟🔢
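In terms of the earlier layer-skipping sketch (same assumptions: HuggingFace transformers, not this repo's loader), the 1-layer text encoder corresponds to pooling the output of the very first block:

```python
# ViT-L/14 has 12 text blocks; counting from the back, 12 = the first block.
# Set skip_final_ln=True to additionally drop the final LayerNorm before the projection.
one_layer_emb = text_features_from_layer("Banana Cat", n_from_back=12)
```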