zer0int/CLIP-Direct-Ascent-Synthesis


  • Technically, a heavily modified fork of the original author's Direct Ascent Synthesis code.

Like CLIP + VQGAN. Except without a VQGAN.

(Image: banner2)


⭐ Update 23-FEB-2025

  • Added the ability to skip layers in the text & vision encoders when generating images
  • Layers are counted from the back of the transformer: 1 = last, 2 = penultimate, and so on (a code sketch follows the example commands below)
  • Examples:

Use CLIP-L's penultimate (second-to-last) text encoder layer instead of the final one (like in SDXL!):

python clip-generate.py --deterministic --make_anti --manu_vit --manu_txt --model_name "OpenAI-ViT-L/14" --set_vit 1 --set_txt 2
  • Enable layer skipping with --manu_vit & --manu_txt; pick which layer to skip with --set_vit & --set_txt (these do nothing unless the corresponding enable flag is set)
  • To also skip the final layer normalization before the projection: --skip_ln_vit and --skip_ln_txt
  • To reduce batch_size (for VRAM) and augs_cp (quality vs. speed), e.g.: --batch_size 16 & --augs_cp 32
  • Works with all models; with the default, OpenAI-ViT-B/32:

python clip-generate.py --deterministic --make_anti --manu_vit --manu_txt --set_vit 1 --set_txt 2

🤖 Also recommended: vision layer 20 (of 0-23) and text layer 11 (of 0-11):

python clip-generate.py --deterministic --batch_size 16 --augs_cp 32 --make_anti --manu_vit --manu_txt --model_name "OpenAI-ViT-L/14" --set_vit 4 --set_txt 1
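
For intuition, here is a minimal sketch of what layer skipping amounts to internally on the text side, assuming OpenAI's clip package. The helper name encode_text_skip and its arguments are illustrative, not the repo's actual code; skip_ln stands in for --skip_ln_txt, and the vision side (--set_vit, --skip_ln_vit) follows the same pattern on model.visual.

```python
import torch
import clip

device = "cuda" if torch.cuda.is_available() else "cpu"
model, _ = clip.load("ViT-L/14", device=device)
model = model.float().eval()

def encode_text_skip(model, tokens, skip_from_back=1, skip_ln=False):
    """Encode text, stopping `skip_from_back` layers from the end
    (1 = final layer as usual, 2 = penultimate, ...)."""
    x = model.token_embedding(tokens) + model.positional_embedding
    x = x.permute(1, 0, 2)                                   # NLD -> LND
    n_keep = len(model.transformer.resblocks) - (skip_from_back - 1)
    for block in model.transformer.resblocks[:n_keep]:
        x = block(x)
    x = x.permute(1, 0, 2)                                   # LND -> NLD
    if not skip_ln:                                          # --skip_ln_txt would omit this
        x = model.ln_final(x)
    # features at the EOT token, then the text projection
    return x[torch.arange(x.shape[0]), tokens.argmax(dim=-1)] @ model.text_projection

tokens = clip.tokenize(["a photo of a cat"]).to(device)
with torch.no_grad():
    penultimate = encode_text_skip(model, tokens, skip_from_back=2)
```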

(Image: cats-compare)

⭐ First commit 21-FEB-2025

The original author's code offers:

  1. Text to image generation
  2. "Style" transfer
  3. Image reconstruction from its CLIP embedding

This repo adds:

  1. Gradient ascent on the text embeddings (use CLIP's own opinion about an image as the text prompt; sketched below)
  2. Minimize cosine similarity (get an "anti-cat" opinion / antonym for a cat image OR a cat text prompt)
  3. Use 4. and 5. to generate images via Direct Ascent Synthesis
  4. For a given input image, visualize the neuron (MLP feature) with the highest activation value
  5. Add one (or all) of a layer's features / "neurons" to the image stack for processing
  6. In essence, this can be a self-sustained loop of making EVERYTHING out of CLIP, with no human input
  7. ...And many more options & features!
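
Items 1 and 2 boil down to optimizing something in CLIP's text embedding space against an image embedding. Below is a minimal sketch of that idea using a continuous "soft prompt", assuming OpenAI's clip package; this is illustrative only, while the repo's actual gradient ascent operates on text embeddings whose result it then uses as a text prompt.

```python
import torch
import torch.nn.functional as F
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float().eval()

image = preprocess(Image.open("images/cat.png")).unsqueeze(0).to(device)
with torch.no_grad():
    img_feat = F.normalize(model.encode_image(image), dim=-1)

# A learnable "soft prompt" living directly in token-embedding space.
ctx, dim = model.context_length, model.token_embedding.embedding_dim
soft = (torch.randn(1, ctx, dim, device=device) * 0.01).requires_grad_(True)
opt = torch.optim.Adam([soft], lr=0.05)

def encode_soft(soft):
    x = soft + model.positional_embedding
    x = model.transformer(x.permute(1, 0, 2)).permute(1, 0, 2)  # NLD <-> LND
    x = model.ln_final(x)
    return x[:, -1] @ model.text_projection   # last position stands in for the EOT token

for step in range(300):
    txt_feat = F.normalize(encode_soft(soft), dim=-1)
    sim = (img_feat * txt_feat).sum()
    loss = -sim        # gradient ascent: maximize similarity to the image
    # loss = sim       # the --make_anti direction: minimize similarity instead
    opt.zero_grad()
    loss.backward()
    opt.step()
```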

(Image: banner1)

Quick-start fun, using human text prompts:

python clip-generate.py --use_neuron --make_anti

Uses the same default (cat) image as --img0, but gets a CLIP opinion about it (no human text input):

python clip-generate.py --use_neuron --make_anti --use_image images/cat.png

Adds ALL features ('neurons') as images, changes the primary image, and changes text prompt 1:

python clip-generate.py --all_neurons --img0 images/eatcat.jpg --txt1 "backyardspaghetti lifestyledissertation"
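
The 'neuron' options (--use_neuron, --all_neurons) refer to MLP features inside CLIP's transformer blocks. A minimal sketch of reading those activations for an image and picking the most active feature via a forward hook (assuming OpenAI's clip package; the layer choice here is arbitrary and the code is illustrative, not the repo's):

```python
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
model = model.float().eval()

acts = {}
layer = model.visual.transformer.resblocks[-2].mlp.c_fc      # an arbitrary MLP layer
hook = layer.register_forward_hook(lambda m, inp, out: acts.update(mlp=out.detach()))

image = preprocess(Image.open("images/cat.png")).unsqueeze(0).to(device)
with torch.no_grad():
    model.encode_image(image)
hook.remove()

# activations are (tokens, batch, features); pick the feature with the highest peak value
top_neuron = acts["mlp"].amax(dim=(0, 1)).argmax().item()
print("most active MLP feature in that layer:", top_neuron)
```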

Loads a second CLIP model (open_clip), makes plots, gets a CLIP opinion, and adds a second image via --img1:

python clip-generate.py --custom_model2 'ViT-B-32' 'laion2b_s34b_b79k' --make_plots --make_lossplots --img1 dogface.png
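
The two --custom_model2 arguments are an open_clip model name and pretrained tag; loading that checkpoint yourself looks roughly like this (a sketch of the open_clip API, not the repo's loading code):

```python
import torch
import open_clip

model2, _, preprocess2 = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k")
tokenizer2 = open_clip.get_tokenizer("ViT-B-32")
model2.eval()

with torch.no_grad():
    text_feat = model2.encode_text(tokenizer2(["a photo of a dog"]))
```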

Loads a fine-tuned OpenAI/CLIP model as the primary model and sets deterministic backends. OpenAI model names must start with "OpenAI-".

python clip-generate.py --model_name "OpenAI-ViT-L/14" "mymodels/finetune.pt" --batch_size 16 --deterministic
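
What --deterministic typically amounts to in PyTorch is sketched below; the exact seed and backend settings in clip-generate.py may differ, so treat this as an assumption:

```python
import random
import numpy as np
import torch

seed = 0                                     # illustrative seed
random.seed(seed)
np.random.seed(seed)
torch.manual_seed(seed)
torch.cuda.manual_seed_all(seed)
torch.backends.cudnn.deterministic = True    # deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False       # disable autotuning, which is nondeterministic
```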

Please see the code in clip-generate.py for more details.

  • There's a lot. But I left you lots of comments, too!
  • Run python clip-generate.py --help for a quick overview.

(Image: example-of-all)

Skip text encoder layers until just the first layer is plugged into the projection: a 1-layer #CLIP text encoder! (A short sketch follows the results below.)

  • ViT-B/32: fails
  • ViT-L/14: Relentlessly just makes something else.🦾🤖
  • "Banana Cat" becomes incomprehensible; instead it makes: M + 🍟🤡 and 🕑💥🚶🎑🏡⚽️🧦🌟🔢
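
In terms of the hypothetical encode_text_skip helper sketched earlier, this extreme case keeps only the first transformer block before the projection:

```python
n_layers = len(model.transformer.resblocks)   # 12 for the ViT-L/14 text encoder
one_layer = encode_text_skip(model, tokens, skip_from_back=n_layers)
```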

(Image: relentless-large)

A striking difference in complexity between a 12-layer ViT and a 24-layer ViT:

(Image: comparisons-final)