This post was submitted on 18 Jun 2023

Stable Diffusion


Welcome to the Stable Diffusion community, dedicated to the exploration and discussion of the open source deep learning model known as Stable Diffusion.

Introduced in 2022, Stable Diffusion uses a latent diffusion model to generate detailed images based on text descriptions and can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by text prompts. The model was developed by the startup Stability AI, in collaboration with a number of academic researchers and non-profit organizations, marking a significant shift from previous proprietary models that were accessible only via cloud services.


I understand that, when we generate images, the prompt is first split into tokens, and those tokens are then used by the model to nudge the image generation in a certain direction. My impression is that some tokens end up having a higher impact on the model than others (although I don't know if 'weight' is the right word for it). I mean internally, not the explicit weighting we can put on a token as part of the prompt.
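For context, the tokenization step itself is easy to inspect. A quick sketch, assuming the transformers library and the CLIP tokenizer that Stable Diffusion v1.x uses (the prompt is just an example):

```python
# Inspect how a prompt is split into tokens before conditioning.
from transformers import CLIPTokenizer

# Standard SD v1.x text encoder's tokenizer.
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
print(tokenizer.tokenize("a dog with orange hair"))
# ['a</w>', 'dog</w>', 'with</w>', 'orange</w>', 'hair</w>']
```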

Is it possible to know how much a certain token was 'used' in the generation? I could deduce that empirically by taking a generation, keeping the same prompt, seed, sampling method, etc., and removing words one by one to see what the impact is, but perhaps there is a way to just ask the model? Or to adjust the Python code a bit and retrieve it there?

I'd like to know which parts of my prompt hardly impact the image (or don't affect it at all). The empirical route I have in mind is sketched below.
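Roughly this, assuming the diffusers library (the model ID is just an example, and the mean pixel difference is only a crude, made-up impact score):

```python
# Ablation sketch: regenerate with a fixed seed while dropping one word
# at a time, then measure how much the image changes against the baseline.
import numpy as np
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a dog with orange hair, detailed, studio lighting"
seed = 42

def generate(p):
    # Re-seeding per call keeps the initial noise identical across prompts.
    generator = torch.Generator("cuda").manual_seed(seed)
    return pipe(p, generator=generator, num_inference_steps=30).images[0]

baseline = np.asarray(generate(prompt), dtype=np.float32)

words = prompt.split()
for i, word in enumerate(words):
    ablated = " ".join(words[:i] + words[i + 1:])
    image = np.asarray(generate(ablated), dtype=np.float32)
    diff = np.abs(baseline - image).mean()  # crude impact score
    print(f"removing {word!r}: mean pixel diff = {diff:.2f}")
```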

randon31415

At the bottom of the txt2img tab in the Stable Diffusion web UI there is a script called X/Y/Z plot. One of its axis types is Prompt S/R (search/replace). Let's say your prompt was 'dog with orange hair'. You put 'dog, cat, mouse, human' into the S/R text box. The script searches your prompt for the first term (or set of words: I like switching out artists' names) and generates a picture with that term swapped out for each of the others in the list. You can even add a second axis with 'orange, red, blue, green' and get a grid of pictures with an orange dog in one corner and a green human in the opposite corner.
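To make the search/replace mechanics concrete, here's a rough plain-Python sketch of how the prompt grid gets derived (just an illustration, not the web UI's actual code):

```python
# Prompt S/R: the first entry in each list is the search string; every
# entry (including the first) is substituted in its place, one axis per list.
base_prompt = "dog with orange hair"
subjects = ["dog", "cat", "mouse", "human"]   # X axis
colors = ["orange", "red", "blue", "green"]   # Y axis

for color in colors:
    for subject in subjects:
        prompt = base_prompt.replace("dog", subject).replace("orange", color)
        print(prompt)  # one generation per grid cell, same seed throughout
```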

[email protected]

@polystruct

I'm pretty sure there is an extension that does something like that...

This is what I was thinking of: https://github.com/AUTOMATIC1111/stable-diffusion-webui/wiki/Extensions#daam

Not sure if it's still supported though; the git repo has been set to read-only.
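Usage looks roughly like this, going by the DAAM repo's README (so the API may have drifted since the repo went read-only; the model ID and prompt are just examples):

```python
# Overlay a per-word cross-attention heat map on a generated image.
import torch
import matplotlib.pyplot as plt
from daam import trace
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = "a dog with orange hair"
with torch.no_grad(), trace(pipe) as tc:
    out = pipe(prompt, num_inference_steps=30,
               generator=torch.Generator("cuda").manual_seed(42))
    heat_map = tc.compute_global_heat_map()
    # How strongly 'dog' attended to each image region during generation.
    word_map = heat_map.compute_word_heat_map("dog")
    word_map.plot_overlay(out.images[0])
plt.show()
```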

polystruct

Thanks! I initially didn't think such extensions were what I was looking for (they show which parts of the image are heavily influenced by the prompt), but they do give a visual clue as to which parts are definitely important to keep in the prompt, while image areas that hardly get any attention might tell me which parts of the prompt are less impactful.

And it gives me a few repos whose code I can dig through. Older ones (archived/read-only) are fine; they are often easier to read than the more optimized (but also more feature-rich) code that exists nowadays.