this post was submitted on 12 Jun 2023

Stable Diffusion


Welcome to the Stable Diffusion community, dedicated to the exploration and discussion of the open source deep learning model known as Stable Diffusion.

Introduced in 2022, Stable Diffusion uses a latent diffusion model to generate detailed images based on text descriptions and can also be applied to other tasks such as inpainting, outpainting, and generating image-to-image translations guided by text prompts. The model was developed by the startup Stability AI, in collaboration with a number of academic researchers and non-profit organizations, marking a significant shift from previous proprietary models that were accessible only via cloud services.


I don't know if this community is intended for posts like this. If not, I'm sorry and I'll delete this post ASAP.

So, I play TTRPGs (mostly online) and I'm a big fan of visual aids, so I wanted to create some character images for my character in the new campaign I'm playing in. I don't need perfect consistency, as humans usually change a little over time. I only needed the character to be recognizable across a couple of images that are usually viewed on their own and not side by side, so nothing like the consistency you'd need for a comic book or something similar. So I decided to create a Textual Inversion (TI) following this tutorial, and it worked way better than expected. After fewer than 6 epochs I had consistency that was good enough for my use case, and it still hadn't started to overfit when I stopped the training around epoch 50.

Generated image of a character wearing a black hoodie standing in a rundown neighborhood at night
Generated image of the character wearing a black hoodie standing on a street
Generated image of the character cosplaying as Iron Man
Generated image of the character cosplaying as Amos from The Expanse

Then my SO, who's playing in the same campaign, asked me to do the same for their character. So we went through the motions and created and filtered the images. In a first training attempt the TI started to overfit halfway through the second epoch, so I lowered the learning rate by a factor of five and started another round. This time the TI started overfitting somewhere around epoch 8 without reaching consistency first. The generated images alternate between a couple of similar yet distinguishable faces. To my eye the training images seem to be of similar or higher quality than the images I used in the first set. Was I just lucky with my first TI and unlucky with the other two, so that I should simply keep trying? Or is there something I should change, like the learning rate, which at 0.0002 still seems high to me judging from other machine learning topics?
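To make the learning-rate question concrete, here is a minimal sketch of a stepped schedule in plain Python: start at the 0.0002 mentioned above and drop the rate at fixed step counts. The function name `stepped_lr`, the step boundaries, and the lower rates are all made up for illustration. They are not tuned values, and whether your trainer accepts a schedule like this is an assumption you'd need to check.

```python
from bisect import bisect_right


def stepped_lr(step, schedule):
    """Return the learning rate for a 0-based training step.

    `schedule` is a list of (end_step_exclusive, lr) pairs sorted by step.
    The last pair's rate is reused for every step past its boundary.
    """
    boundaries = [end for end, _ in schedule]
    # Index of the first segment whose boundary lies beyond `step`,
    # clamped so late steps keep using the final rate.
    index = min(bisect_right(boundaries, step), len(schedule) - 1)
    return schedule[index][1]


# Hypothetical schedule: only the starting rate 2e-4 comes from the post.
SCHEDULE = [(500, 2e-4), (2000, 5e-5), (5000, 1e-5)]

for step in (0, 499, 500, 1999, 2000, 10_000):
    print(step, stepped_lr(step, SCHEDULE))
```

The idea is that a higher rate early on gets the embedding near the target quickly, while lower rates later leave less room to overshoot into overfitting. It won't help if the training images themselves are the problem.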

top 3 comments
[–] SaucyGoodness 1 points 2 years ago* (last edited 2 years ago) (1 children)

I can't really help with training textual inversions as I've never done it (and I think LoRAs are better anyway), but the absolute easiest way to get consistent faces is to just use a mix of celebrities in the prompt. If you have (David Tennant | Keanu Reeves) in there, it'll give you a pretty consistent character without having to bother with training anything. It's all a little bit dependent on the model used and the style, but realistically, it's the fastest and easiest way to do it.

Edit: not what you asked for, of course, but since you didn't seem too fussy about it, I figured I'd suggest it anyway.

[–] deathxbyxtaxes 2 points 2 years ago

That's a great tip, thanks for posting it. Seems to me both methods are useful, just use case dependent.

[–] BrianTheeBiscuiteer 1 points 2 years ago

I think TI is really a hit-or-miss method and I don't believe everything in existence can be represented as a TI. I ran 14 different sessions trying to train on a face and at best it looked like a first cousin having an allergic reaction. I tried a hypernetwork and got much better accuracy on my first attempt, although it was very overfitted.

I've heard that Dreambooth is still the best for accuracy, so I'll be trying that next (you can make a DB model and then extract a LoRA from that).