
Experiments with tiny-sd

For over a year now, I've been on the hunt for lightweight generative AI architectures and models that I can run on my mid-range consumer hardware (GTX 2060 with 6 GB VRAM / GTX 3040 with 8 GB VRAM).

Recently I trained a LoRA for Stable Diffusion XL (SDXL) on a logo dataset and published it on Hugging Face 🤗.

But this LoRA requires running SDXL as the base model, which is quite big and barely fits onto either of my GPUs. The only option I have is CPU offloading.

But that comes with a significant speed penalty.
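For reference, CPU offloading with diffusers is a single extra call; here's a minimal sketch (the checkpoint name, prompt, and step count are placeholders rather than my exact setup):

```python
# Minimal sketch: SDXL on limited VRAM via CPU offloading in diffusers.
# enable_model_cpu_offload() keeps only the currently active submodule on the GPU
# and moves everything else back to CPU, trading speed for memory.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",  # placeholder: the SDXL base checkpoint
    torch_dtype=torch.float16,
    variant="fp16",
)
pipe.enable_model_cpu_offload()  # instead of pipe.to("cuda")

image = pipe("a minimalist logo of a fox", num_inference_steps=30).images[0]
image.save("logo.png")
```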

So I desperately wanted to find a smaller model!

Initial attempts

After discovering the distilled diffusion models by Segmind, I tried out my usual test prompts.
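Tiny-sd is small enough to run locally through diffusers as well; here's a rough sketch in case you want to reproduce the prompts below yourself (step count and guidance scale are just reasonable defaults, not tuned values):

```python
# Minimal sketch: running Segmind's tiny-sd locally with diffusers.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "segmind/tiny-sd",
    torch_dtype=torch.float16,
).to("cuda")

image = pipe("fluffy ball", num_inference_steps=25, guidance_scale=7.5).images[0]
image.save("fluffy_ball.png")
```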

Here are the first results, each generated with the default parameters via the inference UI:

Prompt: fluffy ball

0148aa78-c6d3-49ed-be15-e7adb6301954

Prompt: a woman in a field

bc0a3ad0-88dd-44f4-9f61-e6c7da6a9f81

Prompt: Super Mario sitting in private jet lounge and smoking a big joint with marijuana plants growing from his head, ultra realistic, HD, best quality, ~*~aesthetic~*~

d7b6cb50-737f-4924-a303-5b20495933d0

Well, frankly, apart from the fluffy ball the results are rather mediocre. I'm wondering if we can achieve something better; maybe I should just try something easier.

For instance, cartoons and cartoon-like pictures or paintings.

I recently found the artist Jon Juarez through a post on Mastodon, and I really like their work!

So I wanted to see whether I could generate something similar to their work :)

Prompt: painting with line shading of a cave

25f3b1a6-a68c-424f-9af5-101252f5edeb

That's nice! I wonder how far I can go with this.

Because of that, I felt compelled to train a LoRA to maybe improve the style of the generated paintings.

LoRA training

After collecting 29 samples of the artist's work, I adapted the DreamBooth script that I had used for my logo LoRA to train on tiny-sd.

I set a learning rate of 1e-4 and trained for 1.5k iterations.

After training, the script automatically uploads the model to the Hub.

Note: you need to include the "magic phrase" ... by JON_JUAREZ ... in your prompt to trigger the LoRA.
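Loading the LoRA for inference looks roughly like this (a sketch; the LoRA repo id is a placeholder, use the one from the Hub upload):

```python
# Minimal sketch: tiny-sd plus the trained style LoRA.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "segmind/tiny-sd",
    torch_dtype=torch.float16,
).to("cuda")
pipe.load_lora_weights("<your-username>/tiny-sd-jon-juarez-lora")  # placeholder repo id

# The magic phrase "by JON_JUAREZ" triggers the style.
prompt = "Painting with line shading by JON_JUAREZ of a dark cave"
image = pipe(prompt, num_inference_steps=25).images[0]
image.save("cave.png")
```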

Here are some results:

Prompt: Pastel color painting with line shading by JON_JUAREZ of a dark cave

image (3)

Prompt: Painting with line shading by JON_JUAREZ of a dark cave

image (1)

Prompt: Wizard

image(15)

Prompt: Wizard by JON_JUAREZ

image(14)

Prompt: Landscape

image(16)

Prompt: Landscape by JON_JUAREZ

image(17)

One of the greatest findings of all of this is...

8lul76

Even your toaster has more VRAM! And we're talking training here, not inference.

For inference it needs less than 1 GB.
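If you want to check that on your own hardware, peak usage is easy to measure (a sketch; the exact number depends on resolution, precision, and the attention implementation):

```python
# Rough peak-VRAM measurement for a single tiny-sd generation.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "segmind/tiny-sd",
    torch_dtype=torch.float16,
).to("cuda")

torch.cuda.reset_peak_memory_stats()
pipe("Wizard by JON_JUAREZ", num_inference_steps=25)
print(f"peak VRAM: {torch.cuda.max_memory_allocated() / 2**30:.2f} GiB")
```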

I wrapped the LoRA into a 🤗 Hugging Face Space; you can find it below.
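The Space is essentially a thin Gradio wrapper around the pipeline; here's a hypothetical sketch of such an app.py (not the actual Space code, and the LoRA repo id is again a placeholder):

```python
# Hypothetical app.py for a Gradio-based Hugging Face Space.
import torch
import gradio as gr
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("segmind/tiny-sd", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("<your-username>/tiny-sd-jon-juarez-lora")  # placeholder repo id

def generate(prompt: str):
    # Don't forget the magic phrase "by JON_JUAREZ" in the prompt.
    return pipe(prompt, num_inference_steps=25).images[0]

demo = gr.Interface(fn=generate, inputs="text", outputs="image")
demo.launch()
```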

Outlook

Apart from training a LoRA on a set of style examples, this Reddit post suggests using them as class examples, potentially to represent the style.

And then we can continue with LoRA merging; a quick sketch follows below. Infinite variations and possibilities lie ahead of us.
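Diffusers already covers the basic merging case through its adapter API; here's a sketch assuming a recent diffusers version with the PEFT backend installed (both repo ids are placeholders):

```python
# Sketch: blending two LoRAs on top of one base model.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("segmind/tiny-sd", torch_dtype=torch.float16).to("cuda")
pipe.load_lora_weights("<user>/style-lora", adapter_name="style")      # placeholder
pipe.load_lora_weights("<user>/subject-lora", adapter_name="subject")  # placeholder

# The adapter weights control how strongly each LoRA contributes.
pipe.set_adapters(["style", "subject"], adapter_weights=[0.8, 0.6])

image = pipe("Wizard by JON_JUAREZ", num_inference_steps=25).images[0]
image.save("merged.png")
```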

Demo

Keep in mind to include by JON_JUAREZ in your prompt.