AI sketches with VQGAN and CLIP for concept art

Tutorial / 05 March 2022

Playing with Disco Diffusion

VQGAN+CLIP is a neural network setup that builds on the CLIP model published by OpenAI in January 2021.
Unlike Artbreeder, which lets you play with a whole bunch of different kinds of input, VQGAN+CLIP is a text-to-image model that generates images of variable size from a set of text prompts and some other parameters.

Like many, I've been playing with the 'Disco Diffusion v4.1' model that's been doing the rounds lately.

It's a lot of fun and frustrating at the same time. I've been mildly obsessed with it for the last couple of weeks, and I thought it might be useful for other concept artists if I logged my findings about its uses.

First off, it's kind of insane that this exists at all. It's like a weird machine that taps into the collective human subconscious and brings back distorted dream imagery. At first glance, it looks like it can generate epic concept art effortlessly.
At second glance, however, once you've downloaded the image, enlarged it, and started painting on top, there's this super disappointing realisation that, even though it looks like an epic image from a distance, there is actually nothing there.

To add to that, there's this weird effect, for me anyway, where, once you start defining the bits of the image that are indistinct, the image becomes less interesting. Turns out the ambiguity is actually the strength of the image, rather than its weakness. Trying to 'finish' the images, I felt disappointed with every brushstroke, as I could see the image becoming more defined but less fascinating.

When I first discovered it, I tried using it on a client job: an environment concept with a lot of architectural elements in it, thinking, 'this will totally help me finish this painting quickly'... WRONG. You have to paint over everything, and not only that, you have to define every pixel all over again, because even though it may look like a mountain landscape with a city in it, there are no actual correct houses, buttresses, vegetation, and whatnot; there's only the suggestion. I spent a week painting an image that should have been done in one or two days. I would have been faster in 3D, or even just plain 2D. (I can't show the image because of the NDA.)

So that's the weakness. It's not capable of generating a finished work (yet) if you're going for something specific. It's no good at specific.
On top of that, it's no good at people, animals, cars, perspective, or anything that needs really specific features. I'm sure this will change quite soon, as these models are evolving quickly.

It's hella good at suggestion, though. After the painting debacle, I figured out that it's much better used as an idea generator.
Give it the prompt "fantasy city on a sunny day, game of thrones, massive castle" and it'll return something that at least sparks the imagination and can be used as a sketch to paint on top of.

Final sketch: 


Generated image: 

Similarly, "massive cathedrals with 8 legs walking through the fire" generates a bunch of crappy images, and a bunch of rather interesting ones.

For this one, I also used an 'init image', which makes it take your initial image and generate stuff on top of it.

Final sketch:

Generated Image:

It's also fun to throw in an artist's name, like 'in the style of' Rembrandt, Richard Schmid, or Beksinski (neural networks seem to have a real penchant for Beksinski...). It seems like it will generate an infinite number of variations; I'm not sure, but I've yet to see it repeat itself.

I've found it's a nice tool for sparking ideas, and additionally, it got me out of the technical mindset I was in as a concept artist working mostly in 3D. After figuring out geometry nodes in Blender and proper transparency in Z-passes and such, it's refreshing to go back to basics and just paint without thinking too much about the end result. Generating weird shit with the Disco Diffusion model and painting on top of it is quite liberating. You don't have to be precious with it, because the images are 'free'.

Some tips:

  • Tip 1: you can save 'partial' images, which aren't finished yet but are sometimes better than the final result, because they don't have as much detail to distract you from the idea the image sparks.
  • Tip 2: generate a whole bunch of images from the same prompt, and just photobash 'em! 
  • Tip 3: draw/paint/3D something and use it as an 'init image' to generate iterations.
  • Tip 4: Prompts like 'trending on artstation' and 'rendered in octane' seem to affect the rendering style quite a lot.
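For the more technically minded: the tips above map onto a handful of settings in the Disco Diffusion Colab notebook. The sketch below is illustrative only; the exact parameter names vary between notebook versions, and the filename 'my_sketch.png' is made up, so treat it as a rough picture of the workflow rather than an exact recipe.

```python
# Rough sketch of a Disco Diffusion-style settings cell.
# Names are illustrative; check the notebook version you're running.
settings = {
    # Tip 4: style keywords go straight into the prompt text.
    "text_prompts": [
        "fantasy city on a sunny day, game of thrones, "
        "massive castle, trending on artstation"
    ],
    "width_height": [1280, 768],  # output size in pixels
    "steps": 250,                 # more steps = more (often distracting) detail

    # Tip 3: start from your own drawing/painting/3D render.
    "init_image": "my_sketch.png",  # hypothetical filename
    "skip_steps": 100,              # higher = stays closer to the init image

    # Tip 1: save intermediates; they're often better than the final image.
    "intermediate_saves": [50, 100, 150],
}

# Tip 2: run the same prompt several times, then photobash the results.
for batch in range(4):
    print(f"batch {batch}: would generate from prompt:",
          settings["text_prompts"][0])
```

The point is just that every tip in the list corresponds to one or two knobs: the prompt text, the init image, how many steps to keep from it, and which intermediate steps to save.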

I hope that this is useful for some folks,
Cheers!
~Stijn
