DALL-E can also generate extraordinary images from very minimal prompts:




How to use minimalist prompts to generate things that do not exist in reality

Sometimes this approach produces things that are very difficult to get with other prompts. One example is a coffee cup in the shape of a potato: with most prompt variations, what we get instead is a potato in a coffee cup.


One strategy is to stop thinking in coherent concepts. Instead of asking for a coffee cup made of a potato, we look for an image in which different abstract properties are mixed.


And already we have two images that are exactly what we wanted.

Why does this work?

I think of it as navigating latent space with text. The latent space represents the connection between images and text.

If we enter "Woman", we get all the information associated with images related to the word woman. For "Woman, by David Lazar" we get the intersection, a mix of both. "By David Lazar" on its own already carries all the information for a perfect photograph, so we don't need to specify the camera lens, lighting, etc. On the contrary, too many pieces of information interfere with each other and create noise.

<aside> 💡 So the goal is to find as few vectors as possible in latent space, each of which encodes an extremely large amount of information.
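
To make "few vectors, each carrying a lot of information" a little more concrete, here is a toy sketch in Python. It is an illustration under loudly stated assumptions, not a description of how DALL-E works internally: we assume each prompt part maps to a vector made of some amount of signal along the desired direction plus random noise of roughly equal size, and that the prompt behaves like the average of its parts. The dimension, strengths, and helper names (`part`, `mean`, `cosine`) are hypothetical choices for the illustration.

```python
import math
import random

# Toy model (assumption, not DALL-E's real mechanism): each prompt part is some
# amount of signal along the desired direction plus noise of about the same size,
# and the prompt acts like the average of its parts.
DIM = 512  # hypothetical embedding dimension


def random_unit(rng):
    """Random unit vector standing in for the desired image."""
    v = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def part(target, strength, rng):
    """Hypothetical prompt part: strength * target plus independent noise of norm ~1."""
    sigma = 1.0 / math.sqrt(DIM)
    return [strength * t + rng.gauss(0.0, sigma) for t in target]


def mean(vectors):
    """Component-wise average of the part vectors."""
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity: how closely a points in the direction of b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(0)
target = random_unit(rng)

# One dense, information-rich part (think "by David Lazar").
strong_only = mean([part(target, 1.0, rng)])

# The same kind of strong part plus three weak descriptors (think lens, lighting, ...).
strong_plus_weak = mean([part(target, 1.0, rng)] + [part(target, 0.1, rng) for _ in range(3)])

print(f"strong part only:    {cosine(strong_only, target):.3f}")
print(f"strong + weak parts: {cosine(strong_plus_weak, target):.3f}")
```

In this toy model the single strong part ends up noticeably closer to the target than the strong part diluted with weak descriptors: each weak part adds about as much noise as the strong one but very little signal.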


In this sense, we can create images that contain certain elements or have certain properties by entering those elements or properties in as pure and unconnected a form as possible:


Finding things by trial and error is very credit-intensive, which is why an approximate theory of good prompts would be useful. My working idea is something like this:

<aside> 💡 Write a prompt in which the intersection of the average results of its parts is close to your desired image. At the same time, try to keep the parts as uncorrelated as possible so that their noise cancels out.
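
The working idea can be illustrated with a similar toy model. Again this is an assumption for illustration, not DALL-E's actual mechanism: each prompt part is the desired direction plus noise, and the prompt is the average of its parts, standing in for the "intersection". If the parts' noise is independent, averaging shrinks it; if the parts share the same noise, averaging removes none of it. `K`, `DIM`, and the helper names are hypothetical.

```python
import math
import random

# Toy model (assumption, not DALL-E's real mechanism): every prompt part is the
# desired direction plus noise, and the prompt acts like the average of its parts.
DIM = 512  # hypothetical embedding dimension
K = 4      # hypothetical number of prompt parts


def random_unit(rng):
    """Random unit vector standing in for the desired image."""
    v = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def noise_vector(rng):
    """Gaussian noise whose norm is roughly 1 (std 1/sqrt(DIM) per component)."""
    sigma = 1.0 / math.sqrt(DIM)
    return [rng.gauss(0.0, sigma) for _ in range(DIM)]


def add(a, b):
    """Component-wise sum of two vectors."""
    return [x + y for x, y in zip(a, b)]


def mean(vectors):
    """Component-wise average, our stand-in for the 'intersection' of the parts."""
    k = len(vectors)
    return [sum(col) / k for col in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity: how closely a points in the direction of b."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


rng = random.Random(1)
target = random_unit(rng)

# Uncorrelated parts: each one brings its own independent noise.
independent = mean([add(target, noise_vector(rng)) for _ in range(K)])

# Correlated parts: all parts share one noise vector (the same unwanted features).
shared_noise = noise_vector(rng)
correlated = mean([add(target, shared_noise) for _ in range(K)])

print(f"uncorrelated parts: {cosine(independent, target):.3f}")
print(f"correlated parts:   {cosine(correlated, target):.3f}")
```

With independent noise the averaged vector lands closer to the target than with shared noise, which is the "noise cancels out" effect the heuristic relies on. In this model, fully correlated parts behave exactly like repeating a single part.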
