OpenAI vs Midjourney: DALL-E 3 enters the picture
🎨 It's brand new, it's chatty and it goes after Midjourney
I’ve been quite happy with the chief illustrator of this newsletter.
The AI image generator Midjourney had the job for the last few months. Despite the clunky interface and the fact that it sometimes takes many tries to get something good, I’ve grown to like it a lot…
But today it may be time for us to part ways 💔! The picture above was created using DALL-E 3. It’s the newest version of the image generation model developed by OpenAI (the company behind ChatGPT).
The issue with Midjourney
Midjourney uses a simple text interface between the user and the image generation model.
You describe what you want (the prompt), and that description is then used to generate images. It sometimes works great, but often it doesn’t. Writing a prompt that will give you what you want is half science, half art. People develop these skills over many hours of trial and error.
For example, see this Midjourney attempt to generate a picture representing “a merry crew of imaginary Parisian street artists, each looking like a cool and eccentric character”.
Only one of the four pictures actually represents “a merry crew of artists”. The other three show artworks, not the artists themselves.
DALL-E holds hands with ChatGPT
Because humans are bad at writing the right prompts, DALL-E takes a different approach. It takes our initial prompt and then uses another AI system (ChatGPT) to rewrite it in a way that’s most likely to be correctly understood by the image generator.
Here’s what it looks like in practice. We start with the same prompt: Make a picture of a merry crew of imaginary Parisian street artists, each looking like a cool and eccentric character.
Behind the scenes, ChatGPT generates different, more detailed prompts. For example:
Photo of a lively group of Parisian street artists standing together against a backdrop of a charming Parisian alley. One artist, a tall African woman, dons a beret and holds an oversized paintbrush, her face smeared with colorful paint. (…)
Illustration of a whimsical gathering of Parisian street artists. A Latina woman, with a scarf wrapped around her head, passionately paints on a canvas. Next to her, a South Asian man with a Mohawk plays a unique handmade instrument. (…)
These are then fed to DALL-E 3 to produce images. The results are much better!
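For the curious, the two-step idea described above can be sketched in a few lines of Python. This is only an illustration, not OpenAI's actual code: the names generate_with_rewrite, fake_rewrite and fake_render are made up for this example, and both steps are simple stand-ins that call no real model.

```python
from typing import Callable, List, Tuple


def generate_with_rewrite(
    user_prompt: str,
    rewrite: Callable[[str, int], List[str]],
    render: Callable[[str], str],
    n_variants: int = 2,
) -> List[Tuple[str, str]]:
    """Expand a short prompt into detailed variants, then render each one.

    Returns (detailed_prompt, image) pairs so the rewritten prompts stay visible.
    """
    detailed_prompts = rewrite(user_prompt, n_variants)
    return [(prompt, render(prompt)) for prompt in detailed_prompts]


# Hypothetical stand-in for the chat model that rewrites the prompt.
def fake_rewrite(prompt: str, n: int) -> List[str]:
    styles = ["Photo", "Illustration"]
    return [f"{styles[i % len(styles)]} of {prompt}, described in more detail" for i in range(n)]


# Hypothetical stand-in for the image model; it returns a label, not a real image.
def fake_render(prompt: str) -> str:
    return f"<image for: {prompt}>"


results = generate_with_rewrite(
    "a merry crew of imaginary Parisian street artists",
    rewrite=fake_rewrite,
    render=fake_render,
)
for detailed_prompt, image in results:
    print(detailed_prompt)
    print(image)
```

The point of the design is that the image model never sees the short prompt we typed, only the expanded versions, which is why the rewriting step has so much influence on the result.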
Editing remains unsolved
Editing is a major unsolved problem for image generators. Good luck getting Midjourney to change a specific detail in a picture it made 🙃.
I was hoping DALL-E 3 could do better here, and maybe it does… But it still fails more often than it works. When I asked it to edit the DALL-E + ChatGPT picture above to remove the weird robot spider…
The result “without the spider element” was this:
Well, that’s not what I meant by “removing the spider element”.
I think I will keep using both Midjourney and DALL-E 3 for a while. It’s exciting to see how fast the technology improves. One day I may be able to remove that spider element from the Dali picture :).
In other news
More experiments with DALL-E 3 on Simon Willison’s blog
The official website. To use DALL-E 3, you need a premium subscription to ChatGPT.
📢 Do you know people who want to learn more about AI tools and could be interested in this newsletter? Share it with your friends and help us get to the first 100 subscribers 🥳 !
Postcard from Paris
Paris cannot decide whether it likes to be rainy-and-gloomy or sunny-and-stunning this fall. I feel the same 🤷.
Have a great week 💫,
– Przemek