SDXL 1.0 vs. Midjourney 5.2: How Do They Compare?
A look at how the two current best-in-class text-to-image models fare against each other.
Yo!
I’m back, and so are my regular Thursday posts.
Today also happens to be my 10th wedding anniversary, and what better way to celebrate it than to publish a fun post with lots of images?
So by popular demand…
…today I’ll be comparing SDXL 1.0 and Midjourney 5.2.
Each represents the latest and greatest that Stability AI and Midjourney currently have to offer.
But which one has the edge?
Let’s find out!
Painting types
To begin with, I wanted to see how each model handled the three major types of painting: portrait, landscape, and still life. (Inspired by my ancient experiment with Stable Diffusion 1.0.)
Remember to click an image to see it in full size and the right aspect ratio.
Portrait
“Portrait of a woman at twilight”
I’d say both models have understood the task. Midjourney perhaps has a somewhat less painterly vibe. No clear winners here, as far as I’m concerned.
Winner: Tie
Landscape
“Painting of an alien world, purple sunset”
Both did as requested, but Midjourney definitely wins this one when it comes to the level of detail, sharpness, and overall composition.
Winner: Midjourney 5.2
Still life
“Still life painting, pineapple on a cutting board”
Whoa!
Who copied whose homework here? These are uncannily similar.
I’m tempted to give a slight edge to SDXL for making the image look a tiny bit more like a painting.
Winner: SDXL 1.0 (but barely)
Photographic images
Now let’s see how well each model does with different types of photography.
Vintage photo
“Vintage photo of a Victorian family building a spaceship”
Both get points for prompt comprehension, but I think Midjourney’s final result is way better. The image is sharper, the details are more fleshed out, and there are fewer random artifacts (like washed-out faces and partially rendered people).
Winner: Midjourney 5.2
Macro photography
“Cufflinks, macro”
Both images accurately reflect the prompt. Midjourney 5.2’s cufflinks are perhaps a bit too fanciful, while the SDXL 1.0 image has the opposite problem of being too bland.
But I’ll have to dock points from SDXL 1.0 for language understanding. My original prompt for this challenge was “button, macro,” which for some odd reason produced insect shots in SDXL:
Midjourney, on the other hand, had no issues understanding what a button is:
Winner: Midjourney 5.2
Street photography
“Close-up shot of a man in a crowd, street photography, bokeh”
Nice!
Both did admirably, down to accurately simulating the “bokeh” effect.
I had to do a few extra rolls with SDXL 1.0 (the first ones returned faces where eyes looked “off”), but the final results are pretty comparable in terms of quality.
Winner: Tie
Nature photography
“Elephants grazing, wildlife photography, National Geographic”
Again, both images accurately reflect the prompt, but Midjourney’s simply better at creating a more vibrant and lively scene with fewer artifacts.
Winner: Midjourney 5.2
Fashion photography
“Ozzy Osbourne in a Gucci outfit, fashion photography”
We have a pattern developing, folks!
Both models follow the prompt, but Midjourney’s composition is more fleshed out and captivating. Plus SDXL 1.0 absolutely butchered poor Ozzy’s hands, while Midjourney 5.2 did a pretty great job with the hands and fingers.
Winner: Midjourney 5.2
Art mediums
The next challenge is to see how well SDXL 1.0 and Midjourney 5.2 handle different materials for drawing and painting.
Oil painting
“Medieval village, oil painting”
Those sure look like medieval villages of some sort.
Points go to SDXL 1.0 for respecting the prompt here: The image is more immediately recognizable as a painting, while Midjourney’s feels like a still from a video game.
Winner: SDXL 1.0
Acrylic painting
“Cabin in the woods, acrylic paint”
This is going to be a tie. I may personally prefer the Midjourney image, but there’s nothing at all wrong with the SDXL one.
Winner: Tie
Watercolor painting
“Wildflower meadow, watercolor painting”
Here we go with the plagiarism again!
Both are lovely, colorful images that follow the prompt and look surprisingly similar.
Winner: Tie
Pencil sketch
“Hovercraft, pencil sketch”
This one’s a bit of a paradox to judge.
The Midjourney 5.2 image is clearly more detailed and well-realized, but that’s precisely what takes away from the “pencil sketch” feel. It’s a bit too overengineered for a quick pencil drawing. SDXL instantly screams “pencil sketch.”
Winner: SDXL 1.0
Charcoal drawing
“Alien fish, charcoal drawing”
I was going to mock Midjourney for the white-on-black image (the vast majority of images it returned were white-on-black), but then I learned that white charcoal drawings are absolutely a thing.
Still, I’m again tempted to give SDXL 1.0 points for simplicity, as Midjourney just does too much with the composition, which ruins the “drawing” vibe.
Winner: SDXL 1.0 (but barely)
Long prompts
For this one, I wanted to see how the two models dealt with complex scene descriptions containing multiple details.
Fun fact: To go full-tilt on AI, I had ChatGPT dream up the prompts for this challenge.
“Ruined skyscrapers pierce a blood-red sky. Overgrown vegetation reclaims the streets. In the distance, a lone figure with a makeshift backpack stands atop a crumbled overpass, gazing out.”
I like both images quite a bit.
Also, each model succeeds in following most directions, with the notable exception of putting the person “in the distance.”
But Midjourney has the edge here for the more dramatic and striking image and for including the “crumbled overpass” described in the prompt.
Winner: Midjourney 5.2
“Bustling Victorian-era docks with airships floating above. Steam-powered cranes load crates onto wooden ships. Street vendors sell curious gadgets, and cobbled streets are alive with clockwork creatures.”
Oof!
Lots of details from the prompt have been overlooked by both models, from the steam-powered cranes to the cobbled streets to the clockwork creatures.
SDXL 1.0 nudges ahead in this one for including an airship and for the more authentic representation of a ship in a dock.
Winner: SDXL 1.0
“Multi-colored tents sprawl across a low-gravity moon. Various extraterrestrial species barter with holographic currency. Stalls display glowing fruits, mysterious relics, and levitating pets.”
I’m not seeing extraterrestrials, identifiable mysterious relics, or levitating pets in either image. But I do like the more “complete” look of the Midjourney output. I also feel it captures the intended vibe of the prompt better.
Not the most clear-cut win, but it counts.
Winner: Midjourney 5.2 (but barely)
“On a rugged cliff edge, an old stone lighthouse stands tall against a tempestuous sea. Waves crash ferociously at its base, while its beacon cuts a solitary ray of light through the thick fog, guiding unseen ships to safety.”
There’s that copy-paste action again.
Both models do well with the details.
Stone lighthouse? Check!
Tempestuous sea with ferociously crashing waves? Check!
Ray of light cutting through fog? Basically check!
Ladies and gentlemen, we have another tie.
Winner: Tie
Abstract prompts
This section is more of a just-for-fun digression than a serious test.
I wanted to try a few random and abstract prompts, and since I’m a big boy now, I went ahead and did it without asking for anyone’s permission!
Intangible concept
“Infinity”
I mean, come on.
We all agree Midjourney 5.2 kills it, right?
There’s nothing technically wrong with SDXL’s accurate symbol for infinity, but the Midjourney image is just incredible and takes us on a wild ride.
Winner: Midjourney 5.2
Emojis
Emoji combos were some of my very first beginner-friendly Midjourney prompt recommendations back in December last year.[1] So of course I had to try at least one!
“🌈👽”
Uh!
I dunno.
The SDXL 1.0 image is very basic but it does kind of have both the rainbow and the otherworldly alien aspects included. Midjourney spit out a grid with four images of women where only one was tangentially related to an alien thanks to her shiny sci-fi outfit:
We’ll call this a draw.
Winner: Tie
Made-up word
For our final challenge, I asked ChatGPT to make up 10 non-existent words…
…and picked one that sounded the most promising for art-generation purposes:
“Flonstrance”
I actually kind of like the abstract nature of the SDXL 1.0 image. Definitely has a “flonstrance” vibe to it, for whatever that’s worth.
There’s no doubt that Midjourney is more polished and visually impressive, but I’m going to have to start subtracting points for MJ constantly defaulting to images of women when it doesn’t quite know what to do with a prompt:
Winner: SDXL 1.0
Observations
Time for some top-level conclusions.
Midjourney 5.2 remains the better overall model for vibrant, sharp, and visually striking pictures. Its output also tends to be more fully realized while SDXL 1.0 typically has more of an unpolished, work-in-progress quality. Finally, Midjourney 5.2 is the clear frontrunner when it comes to photographic and realistic results.
At the same time, SDXL 1.0 is often better at faithfully representing different art mediums. Midjourney images are just a bit too polished and detailed to pass for paintings or drawings. So in certain circumstances, the more minimalistic nature of SDXL 1.0 output serves it well.
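If you’d like to see those conclusions as numbers, here’s a minimal Python sketch that tallies the “Winner:” lines from this post. A couple of assumptions on my part: “(but barely)” wins count as full wins, and the long-prompt labels (“Ruined skyscrapers,” “Victorian docks,” “Moon market,” “Lighthouse”) are just shorthand I made up for the four ChatGPT prompts.

```python
from collections import Counter

MJ, SDXL, TIE = "Midjourney 5.2", "SDXL 1.0", "Tie"

# (section, challenge, winner), copied from the "Winner:" lines in this post.
# "(but barely)" wins are counted as full wins (an assumption of this sketch).
RESULTS = [
    ("Painting types", "Portrait", TIE),
    ("Painting types", "Landscape", MJ),
    ("Painting types", "Still life", SDXL),
    ("Photographic images", "Vintage photo", MJ),
    ("Photographic images", "Macro photography", MJ),
    ("Photographic images", "Street photography", TIE),
    ("Photographic images", "Nature photography", MJ),
    ("Photographic images", "Fashion photography", MJ),
    ("Art mediums", "Oil painting", SDXL),
    ("Art mediums", "Acrylic painting", TIE),
    ("Art mediums", "Watercolor painting", TIE),
    ("Art mediums", "Pencil sketch", SDXL),
    ("Art mediums", "Charcoal drawing", SDXL),
    # Shorthand labels for the four long prompts (hypothetical names).
    ("Long prompts", "Ruined skyscrapers", MJ),
    ("Long prompts", "Victorian docks", SDXL),
    ("Long prompts", "Moon market", MJ),
    ("Long prompts", "Lighthouse", TIE),
    ("Abstract prompts", "Intangible concept", MJ),
    ("Abstract prompts", "Emojis", TIE),
    ("Abstract prompts", "Made-up word", SDXL),
]


def tally(results):
    """Count winners overall and per section."""
    overall = Counter()
    by_section = {}
    for section, _challenge, winner in results:
        overall[winner] += 1
        by_section.setdefault(section, Counter())[winner] += 1
    return overall, by_section


if __name__ == "__main__":
    overall, by_section = tally(RESULTS)
    print("Overall:", dict(overall))
    for section, counts in by_section.items():
        print(f"{section}: {dict(counts)}")
```

By this count, Midjourney 5.2 takes 8 of the 20 challenges, SDXL 1.0 takes 6, and 6 are ties. The split lines up with the observations above: Midjourney wins four of the five photographic challenges (the fifth is a tie), while SDXL wins three of the five art-medium challenges (the other two are ties).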
Methodology (aka the “boring” part)
To make sure my comparison wasn’t polluted by any under-the-hood tweaks from third-party generators, I went to the source for each model:
The official Discord server for Midjourney
The official Discord server for SDXL 1.0
For SDXL 1.0, I kept the default settings and didn’t apply any special styles.
The point was to let the text prompts themselves do the heavy lifting and see how each model does with a minimum of additional tweaking. I feel this makes for a fair comparison, especially since Stability AI claim the new SDXL 1.0 works with short prompts and no longer requires extra qualifiers to perform well:
I used the same aspect ratio for each one-to-one comparison.
Finally, I always tried to pick the best image of the grid for each model.[2] Granted, this selection is bound to be subjective, but hey, what are you gonna do?
Over to you…
Do you agree with my opinions about each image and my overall conclusions? Or would you rate the images differently?
Have you had the chance to experiment with the two models and discover other interesting points of comparison that I might have overlooked?
I’d love to hear your thoughts. Leave a comment on the site or shoot me an email.
[1] For the record, I think Midjourney V4 is still the best choice for accurate results based on emoji combinations. The new V5.2 tends to ignore emoji aspects and defaults to images of women.
[2] Midjourney generates four images per prompt while the SDXL bot only generates two. To balance this out, I rolled each prompt twice for SDXL to have four images to choose from.
Neat! It seems like Midjourney's biggest "value add", if you can call it that, is that everything is beautiful. That's fantastic if you're just trying to create something that looks good, but that can actually get in the way (as you rightly point out) if you're looking for specific stylization.