SDXL 1.0 vs. Midjourney 5.2: How Do They Compare?

A look at how the two current best-in-class text-to-image models fare against each other.

Aug 10, 2023

Yo!

I’m back, and so are my regular Thursday posts.

Today also happens to be my 10th wedding anniversary, and what better way to celebrate it than to publish a fun post with lots of images?

So by popular demand…

…today I’ll be comparing SDXL 1.0 and Midjourney 5.2.

Each represents the latest and greatest that Stability AI and Midjourney currently have to offer.

But which one has the edge?

Let’s find out!

Painting types

To begin with, I wanted to see how each model handled the three major types of painting: portrait, landscape, and still life. (Inspired by my ancient experiment with Stable Diffusion 1.0.)


Portrait

“Portrait of a woman at twilight”

I’d say both models have understood the task. Midjourney perhaps has a somewhat less painterly vibe. No clear winner here, as far as I’m concerned.

Winner: Tie

Landscape

“Painting of an alien world, purple sunset”

SDXL vs Midjourney: Painting of an alien world with a purple sunset — SDXL 1.0 (left) vs. MJ 5.2 (right)

Both did as requested, but Midjourney definitely wins this one when it comes to the level of detail, sharpness, and overall composition.

Winner: Midjourney 5.2

Still life

“Still life painting, pineapple on a cutting board”

Whoa!

Who copied whose homework here? These are uncannily similar.

I’m tempted to give a slight edge to SDXL for making the image look a tiny bit more like a painting.

Winner: SDXL 1.0 (but barely)

Photographic images

Now let’s see how well each model does with different types of photography.


Vintage photo

“Vintage photo of a Victorian family building a spaceship”

Both get points for prompt comprehension, but I think Midjourney’s final result is way better. The image is sharper, the details are more fleshed out, and there are fewer random artifacts (like washed-out faces and partially rendered people).

Winner: Midjourney 5.2

Macro photography

“Cufflinks, macro”

Both images accurately reflect the prompt. Midjourney 5.2’s cufflinks are perhaps a bit too fanciful, while the SDXL 1.0 image has the opposite problem of being too bland.

But I’ll have to dock points from SDXL 1.0 for language understanding. My original prompt for this challenge was “button, macro,” which for some odd reason produced insect shots in SDXL:

Midjourney, on the other hand, had no issues understanding what a button is:

Winner: Midjourney 5.2

Street photography

“Close-up shot of a man in a crowd, street photography, bokeh”

Street photo of a man in a crowd, by SDXL 1.0 and MJ 5.2 — SDXL 1.0 (left) vs. MJ 5.2 (right)

Nice!

Both did admirably, down to accurately simulating the “bokeh” effect.

I had to do a few extra rolls with SDXL 1.0 (the first ones returned faces where eyes looked “off”), but the final results are pretty comparable in terms of quality.

Winner: Tie

Nature photography

“Elephants grazing, wildlife photography, National Geographic”

Elephants grazing by MJ and SDXL — SDXL 1.0 (left) vs. MJ 5.2 (right)

Again, both images accurately reflect the prompt, but Midjourney’s simply better at creating a more vibrant and lively scene with fewer artifacts.

Winner: Midjourney 5.2

Fashion photography

“Ozzy Osbourne in a Gucci outfit, fashion photography”

Ozzy Osbourne in a Gucci outfit by MJ and SDXL — SDXL 1.0 (left) vs. MJ 5.2 (right)

We have a pattern developing, folks!

Both models follow the prompt, but Midjourney’s composition is more fleshed out and captivating. Plus SDXL 1.0 absolutely butchered poor Ozzy’s hands, while Midjourney 5.2 did a pretty great job with the hands and fingers.

Winner: Midjourney 5.2

Art mediums

The next challenge is to see how well SDXL 1.0 and Midjourney 5.2 handle different materials for drawing and painting.


Oil painting

“Medieval village, oil painting”

Those sure look like medieval villages of some sort.

Points go to SDXL 1.0 for respecting the prompt here: The image is more immediately recognizable as a painting, while Midjourney’s feels like a still from a video game.

Winner: SDXL 1.0

Acrylic painting

“Cabin in the woods, acrylic paint”

This is going to be a tie. I may personally prefer the Midjourney image, but there’s nothing at all wrong with the SDXL one.

Winner: Tie

Watercolor painting

“Wildflower meadow, watercolor painting”

Midjourney's and SDXL's take on a wildflower meadow in watercolor — SDXL 1.0 (left) vs. MJ 5.2 (right)

Here we go with the plagiarism again!

Both are lovely, colorful images that follow the prompt and look surprisingly similar.

Winner: Tie

Pencil sketch

“Hovercraft, pencil sketch”

This one’s a bit of a paradox to judge.

The Midjourney 5.2 image is clearly more detailed and well-realized, but that’s precisely what takes away from the “pencil sketch” feel. It’s a bit too overengineered for a quick pencil drawing. SDXL instantly screams “pencil sketch.”

Winner: SDXL 1.0

Charcoal drawing

“Alien fish, charcoal drawing”

I was going to mock Midjourney for the white-on-black image (the vast majority of images it returned were white-on-black), but then I learned that white charcoal drawings are absolutely a thing.

Still, I’m again tempted to give SDXL 1.0 points for simplicity, as Midjourney just does too much with the composition, which ruins the “drawing” vibe.

Winner: SDXL 1.0 (but barely)

Long prompts

For this one, I wanted to see how the two models dealt with complex scene descriptions containing multiple details.

Fun fact: To go full-tilt on AI, I had ChatGPT dream up the prompts for this challenge.


“Ruined skyscrapers pierce a blood-red sky. Overgrown vegetation reclaims the streets. In the distance, a lone figure with a makeshift backpack stands atop a crumbled overpass, gazing out.”

SDXL and MJ interpretations of a dystopian sci-fi scene — SDXL 1.0 (left) vs. MJ 5.2 (right)

I like both images quite a bit.

Also, each model succeeds in following most directions, with the notable exception of putting the person “in the distance.”

But Midjourney has the edge here for the more dramatic and striking image and for including the “crumbled overpass” described in the prompt.

Winner: Midjourney 5.2

“Bustling Victorian-era docks with airships floating above. Steam-powered cranes load crates onto wooden ships. Street vendors sell curious gadgets, and cobbled streets are alive with clockwork creatures.”

Steampunk Victorian docks by SDXL and Midjourney — SDXL 1.0 (left) vs. MJ 5.2 (right)

Oof!

Lots of details from the prompt have been overlooked by both models, from the steam-powered cranes to the cobbled streets to the clockwork creatures.

SDXL 1.0 nudges ahead in this one for including an airship and for the more authentic representation of a ship in a dock.

Winner: SDXL 1.0

“Multi-colored tents sprawl across a low-gravity moon. Various extraterrestrial species barter with holographic currency. Stalls display glowing fruits, mysterious relics, and levitating pets.”

Alien bazaar by SDXL and MJ — SDXL 1.0 (left) vs. MJ 5.2 (right)

I’m not seeing extraterrestrials, identifiable mysterious relics, or levitating pets in either image. But I do like the more “complete” look of the Midjourney output. I also feel it captures the intended vibe of the prompt better.

Not the most clear-cut win, but it counts.

Winner: Midjourney 5.2 (but barely)

“On a rugged cliff edge, an old stone lighthouse stands tall against a tempestuous sea. Waves crash ferociously at its base, while its beacon cuts a solitary ray of light through the thick fog, guiding unseen ships to safety.”

Lonely lighthouse by SDXL and MJ — SDXL 1.0 (left) vs. MJ 5.2 (right)

There’s that copy-paste action again.

Both models do well with the details.

Stone lighthouse? Check!

Tempestuous sea with ferociously crashing waves? Check!

Ray of light cutting through fog? Basically check!

Ladies and gentlemen, we have another tie.

Winner: Tie

Abstract prompts

This section is more of a just-for-fun digression than a serious test.

I wanted to try a few random and abstract prompts, and since I’m a big boy now, I went ahead and did it without asking for anyone’s permission!


Intangible concept

“Infinity”

I mean, come on.

We all agree Midjourney 5.2 kills it, right?

There’s nothing technically wrong with SDXL’s accurate symbol for infinity, but the Midjourney image is just incredible and takes us on a wild ride.

Winner: Midjourney 5.2

Emojis

Emoji combos were some of my very first beginner-friendly Midjourney prompt recommendations back in December last year.[1] So of course I had to try at least one!

“🌈👽”

Rainbow + alien emoji results in SDXL and MJ — SDXL 1.0 (left) vs. MJ 5.2 (right)

Uh!

I dunno.

The SDXL 1.0 image is very basic, but it does kind of include both the rainbow and the otherworldly alien aspects. Midjourney spit out a grid with four images of women where only one was tangentially related to an alien thanks to her shiny sci-fi outfit:

We’ll call this a draw.

Winner: Tie

Made-up word

For our final challenge, I asked ChatGPT to make up 10 non-existent words…

…and picked the one that sounded most promising for art-generation purposes:

“Flonstrance”

I actually kind of like the abstract nature of the SDXL 1.0 image. Definitely has a “flonstrance” vibe to it, for whatever that’s worth.

There’s no doubt that Midjourney is more polished and visually impressive, but I’m going to have to start subtracting points for MJ constantly defaulting to images of women when it doesn’t quite know what to do with a prompt:

To be fair, the one in the top-left looks like a total Flonstrance!

Winner: SDXL 1.0

Observations

Time for some top-level conclusions.

Midjourney 5.2 remains the better overall model for vibrant, sharp, and visually striking pictures. Its output also tends to be more fully realized while SDXL 1.0 typically has more of an unpolished, work-in-progress quality. Finally, Midjourney 5.2 is the clear frontrunner when it comes to photographic and realistic results.

At the same time, SDXL 1.0 is often better at faithfully representing different art mediums. Midjourney images are just a bit too polished and detailed to pass for paintings or drawings. So in certain circumstances, the more minimalistic nature of SDXL 1.0 output serves it well.

Methodology (aka the “boring” part)

To make sure my comparison wasn’t polluted by any under-the-hood tweaks from third-party generators, I went to the source for each model:

The official Discord server for Midjourney
The official Discord server for SDXL 1.0

For SDXL 1.0, I kept the default settings and didn’t apply any special styles.

The point was to let the text prompts themselves do the heavy lifting and see how each model does with a minimum of additional tweaking. I feel this makes for a fair comparison, especially since Stability AI claims the new SDXL 1.0 works with short prompts and no longer requires extra qualifiers to perform well:

Quote from Stability AI blog announcement about SDXL 1.0 better handling simple prompts

I used the same aspect ratio for each one-to-one comparison.

Finally, I always tried to pick the best image of the grid for each model.[2] Granted, this selection is bound to be subjective, but hey, what are you gonna do?

Over to you…

Do you agree with my opinions about each image and my overall conclusions? Or would you rate the images differently?

Have you had the chance to experiment with the two models and discover other interesting points of comparison that I might have overlooked?

I’d love to hear your thoughts. Leave a comment on the site or shoot me an email.

[1] For the record, I think Midjourney V4 is still the best choice for accurate results based on emoji combinations. The new V5.2 tends to ignore emoji aspects and defaults to images of women.

[2] Midjourney generates four images per prompt, while the SDXL bot only generates two. To balance this out, I rolled each prompt twice for SDXL to have four images to choose from.

Liked this post? Help me grow Why Try AI by sharing it with others.

Andrew Smith

Neat! It seems like Midjourney's biggest "value add", if you can call it that, is that everything is beautiful. That's fantastic if you're just trying to create something that looks good, but that can actually get in the way (as you rightly point out) if you're looking for specific stylization.


8 replies by Daniel Nest and others

