11 Comments

Appreciate the shout out Daniel 🙏

You got it, Ben!

Modelbench feels like a really polished product. I hope you're getting good traction with it and positive feedback from developers and pro users.

Super useful for folks shopping around!

My own 2 cents: these tests are necessarily very limited for me since I rarely just prompt once, but that has a lot to do with the way I use LLMs - almost exclusively for research. If I'm learning about something, I have little idea which are the right questions to ask in that moment.

Sounds like Modelbench is the choice for you then - you can keep chatting to each model separately in the comparison view, ask follow-up questions, etc.

I've done this before, but it gets VERY tedious after that first branch. It starts out very similarly, but it's like how a new universe forms whenever any event happens - Schrödinger's cat and all that. I hear what you're saying, but I'm going to drop a historical video that summarizes my present view: https://youtu.be/zGxwbhkDjZM

I mean, usually after the first few interactions it becomes clear which model has the right combination of accuracy, tone, verbosity, etc. for your needs. I think these comparisons are best when you have a few specific models to evaluate. If you already have a tool that does what you need, you're probably good!

I think that's the thing - having two particular models in mind to test out. Now that you mention it, this could be more useful, since the previous iterations were just two side-by-side models.

The downside is that I really have to dive in and get my hands dirty to get a feel for the more nuanced stuff. Accuracy and speed matter a lot, but the way the info is presented is important too, and I often don't know which way works better until I've used the platform for like a month.

Still, you've got me intrigued now and I might be willing to be an LLM guinea pig once again.

It's definitely the latter, and it's Copilot/GPT-4. Am I in an AI rut? I used ChatGPT on and off but found Copilot's results better/more up to date. I've gotten comfy with the integrated image generation. While Copilot draws I zone out looking at the swirly colors, and once it's done, resizing and changing styles is easy. I can usually get an image I like in 10 minutes now. I used to play with different models/tools a lot more, but it feels like there's rough parity in the current gen of models right now.

To be fair, as long as your tool is doing what you need it to, I think you're good!

And it's cool how Microsoft Copilot basically lets you use OpenAI's DALL-E 3 for free to make images, while ChatGPT itself is limited to 2 images every few hours or so. What do you tend to use the images for?

I make one weekly for every article I post. It's almost as much fun as the writing. At first the styles were wildly inconsistent because I was playing with all the different styles and going deep into prompts, but I wanted a more consistent look for my pub, so now I go Low Poly - it's obviously AI-generated and I like the look. It also makes for simpler images that show better as thumbnails (imho).

Copilot usually generates 4 images, so I pick the best, and you get 15 free generations a day; the suggested text modifiers are also really helpful. The UX was a little spotty at first but now it's really consistent across web apps. Here's one for an AI face-off:

https://copilot.microsoft.com/images/create/a-head-to-head-ai-face-off/1-66ec4cd7b9134837a8142968dd212eb5?id=rNrF0OZRrjSFtqpG%2fkMwzQ%3d%3d&view=detailv2&idpp=genimg&idpclose=1&thId=OIG3.xcJ4jk9UwMpxHFFOqFyw&lng=en-US&ineditshare=1

Yeah that's a cool approach - sticking to one specific style/theme for your images. Especially when you find something that AI can consistently replicate.
