11 Comments

Appreciate the shout out Daniel 🙏

You got it, Ben!

Modelbench feels like a really polished product. I hope you're getting good traction with it and positive feedback from developers and pro users.

Super useful for folks shopping around!

My own 2 cents: these tests are necessarily very limited for me since I rarely just prompt once, but that has a lot to do with the way I use LLMs - almost exclusively for research. If I'm learning about something, I have little idea which are the right questions to ask in that moment.

Sounds like Modelbench is the choice for you then - you can keep chatting to each model separately in the comparison view, ask follow-up questions, etc.

I've done this before, but it gets VERY tedious after that first branch. It starts out very similarly, but it's like how a new universe forms whenever any event happens - Schrödinger's cat and all that. I hear what you're saying, but I'm going to drop a historical video that summarizes my present view: https://youtu.be/zGxwbhkDjZM

I mean, usually after the first few interactions it becomes clear which model has the right combination of accuracy, tone, verbosity, etc. for your needs. I think these comparisons are best when you have a few specific models to evaluate. If you already have a tool that does what you need, you're probably good!

I think that's the thing - having two particular models in mind to test out. Now that you mention it, this could be more useful, since the previous iterations were just two side-by-side models.

The downside is that I really have to dive in and get my hands dirty to get a feel for the more nuanced stuff. Accuracy and speed matter a lot, but the way the info is presented is important too, and I often don't know which way works better until I've used the platform for like a month.

Still, you've got me intrigued now and I might be willing to be an LLM guinea pig once again.

It's definitely the latter, and it's Copilot/GPT-4. Am I in an AI rut? I used ChatGPT on and off but found Copilot's results better/more up to date. I've gotten comfy with the integrated image generation. While Copilot draws I zone out looking at the swirly colors, and once it's done, resizing and changing styles is easy. I can usually get an image I like in 10 minutes now. I used to play with different models/tools a lot more, but it feels like there's rough parity in the current gen of models right now.

To be fair, as long as your tool is doing what you need it to, I think you're good!

And it's cool how Microsoft Copilot basically lets you use OpenAI's DALL-E 3 for free to make images, while ChatGPT itself is limited to 2 images every few hours or so. What do you tend to use the images for?

I make one weekly for every article I post. It's almost as much fun as the writing. At first the styles were wildly inconsistent because I was playing with all the different styles and going deep into prompts, but I wanted a more consistent look for my pub, so now I go Low Poly - it's obviously AI-generated and I like the look. It also makes for simpler images that show better as thumbnails (imho).

Copilot usually generates 4 images, so I pick the best, and you get 15 free generations a day; the suggested text modifiers are also really helpful. The UX was a little spotty at first but now it's really consistent across web apps. Here's one for an AI face-off:

https://copilot.microsoft.com/images/create/a-head-to-head-ai-face-off/1-66ec4cd7b9134837a8142968dd212eb5?id=rNrF0OZRrjSFtqpG%2fkMwzQ%3d%3d&view=detailv2&idpp=genimg&idpclose=1&thId=OIG3.xcJ4jk9UwMpxHFFOqFyw&lng=en-US&ineditshare=1

Yeah that's a cool approach - sticking to one specific style/theme for your images. Especially when you find something that AI can consistently replicate.
