Feb 11, 2024

PLUS: Stable Video Diffusion 1.1, Hugging Face assistants, MetaVoice-1B, Smaug-72B, BUD-E voice assistant, and background removal by Bria AI.

18 Comments

Andrew Smith

Feb 11, 2024

I just now upgraded to Gemini Advanced! The "dumb" version was already pretty useful. Really looking forward to this.

Where did you guys travel?

Daniel Nest

Feb 11, 2024 (Edited)

Would love to hear your thoughts!

We're in the Czech Republic. Nathan (my oldest) has a hockey tournament here, and then we're sticking around for the winter holidays with my wife's family.

Andrew Smith

Feb 11, 2024

I think I'd really enjoy visiting Prague. I'll let you know when I'm on my way! Thanks for the invite.

Daniel Nest

Feb 11, 2024

Don't mention it!

Can't believe you're going to bring me that autographed Eminem CD!

Andrew Smith

Feb 11, 2024

100%. I never did tell you who autographed it, did I?

Daniel Nest

Feb 11, 2024

I counted on that joke!

Andrew Smith

Feb 11, 2024

It was Vanilla Ice.

Applied Intelligence

Feb 11, 2024

Bud-E is definitely interesting - I have seen many attempts at this, and given the massive interest, something like it is bound to happen.

I think Bud-E looks like one of the more "legit" ones. Excellent find, Daniel.

I'd be interested to see a side-by-side comparison of Microsoft's Copilot and Midjourney or DALL-E - take your squirrel example and run it through each. While the results will be interesting, I'd like to know which one works faster/easier with your concept of Minimal Viable Prompt (great concept btw).

I wonder what people think of using Google's Gemini long term. Unfortunately, hearing that they "killed" Bard and are now doing Gemini, this looks on par with Google's business practices, where they pull the rug out from under large infrastructure-type projects.

Daniel Nest

Feb 11, 2024

You're in luck: I did a side-by-side deep dive into image models last December, here:

https://www.whytryai.com/p/text-to-image-ai-models

(Microsoft Copilot uses DALL-E 3, so it's the same as ChatGPT Plus)

Of course, Imagen 2 and Midjourney V6 came out after that.

The short answer is that DALL-E 3 and Midjourney V6 are the best for the "minimum viable prompt" approach, as their prompt understanding and adherence are especially great. But most models are solid these days.

Also, the change from Bard to Gemini is purely a branding exercise. They simply renamed the search chatbot from Bard to Gemini. Bard had already been using Gemini Pro under the hood since late December, and I guess it just made sense for them to keep the "Gemini" umbrella as their guiding star for the future.

Phil Tanny

Feb 11, 2024

Hi Daniel,

Bud-E is interesting. This seems like another step on the road to digital friends. What's needed next is for the audio output to animate a human face image. That's long been possible, but not in real time over the Net, to my limited knowledge.

Daniel Nest

Feb 11, 2024

Yeah, digital avatars have been around for a while (Synthesia, HeyGen, etc.) - but like you, I'm not sure how well they handle real-time interactions with low latency.

For full immersion, I'd like to see a voice with more inflection and maybe even some filler words like "uhm." The video demo voice sounds a bit too monotonous.
