Mar 13

Image quality is a red herring. We're finally witnessing true multimodality.

7 Comments

Really good instructions on how to access Google AI Studio. Strange that it doesn't get talked about more, because I find it to be a pretty neat playground.

Here are some of the pictures I got. They basically show my cat outside my apartment balcony, trying to capture her regal-ness as a queen while she observes the human peasants below lol

https://imgur.com/a/zcXAY18

Daniel Nest

Mar 13

Yeah, it's crazy how much you can do for free in AI Studio: multimodal images, video analysis, real-time streaming with Gemini chat, screen sharing, access to practically every LLM from Google, etc. It's a treasure trove!

Those kitty pictures are epic! I love that you can upload and modify your own images as well, really neat.

Andrew Sniderman 🕷️

Mar 13

GAAAAA MUST TRY! Can you give it a starter image to manipulate?

Daniel Nest

Mar 13

Yes! I actually forgot to mention that. You can, e.g., upload an image of yourself and add accessories, change facial expressions, etc. It's great. The output isn't always 100% polished and it might take a few tries, but the fact that it can do this at all using text commands is huge!

Andrew Smith

Mar 13

Nice, I just now switched my experimental model over so I can play around this week.

This reminds me of voice, too: if it has to switch modes by converting everything to text, it doesn't work nearly as well as a native generator could. Seems like this is a big key to gen AI success.

Daniel Nest

Mar 13

Yeah, Gemini's voice implementation for now still uses the old speech-to-text/text-to-speech hack, which is why Advanced Voice Mode in ChatGPT is better.

But between the two of them, they now cover the entire multimodality range with native image generation and native speech.

I'm reasonably sure they'll both end up as true omnimodal models soon.

Let me know if you discover any fun use cases for the natively multimodal Gemini 2.0 Flash.

Andrew Smith

Mar 13

Will do. Gemini and ChatGPT are pretty easy for me to experiment with since I use both every day. Happy to explore when possible!

Why Try AI

Gemini 2.0 Flash Makes Mediocre Images...But…