Really good instructions on how to access Google AI Studio. Strange that it doesn't get talked about more, because I find it to be a pretty neat playground.
Here are some of the pictures I got, basically showing my cat out on my apartment balcony. I was trying to capture her regalness as a queen while she observes the human peasants below lol
https://imgur.com/a/zcXAY18
Yeah, it's crazy how much you can do for free in AI Studio - multimodal image generation, video analysis, realtime streaming with Gemini chat, screen sharing, access to practically every LLM from Google, etc. It's a treasure trove!
Those kitty pictures are epic! I love that you can upload and modify your own images as well, really neat.
GAAAAA MUST TRY! Can you give it a starter image to manipulate?
Yes! I actually forgot to mention that. You can e.g. upload an image of yourself and add accessories, change facial expressions, etc. It's great. The output isn't always 100% polished and it might take a few tries, but the fact that it can even do this at all using text commands is huge!
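If you'd rather script it than click around in AI Studio, the same kind of edit should also work through the API. This is just a rough sketch assuming the google-genai Python SDK and an experimental image-capable model id - the file name, prompt, and exact model name are placeholders, so check the current docs before trusting it:

    # Rough sketch: edit an uploaded photo with a text instruction via the Gemini API.
    # Assumes the google-genai Python SDK (pip install google-genai) and that the
    # experimental model id below is still current -- check the docs for the exact name.
    from io import BytesIO
    from PIL import Image
    from google import genai
    from google.genai import types

    client = genai.Client(api_key="YOUR_API_KEY")  # or set GOOGLE_API_KEY in the env
    photo = Image.open("me.jpg")  # hypothetical starter image

    response = client.models.generate_content(
        model="gemini-2.0-flash-exp",  # assumed experimental image-capable model id
        contents=["Add a red beret and make me smile", photo],
        config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
    )

    # The response can interleave text and image parts; save any returned images.
    for part in response.candidates[0].content.parts:
        if part.text:
            print(part.text)
        elif part.inline_data:
            Image.open(BytesIO(part.inline_data.data)).save("edited.png")

Same idea as the UI: one text instruction plus a starter image in, edited image out.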
Nice, I just now switched my experimental model over so I can play around this week.
This reminds me of voice, too - if it has to switch modes by converting everything to text, it doesn't work nearly as well as a native generator could. Seems like this is a big key to gen AI success.
Yeah, Gemini's voice implementation still uses the old speech-to-text/text-to-speech pipeline for now, which is why Advanced Voice Mode in ChatGPT is better.
But between the two of them, they now cover the entire multimodality range with native image generation and native speech.
I'm reasonably sure they'll both end up as true omnimodal models soon.
Let me know if you discover any fun use cases for the natively multimodal Gemini 2.0 Flash.
Will do. Gemini and ChatGPT are pretty easy for me to experiment with since I use both every day. Happy to explore when possible!