Sunday Rundown #78: Audio-Video & Wonder Lizard
Sunday Bonus #38: My speech-to-image tool that turns scene descriptions into pictures
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + a fun AI fail.
Sunday Bonus (paid): a goodie for my paid subscribers.
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Alibaba released a reasoning model called QwQ-32B-Preview to rival OpenAI’s o1-preview, just one week after DeepSeek did the same with R1 Lite. (Try the demo here.)
Anthropic has been busy again:
It introduced the Model Context Protocol (MCP) - “a universal, open standard for connecting AI systems with data sources.”
You can now create custom “Styles” in Claude by uploading samples of your writing that it can mimic (you can also pick from a few basic style presets).
ElevenLabs launched GenFM, a text-to-podcast tool similar to “Audio Overviews” in NotebookLM, but it lets you select different voices and languages.
H Company launched Runner H, an AI agent that the company claims outperforms competing agents on real-world tasks while handling a broader range of them.
Hume connected its voice interface with Anthropic’s “Computer Use,” letting you control your computer using spoken instructions.
Lightricks open-sourced its LTXV video model capable of fast, high-quality video generation. (Try the Hugging Face demo.)
Luma expanded its Dream Machine video model into a full-fledged “creative platform” with new features and an iOS app. (Try it here.)
Stability AI has enabled ControlNet tools for its latest Stable Diffusion 3.5 Large model. (Here’s more about ControlNet.)
🔬 AI research
Cool stuff you might get to try one day:
Amazon is reportedly working on an AI model code-named “Olympus” that can understand complex scenes in images or videos.
NVIDIA showcased Fugatto, an impressive sound model that accepts text and audio inputs and can create any combination of sounds, music, and voices.
Runway is gradually rolling out its text-to-image tool “Frames,” which gives creators precise control over style and visual direction.
📖 AI resources
Helpful AI tools and stuff that teaches you about AI:
“7 examples of Gemini’s multimodal capabilities” - real-world cases compiled by Google.
“GenChess” [tool] - a fun Google Labs space that lets you create new virtual chess sets based on any object or theme and then play a game with them.
🔀 AI random
Other notable AI stories of the week:
Early testers briefly leaked access to OpenAI’s much-awaited Sora video model as a protest against what they saw as art-washing and unfair treatment by the company.
🤦‍♂️ AI fail of the week
I mean, I did ask for a “caricature,” but this is very much not it.
💰 Sunday Bonus #38: Turn a vague, spoken scene description into an image
I love messing around with AI image tools.
In fact, that’s what got me to start this newsletter in the first place.
I’m also a huge proponent of less-is-more image prompting, as seen here:
But many people are still hesitant to try prompting image models.
Maybe they only have a vague idea of what they want. Or they’re not sure how to put that idea into words and which terms to use. Or they can’t find a way to condense it into a short, precise image prompt.
So I went ahead and built a free-to-run tool that works like this (a rough code sketch of the pipeline follows the list):
You turn on your mic and ramble on about the scene you’re thinking of. (Don’t worry about repeating yourself, being too wordy, vague, etc.)
The tool converts that audio input into a clean, precise image prompt.
It then turns that short prompt into a widescreen (16:9) image using the latest and greatest FLUX 1.1 Pro [Ultra] model.
You can also upload a pre-recorded scene description instead of recording it directly.
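For the technically curious, here’s a minimal sketch of what a pipeline like this can look like. To be clear, this is not the tool’s actual code: the newsletter only confirms that FLUX 1.1 Pro [Ultra] generates the final 16:9 image. I’m assuming Whisper for transcription, a small GPT model to condense the rambling transcript into a tight prompt, and a Replicate call for the image step; the model names and parameters are illustrative.

```python
# Rough sketch of a speech-to-image pipeline (assumed services, not the tool's actual code).
from openai import OpenAI
import replicate

client = OpenAI()  # expects OPENAI_API_KEY in the environment; Replicate reads REPLICATE_API_TOKEN


def scene_audio_to_image(audio_path: str):
    # 1. Transcribe the spoken (or uploaded) scene description.
    with open(audio_path, "rb") as f:
        transcript = client.audio.transcriptions.create(model="whisper-1", file=f)

    # 2. Condense the wordy, repetitive transcript into one short, precise image prompt.
    completion = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model works here
        messages=[
            {
                "role": "system",
                "content": "Rewrite the user's scene description as one concise, "
                           "vivid image-generation prompt. Return only the prompt.",
            },
            {"role": "user", "content": transcript.text},
        ],
    )
    prompt = completion.choices[0].message.content.strip()

    # 3. Generate a widescreen (16:9) image with FLUX 1.1 Pro [Ultra] via Replicate.
    output = replicate.run(
        "black-forest-labs/flux-1.1-pro-ultra",
        input={"prompt": prompt, "aspect_ratio": "16:9"},
    )
    # Depending on the Replicate client version, this is a URL string or a file-like output object.
    return output


if __name__ == "__main__":
    print(scene_audio_to_image("scene_description.wav"))
```

The actual tool wraps the same three steps behind a simple interface, so you never have to touch a prompt box yourself.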
Check it out: