Sunday Rundown #90: Video Vibes & Terror Tomato
Sunday Bonus #50: Recording of my live Adapta Q&A.
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
📰 Sunday Rundown (free): this week’s AI news + a fun AI fail.
💎 Sunday Bonus (paid): an exclusive segment for my paid subscribers.
🎉 Today the Sunday Bonus hits its golden anniversary with 50 entries. Here’s to another 50 to come. 🎉
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Alibaba released QwQ-32B, a reasoning model that performs on par with DeepSeek-R1 despite being over 20 times smaller. (Try it here.)
Anthropic made several improvements to its Anthropic Console, letting developers test, iterate, and ship faster.
Codeium released almost a dozen “Wave 4” capabilities in Windsurf.
Cohere’s Aya Vision is a new SOTA multimodal and multilingual model that excels in image captioning, visual question answering, text generation, and more.
Convergence AI launched Template Hub - a repository of workflow-specific agents (templates) for its AI-powered assistant Proxy.
DuckDuckGo launched Duck.ai, which offers private access to several small language models. (Try it here.)
ElevenLabs integrated Claude 3.7 Sonnet into its Conversational AI platform, giving voice agents better reasoning and conversational powers.
Google news:
Google Colab now has a Data Science Agent that can automatically set up complete, working notebooks via simple text prompts.
New AI-powered features for Google Shopping let users do virtual try-ons and match vague ideas to real-world products.
Gemini in Google Calendar can create new events, look up event details, and help check your schedule. (Apply for Google Workspace Labs to try.)
Hedra launched the Character-3 model, which reasons across image, text, and audio and allows text-to-video and audio-to-video generation in Hedra Studio.
LTX Studio’s new open-source video model—LTXV 0.9.5—offers keyframe conditioning and can generate longer videos with better quality and resolution.
Luma Labs enhanced its Ray2 model with cool new features that enable keyframe transitions, video extensions, and video looping.
Mistral AI now has a top-tier OCR model called Mistral OCR that leads multiple accuracy benchmarks across many languages. (Try it for free on Le Chat.)
Manus is an impressive new general AI agent that excels at real-world tasks and convincingly outperforms OpenAI’s Deep Research on the GAIA benchmark.
OpenAI news:
GPT-4.5 has now rolled out to all ChatGPT paid accounts.
ChatGPT for macOS can now edit code directly in IDEs.
Sesame has an uncanny AI voice model that enables natural-sounding, humanlike conversations. (Try the demo for yourself.)
Tavus introduced Phoenix-3, Raven-0, and Sparrow-0, a new AI model family that powers its lifelike Conversational Video Interfaces (CVI). (Try the free demo.)
Tencent open-sourced a new HunyuanVideo I2V image-to-video model.
🔬 AI research
Cool stuff you might get to try one day:
Amazon is developing a reasoning model in its Nova family that might utilize a hybrid architecture similar to Claude 3.7 Sonnet.
ASLP Lab introduced DiffRhythm, a latent diffusion-based model that can create full-length songs in just ten seconds.
Google research:
Users will soon be able to share their screens and video feeds when asking questions and interacting with Gemini Live.
AI Mode will be an AI-first approach to searching the web similar to Perplexity. (Join the waitlist in Search Labs.)
Captions introduced Mirage, an audio-to-video model that creates talking videos of non-existent humans purely from audio input. (Apply for access.)
Meta researchers have a “large reconstruction model (LRM)” that can create animated, realistic avatar heads from just four selfie images.
OpenAI plans to offer specialized AI agents capable of PhD-level research, knowledge work, and software development, priced at up to $20,000 / month.
Opera teased Browser Operator, an AI agent that independently performs tasks directly within its browser.
📖 AI resources
Helpful AI tools and stuff that teaches you about AI:
“Creative AI Superpowers You’re Not Using Yet” [VIDEO] - info-packed 40-minute look at different things you can do with AI tools (many of them free).
“Dario Amodei of Anthropic’s Hopes and Fears for the Future of A.I.” [VIDEO] - interview by Hard Fork.
“Expressive TTS Arena” [SITE] - compare different text-to-speech models in terms of how expressive and lifelike they are.
🔀 AI random
Other notable AI stories of the week:
Google co-founder Larry Page is reportedly working on Dynatomics, an AI startup that applies AI to product manufacturing.
Perplexity partnered with Deutsche Telekom to bring its Perplexity Assistant to the new AI Phone.
Stability AI partnered with Arm to deliver on-device generative audio to mobile devices.
🤦‍♂️ AI fail of the week
This nightmare fuel isn’t at all what I was going for. (Final version used in this post.)
💰 Sunday Bonus #50: My live Q&A showcasing AI tools and workflows
It’s a good week to be a paid Why Try AI subscriber.
If you only receive my Sunday posts, you might’ve missed the Thursday video guide for paid readers on mastering new topics with AI.
But there’s more!
I recently held a live Q&A for Adapta’s community members, where I answered pre-submitted questions from the audience by showcasing lots of AI tools and workflows. Here are the questions I covered:
“With the accelerated advance of AI, new releases of LLMs as we've had 2 from China lately, having paid access to several LLMs is unfeasible. For an independent user, which paid LLMs would you recommend?”
“My learning style is synaesthetic; I learn more easily by doing in practice.
With this in mind, where can I learn to prepare a service chatbot step by step?
It can be the simplest chatbot, but I can follow it step by step until I see the system working.”
“I'd like to understand [Adapta] "Flows" and learn how to develop it to use for my biz”
“How do I create presentations in PowerPoint using AI when I have all the content and just need AI to format all the slides in a way that is suitable for presentation?”
“I would like some practical examples of how I can use AI instead of Excel, or examples showing how I can use AI to be more efficient when I need to create a sheet in Excel.”
“Which is the most important thing for new users to learn, know, and focus to understand and wisely use AI in Healthcare?”
“What are the best tools and processes for a sales area to increase the generation of high-ticket leads (projects with a value above $500k)?”
“How can I make an SDR AI agent less generic?”
“What is the best form of automation for WhatsApp with artificial intelligence?”
“I would like to know how I can have a generative AI that serves my customers like a human being and can also listen and respond with audio. This AI would have to respond, qualify (if they want to purchase a product), and schedule strategic calls, in the case of high-tech mentoring.”