Sunday Rundown #94: Agents, Video & Platypus Lets Loose
Sunday Bonus #54: My swipe file with 90+ use cases for GPT-4o image generation.
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + a fun AI fail.
Sunday Bonus (paid): an exclusive segment for my paid subscribers.
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Adobe added Generative Extend to Premiere Pro, letting creators and editors automatically fill in audio or video gaps using AI.
Amazon released a model called Nova Act in research preview, which lets developers build browser-based agents for complex workflows.
Anthropic launched a specialized Claude for Education that uses approaches like Socratic questioning to improve learning outcomes.
Apple expanded Apple Intelligence features to new languages and regions.
Genspark now has a “Super Agent” that pulls together the rest of Genspark’s tools and specialized agents to reliably perform complex tasks.
Google news:
Google Slides now incorporates the Imagen 3 model for image generation along with other visual tools.
NotebookLM added a “Discover Sources” feature that lets you find and add quality sources directly inside your notebook. (You might now be able to skip Step #2 of this recent guide of mine.)
Gemini 2.5 Pro is currently available to free accounts at gemini.google.com.
Higgsfield AI introduced a video model called DoP I2V-01-preview. Its claim to fame? Dynamic and “fun” camera angles and movements via preset templates.
Krea AI is shipping:
3D tool lets you generate 3D objects using leading models from third-party providers in one place.
Gemini Image Editing can edit images based on your natural language directions in Krea Chat.
Video Re-style applies a different style to a video while keeping the scene composition and movements consistent.
Lindy announced Agent Swarms, letting its AI agents duplicate themselves to perform tasks in parallel. (Yes, Agent Smith is also where my mind went.)
Luma AI introduced Camera Motion Concepts that let you precisely define how the camera moves using presets. (Did they coordinate with Higgsfield?)
Midjourney finally launched V7 Alpha, which comes with a “Draft” mode that creates images rapidly on the fly as you give it creative direction using voice.
Manus AI launched premium subscriptions and improved its agent with longer context, better multimodal capabilities, and more.
MiniMax has a new, ultrarealistic text-to-speech model called Speech-02 that can generate lifelike voices in over 30 languages.
Runway unleashed the much-awaited Gen-4 video model, which boasts much better fidelity, improved dynamic motion, and more control for filmmakers.
Tencent upgraded its Hunyuan-T1 deep reasoning model with better coding skills, improved writing quality, multi-turn text comprehension, and more.
Windsurf added many nifty features in its Wave 6 update.
🔬 AI research
Cool stuff you might get to try one day:
Alibaba is rumored to soon release Qwen 3, an upgraded version of its primary model, to compete with DeepSeek and OpenAI.
ByteDance introduced DreamActor-M1, a framework that can realistically animate a single static image using a driving reference video.
Meta introduced MoCha, which generates a fully realized talking character using only speech and text as inputs.
OpenAI has some plans:
Gearing up to release its first open-weight LLM since GPT-2.
Rumored to be adding a thinking slider to ChatGPT that lets users adjust how much reasoning a model uses.
Expecting to release o3 and o4-mini in the coming weeks, followed by GPT-5 later on.
📖 AI resources
Helpful AI tools and stuff that teaches you about AI:
“Reasoning models don't always say what they think” [research] - curious insights from Anthropic’s alignment research.
“AI 2027” [website] - a speculative project that forecasts the trajectory of AI progress, predicting superhuman AI by 2027.
🤦‍♂️ AI fail of the week
Asked Sora to animate my recent steampunk platypus. Things escalated quickly.
💰 Sunday Bonus #54: 90+ use cases for GPT-4o image generation (swipe file)
I’m really excited about this one!
Native image generation in GPT-4o is a massive paradigm shift—yet somehow, most of us got stuck at the “my face, but Studio Ghibli” stage.
Shame!
To fix that, I consulted a bunch of articles and videos on GPT-4o image generation and its many applications, from funny to practical.
After lots of back-and-forth with Genspark’s “Super Agent” (see above), I now have a clean, interactive swipe file that features 90+ use cases for your inspiration.
You can filter the list by category, search by keyword, and one-click copy the template prompts to try them yourself.
I’m genuinely proud of how this one turned out.
Enjoy!