Sunday Rundown #68: AI Video Everywhere & a Head in a Fridge
Sunday Bonus #28: Steering the "Audio Overviews" in NotebookLM
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + a fun AI fail.
Sunday Bonus (paid): a goodie for my paid subscribers.
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Alibaba Cloud unleashed over 100 open-source models, headlined by Qwen2.5-Max, which is on par with or better than Llama 3.1 405B and GPT-4o on many LLM benchmarks, making it arguably the strongest open-source model out there.
Amazon released an AI video generator that sellers can use to create ads, along with Project Amelia, an all-in-one AI assistant that gives sellers stats, answers, and suggestions.
Kling AI launched version 1.5 of its video model and a “Motion Brush” tool that lets you better control the action.
Luma Labs released a Dream Machine API to let developers build products using its video model.
Microsoft is rolling out what it calls “the next wave” of Microsoft 365 Copilot with lots of business-oriented AI features and new tools coming to its suite of products.
Snapchat is bringing text-to-video to a subset of creators as a beta test, with plans to make image-to-video available later as well. (I tested 9 image-to-video tools not so long ago.)
Suno now lets you exclude specific styles, instruments, and vocals from generated songs. (Kind of like negative prompts in Midjourney and other image tools.)
🔬 AI research
Cool stuff you might get to try one day:
Runway is slowly starting to roll out access to its Gen-3 Alpha Turbo API to make it easier for developers to integrate the video model into their products.
YouTube is planning to roll out more AI features for creators, including its video model Veo for generating B-roll footage and an AI-powered “brainstorming buddy” that helps you generate video ideas.
📖 AI resources
Helpful stuff that teaches you about AI:
Building OpenAI o1 (Extended Cut) [VIDEO] - a chat with the OpenAI team behind the o1 reasoning model with many curious insights.
🔀 AI random
Other notable AI stories of the week:
Runway announced a partnership with Lionsgate to create a custom AI model based on Lionsgate’s proprietary catalog.
🤦‍♂️ AI fail of the week
“Ah, human heads! Classic Earth delicacy!” (Final version here.)
Anything to share?
Sadly, Substack doesn’t allow free subscribers to comment on posts with paid sections, but I’m always open to your feedback. You can message me here:
💰 Sunday Bonus #28: How to steer the “Audio Overviews” in NotebookLM
I mentioned the new “Audio Overviews” feature in NotebookLM in last week’s issue. It automatically generates a short podcast out of any source material you upload.
But that mention didn’t do justice to just how impressive these AI podcasts are.
We’re not talking about a monotone robotic voice giving you a dry summary. These overviews genuinely feel like natural conversations.
The two AI speakers crack jokes, laugh, interrupt and riff on each other’s statements, pause to catch a breath, occasionally stumble over their words, and so on.
As a taster, here’s a snippet of an audio overview made from the DALL-E 3 research paper. (Especially the first 15 seconds and the part after the 1:15 mark.)
As it stands, you have no control over these Audio Overviews beyond uploading your sources and clicking the “Generate” button. The podcast will always have the same two speakers, and the way they tackle the topic is entirely up to NotebookLM.
But after some testing, I found a semi-reliable “hack” to nudge the podcast in the direction you want in terms of structure, topics covered, etc.
Let me show you.