Sunday Rundown #58: Runway Gen-3 for Everyone & a Talking Boulder
Sunday Bonus #18: Using Gemini's video capabilities for meeting insights.
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): I share this week’s AI news and an AI fail for your entertainment.
Sunday Bonus (paid): My paid subscribers get a goodie in the form of a guide, an AI tip, a tool walkthrough, etc.
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Runway made its latest Gen-3 Alpha text-to-video model available to everyone. (You’ll need at least a Pro account to use it though.)
Suno text-to-music now has an iOS app.
Perplexity introduced a more powerful Pro Search that breaks research down into several steps and analyzes data before spitting out an answer.
French AI lab Kyutai released a voice chat model called Moshi, which responds in real time, much like the voice assistant OpenAI demoed in May.
ElevenLabs has dropped a Voice Isolator tool that can extract vocals from a recording while removing background noise.
🔬 AI research
Cool stuff you might get to try one day:
Meta AI is working on a state-of-the-art text-to-3D model called Meta 3D Gen, which generates better-quality 3D assets than existing models, faster.
📖 AI resources
Helpful stuff that teaches you about AI:
“How Far Can We Scale AI? Gen 3, Claude 3.5 Sonnet and AI Hype” (Video) - great summary of different perspectives on the limits of scaling by AI Explained.
From my sponsor:
Explore SciSpace: an AI platform for researchers. Browse 280M+ papers, conduct literature reviews, chat with PDFs, and get AI-powered summaries.
Use code WTAI40 for 40% off an annual subscription or WTAI20 for 20% off a monthly subscription.
🔀 AI random
Other notable AI stories of the week:
Meta will start being more rigorous in labeling AI-generated content on its platforms.
Anthropic is looking to fund the development of third-party evaluations of AI models, especially in areas of safety and capability.
ElevenLabs has partnered with the estates of iconic stars of the past to incorporate their voices into its Reader App.
🤦‍♂️ AI fail of the week
Talk about being stuck between a rock and a hard place. (Final version.)
Anything to share?
Sadly, Substack doesn’t allow free subscribers to comment on posts with paid sections, but I am always open to your feedback. You can message me here:
💰 Sunday Bonus #18: Extract insights from video meetings with Gemini (for free)
In the past year, we’ve seen an absolute avalanche of paid AI note-takers for online meetings.
And yet, as it turns out, free Gemini 1.5 Pro is exceptionally well-suited for this purpose.
It’s the only free LLM that can parse combined video + audio input out of the box, and it’s freakishly good at it. This means Gemini can pick up visual cues along with what’s being said to provide a holistic analysis of any meeting.
Its 2M-token context window means it can easily handle multi-hour meetings.
Unlike simple AI note-takers and summarizers, Gemini can also recommend follow-up steps, identify potential roadblocks and how to deal with them, spot points of disagreement and suggest compromise solutions, and much more.
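For readers who prefer the API over the Gemini web app, here’s a minimal sketch of the same idea using Google’s `google-generativeai` Python SDK. The prompt wording, the `analyze_meeting` helper, and the polling loop are my own illustrative assumptions, not the prompt from this guide; the upload-then-poll pattern follows the SDK’s documented flow for video files.

```python
"""Sketch: ask Gemini 1.5 Pro for structured insights from a meeting recording.

Assumes you have a Google AI API key and the `google-generativeai` package
installed (`pip install google-generativeai`). The SDK import is guarded so
this file can still be read/tested without the package present.
"""
import time

try:
    import google.generativeai as genai
except ImportError:  # SDK not installed; the sketch still illustrates the flow
    genai = None

# An illustrative "starter" prompt (not the 200-word prompt from the guide).
PROMPT = (
    "You are a meeting analyst. From this recording, extract:\n"
    "1. A concise summary of what was discussed.\n"
    "2. Key decisions and action items (with owners, if mentioned).\n"
    "3. Points of disagreement and possible compromise solutions.\n"
    "4. Recommended follow-up steps and potential roadblocks."
)


def analyze_meeting(video_path: str, api_key: str) -> str:
    """Upload a meeting recording and return Gemini's structured analysis."""
    genai.configure(api_key=api_key)

    # Upload the recording; video files are processed asynchronously,
    # so poll until the file is ready before sending the prompt.
    video = genai.upload_file(video_path)
    while video.state.name == "PROCESSING":
        time.sleep(5)
        video = genai.get_file(video.name)

    model = genai.GenerativeModel("gemini-1.5-pro")
    response = model.generate_content([video, PROMPT])
    return response.text
```

Usage would be a one-liner like `print(analyze_meeting("standup.mp4", api_key))`; the same multimodal request is what the Gemini web app performs for you when you attach a video.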
In today’s guide, I’ll share:
The step-by-step process of getting Gemini to analyze a recorded meeting.
My 200-word “starter” prompt for extracting structured feedback from Gemini.
Ideas for other ways you can use Gemini in this context.
This is the most excited I’ve been to share a goodie with my paid subscribers.
Let’s get to it!