Sunday Rundown #88: Grok 3 & Knock-off Jedis
Sunday Bonus #48: Get the gist of any topic with my custom GPT.
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + a fun AI fail.
Sunday Bonus (paid): an exclusive segment for my paid subscribers.
I was away last week, so we have two weeks’ worth of AI news to catch up on!
Let’s get to it.
🗞️ AI news
Here are the AI developments of the past two weeks.
👩‍💻 AI releases
New stuff you can try right now:
Adobe launched its commercially safe Firefly Video Model with text-to-video, image-to-video, “translate video,” and camera control features. (Try it for free.)
Bolt introduced Native Mobile App support, so you can create and launch apps to the App Store without coding.
Deepgram released Nova-3, a speech-to-text model that can handle real-time multilingual transcription and has a 54% lower word error rate than competitors.
ElevenLabs made Studio, its long-form text-to-audio editor for storytellers, available to everyone.
Fiverr has a new suite of AI tools called Fiverr Go that lets freelancers create, train, and manage personalized AI models to help with their gigs.
Google news:
Deep Research rolled out to the Gemini mobile app, so you can get comprehensive reports on any topic on the go.
Gemini can now recall and reference details from your past chats (à la ChatGPT “Memory”). It's rolling out to Gemini Advanced users first.
Google Meet users can now scroll through 30 minutes of live captions and translations during the call.
NotebookLM Plus, which has higher usage limits, is now available to Google One AI Premium subscribers.
PaliGemma 2 mix is a vision-language model that can handle a mix of tasks like object detection, image captioning, and OCR.
The impressive Veo 2 video model can now be used to generate clips for YouTube Shorts.
Whisk, the experimental image tool that uses reference images as prompts, is now out in 100+ countries. Except—typically—Europe. (Unless you use a VPN.)
Hugging Face released SmolVLM2, a family of small vision models that can understand video input and run locally on almost any device.
Luma AI added an image-to-video feature to its Ray2 video model.
Mistral AI launched Mistral Saba, an LLM for the Middle East and South Asia, with culturally relevant, nuanced responses in languages like Arabic and Tamil.
OpenAI unlocked support for file and image uploads for the o1 and o3-mini models. Also, Plus users now get 50 daily messages with o3-mini-high.
Perplexity AI news:
Deep Research (again, really?) can generate in-depth research reports for any query.
R1-1776 is an unbiased version of DeepSeek-R1 that’s been liberated from CCP censorship while remaining just as performant at reasoning tasks.
Sonar—the company’s in-house model built on Llama 3.3 70b—is blazing fast at 1200 tokens/second while performing on par with top-tier LLMs.
Pika news:
The company now has an official mobile app.
With Pikaswaps, you can replace anything or anyone in a video with a reference image or a prompted change.
SkyReels has an open-source video model and an all-in-one video creation platform.
Spotify will now accept audiobooks made in ElevenLabs, giving authors access to new markets with AI narration in 29 languages.
xAI launched the upgraded Grok 3 model, which now sits at #1 on Chatbot Arena in all LLM task categories. (Try it for free while the public beta lasts.)
Zyphra open-sourced Zonos-v0.1, a text-to-speech model capable of high-fidelity real-time voice cloning. (Try it for free.)
From my sponsor:
Generate SEO-Optimized Articles in seconds with RightBlogger. Our AI-driven writer helps you rank higher, faster—no more writing headaches.
🔬 AI research
Cool stuff you might get to try one day:
Alibaba previewed Animate Anyone 2, which can superimpose a reference character onto an existing video of another character.
Anthropic is reportedly gearing up to release a new “hybrid” model that integrates a traditional LLM with a reasoning one in the same interface.
ByteDance teased a joint image-and-video family of models called Goku that scores highly on the GenEval and DPG-Bench benchmarks.
Google introduced AI co-scientist, a multi-agent system that helps scientists generate and verify novel hypotheses. (Apply for Trusted Tester access.)
Microsoft is working on a model called Muse, which can extrapolate gameplay sequences and controller actions based on 1-second reference clips.
OpenAI (via Sam Altman) shared a roadmap indicating GPT-4.5 will be its last traditional LLM, followed by GPT-5, a “system” that integrates OpenAI’s disparate features and models under one unified umbrella.
Snap Research introduced Dynamic Concepts that can personalize text-to-video output by capturing appearance and motion from single videos.
YouTube outlined its 2025 strategy, which includes a big focus on generative AI and new tools for creators.
📖 AI resources
Helpful AI tools and stuff that teaches you about AI:
“Agent Leaderboard” [TOOL] - a Hugging Face Space by Galileo that helps you benchmark and evaluate AI agents on real-world use cases.
“Anthropic Economic Index” [REPORT] - an initiative to understand AI’s impact on the labor market and the economy, based on millions of anonymized Claude conversations.
“Azure AI Foundry Labs” [SITE] - a new hub for Microsoft’s ongoing AI research projects.
“Mastering AI Agents” [PDF] - a massive, 100-page guide to AI agents by Galileo.
“Modern-Day Oracles or Bullshit Machines?” [SITE] - an interactive exploration of LLMs by University of Washington professors Carl T. Bergstrom and Jevin D. West.
“National AI Opinion Monitor: AI Trust and Knowledge in America” [PDF] - a look at public attitudes towards AI in the US by Rutgers University.
“OpenAI Model Spec” [POST] - OpenAI’s public outline of how it wants its models to behave.
“SWE-Lancer benchmark” [PAPER] - a new benchmark by OpenAI to evaluate models on real-world software engineering tasks from Upwork.
“Writing effective text prompts for video generation” [GUIDE] - Adobe wrote this guide for its new Firefly video model, but it’s useful for any AI video tool.
🔀 AI random
Other notable AI stories of the week:
Former OpenAI CTO Mira Murati announced her new venture, Thinking Machines Lab, to help advance AI through open science.
NVIDIA launched a vision-enabled Signs platform that teaches American Sign Language via a 3D avatar and real-time feedback.
🤦‍♂️ AI fail of the week
The Jedi’s close-up face at the end is exactly how I feel about this Sora video.
💰 Sunday Bonus #48: Get up to speed on any topic with my custom GPT
For today’s bonus, I cooked up a pre-prompted custom GPT that helps you quickly grasp any concept.
Starting with your query, this GPT will automatically:
Look up relevant, up-to-date info about the topic
Put together a top-level, beginner-friendly intro
Supplement it with visuals like tables or charts
Embed a relevant video about the subject for further exploration
For comparison, here’s a standard ChatGPT response for “tap dancing”:
Here’s the “Give Me The Gist” GPT version: