Sunday Rundown #85: Reasoners, Operators, and SpongeBob LampHead
Sunday Bonus #45: Custom GPT for writing o1 briefs.
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + a fun AI fail.
Sunday Bonus (paid): a goodie for my paid subscribers.
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Anthropic rolled out an API feature called “Citations,” which grounds Claude’s responses in source documents.
DeepSeek open-sourced a free reasoning model called DeepSeek-R1, which performs on par with OpenAI’s o1, along with six smaller models distilled from R1. (Try R1 for free here. Read my thoughts on the model here.)
Freepik integrated Google’s Imagen 3 model into its AI Suite.
Google’s been busy:
Released a new version of its reasoning model, Gemini 2.0 Flash Thinking Experimental 01-21, which currently sits at #1 on Chatbot Arena.
The latest iteration of its Imagen 3 image model is also #1 on Text-to-Image Arena.
Increased the price of Google Workspace plans by $2/month, but these plans now include Gemini AI, which used to cost $20/month as a standalone product.
Announced lots of ongoing and upcoming updates to Gemini on Android devices.
HeyGen introduced a “motion control” feature that lets you direct the movements of its avatars and the camera.
Kling AI released an awesome feature called “Elements” that lets you upload up to 4 reference images and use those in the same video as consistent characters, items, or scenes.
KREA AI now lets you train custom models for its real-time editor to keep your characters, scenes, or items consistent.
Midjourney has two updates:
The Describe feature has been overhauled and you can now invoke it on the website by simply right-clicking on any image and picking “Describe.”
You can now freely mix your Moodboards and Style References to create new styles. (I covered both concepts in my recent video guide.)
OpenAI has made some noise, too:
Released a research preview of its much-hyped “Operator” agent that can autonomously browse the web and complete tasks on your behalf. (Available only to Pro accounts in the US for now.)
Significantly improved Canvas, which is now available to the o1 model and can natively render React / HTML code that o1 produces.
Overhauled the “Custom Instructions” UI in ChatGPT to make them more intuitive to fill out.
Perplexity…
Introduced an agent called Assistant, which can search the web and perform limited actions like booking tables, calling rides, etc. (Paid plans only.)
Launched an API platform called “Sonar” that lets developers incorporate Perplexity models into their products.
Spline released a model called Spell that can generate a 3D world from a single input image.
Tencent has a new 3D model called Hunyuan3D 2.0 that can generate high-quality 3D assets from image input. (Try it for free.)
🔬 AI research
Cool stuff you might get to try one day:
ByteDance also has a reasoning model called Doubao-1.5-pro that uses a mixture-of-experts architecture to outperform GPT-4o and Claude 3.5 Sonnet despite its smaller size. But we’re not quite done with reasoning models yet...
Kimi presented yet another reasoning model called Kimi k1.5 that it claims also achieves “o1-level multi-modal” performance.
OpenAI plans to make its upcoming o3-mini reasoning model available to free ChatGPT users.
Pika dropped a teaser for its upcoming, improved video model, Pika 2.1.
📖 AI resources
Helpful AI tools and stuff that teaches you about AI:
“AI Avalanche” [VIDEO] - great coverage of nine key developments by AI Explained.
“Behind the Curtain — Coming soon: Ph.D.-level super-agents” - a curious take on what might be cooking behind the scenes by Axios.
“Inside Anthropic's Race to Build a Smarter Claude and Human-Level AI” [VIDEO] - a relatively short but illuminating interview with Anthropic’s Dario Amodei by the Wall Street Journal’s Joanna Stern.
“Reasoning with o1” [COURSE] - an excellent, free 1-hour video course by OpenAI’s Colin Jarvis.
“Work Change Report: AI Is Coming to Work” [PDF] - a LinkedIn report about the impact of AI on work.
🔀 AI random
Other notable AI stories of the week:
OpenAI announced a venture called Stargate that aims to invest a whopping $500 billion over four years to build new AI infrastructure in the US, in partnership with SoftBank, Oracle, MGX, Arm, Microsoft, and NVIDIA.
🤦‍♂️ AI fail of the week
What I asked Kling AI for: “Creepy footage of Homer Simpson emerging from the forest at night, captured on a night-vision trail cam.”
What I got:
💰 Sunday Bonus #45: My custom GPT that helps you draft better briefs for o1
Two days ago, I argued that you should use GPT-4o to flesh out and structure briefs for OpenAI’s o1 reasoning model.
Now, I’ve taken the next step and made a custom GPT for just this purpose.
This “Brief Writer” does the following:
Helps you figure out if your request should be sent to o1 in the first place.
Gathers information from uploaded files or external links (something o1 can’t do).
Interviews you repeatedly to get a complete understanding of your needs.
Drafts and structures the o1 brief using known best practices.
Outputs the brief to “Canvas” for further co-editing or copy-pasting into o1.
Here’s a chat snippet of the “Brief Writer” in action:
The “Brief Writer” isn’t a silver bullet and has the same limitations as any custom GPT.
But after lots of testing and tweaking, I find that it now makes an excellent partner for initial brainstorming and structuring your messy thoughts.