Sunday Rundown #67: OpenAI o1 & Emu-Mutant
Sunday Bonus #27: Using LLMs to create prompt lists for text-to-image models
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + a fun AI fail.
Sunday Bonus (paid): a goodie for my paid subscribers.
I was away last week, so we have two weeks’ worth of catching up to do!
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Anthropic launched a new Claude Enterprise plan that bumps the context window to 500K tokens, increases usage limits, and integrates natively with GitHub.
Google added “function calling” to Google AI Studio, making it easier for developers to test models directly inside the UI.
Chinese startup MiniMax launched a video model called video-01 on its Hailuo AI platform. It generates 6-second clips from text prompts. (Try it for free here.)
Hume AI released the next version of its Empathic Voice Interface, called EVI 2. It’s a voice-to-voice model that can pick up emotional cues and respond in near real time. (Try chatting with some of Hume’s demo characters.)
Luma released version 1.6 of its Dream Machine video tool, which now lets you control camera motion with text prompts.
Meshy is now out with Meshy-4, the latest and most impressive version of its text-to-3D engine. (It’s been less than a year since Meshy first launched.)
Mistral AI released its first-ever multimodal model called Pixtral 12B. (Coming soon to the company’s “Le Chat” interface.)
NotebookLM now has an “Audio Overview” feature, which turns the sources you upload into an audio conversation between two speakers. (I wrote a deep dive on NotebookLM and a guide to using it to parse multiple sources.)
After months of speculation, OpenAI released its o1 series of reasoning models that “think” through responses via built-in chain-of-thought and can therefore solve especially complex problems and coding challenges.
Runway’s paid users can now access Gen-3 Alpha Video to Video, which lets them upload a video and restyle it in any aesthetic.
Suno introduced a feature called Covers that can reimagine an uploaded track (or just you singing) as any genre while keeping the lyrics and melody intact.
🔬 AI research
Cool stuff you might get to try one day:
Adobe teased its upcoming Firefly Video Model, boasting many cool features and integrations with the Adobe suite. (Sign up for the waitlist.)
Amazon is planning to revamp its Alexa assistant using Anthropic’s Claude model under the hood. This “Remarkable” Alexa is due to be released in October.
Apple said its Apple Intelligence will start rolling out in October.
Google is rolling out “Ask Photos” to a limited set of US users. This new feature understands context and lets you find photos using natural language. (Join the waitlist here.)
Google is also starting to make Gemini Live available to free users on Android.
Replit is making its new Replit Agent available in early access to subscribers on Core and Teams plans. It’s an all-in-one AI tool that helps users create applications from scratch.
📖 AI resources
Helpful stuff that teaches you about AI:
“AI prompt engineering: A deep dive” [VIDEO] - Anthropic’s prompt engineering experts share many useful tips.
Anthropic Quickstarts - a list of ready-made projects for developers to build “deployable applications” upon.
“ChatGPT o1 - First Reaction and In-Depth-Analysis” [VIDEO] - an excellent (as always) deep dive on o1 by AI Explained.
Conversational AI Powered by Large Language Models Amplifies False Memories in Witness Interviews [PDF] - a fascinating study that shows how AI can mess with people’s memories.
Cursorcasts - a collection of free screencasts by Cursor to teach people how to code with AI.
“How to Supercharge Your Writing With AI Tools” [VIDEO] - yet another episode of Dan Shipper’s excellent series, featuring writer Evan Armstrong.
Time 100 / AI (2024) - this year’s list of the most influential people in AI from TIME magazine.
Writing With AI - practical insights from different writers who use ChatGPT. (Many of the use cases overlap nicely with my “Skeptical Writer's Guide to AI.”)
🔀 AI random
Other notable AI stories of the week:
Ilya Sutskever’s startup Safe Superintelligence raised $1 billion to work toward developing superintelligence responsibly and safely.
🤦‍♂️ AI fail of the week
Bad GPS is the least of your problems, whatever you are. (Final version here.)
Anything to share?
Sadly, Substack doesn’t allow free subscribers to comment on posts with paid sections, but I am always open to your feedback. You can message me here:
💰 Sunday Bonus #27: How to create copy-paste image prompts at scale with LLMs
Earlier this year, I showed how you can use LLMs to come up with better prompts for your AI images.
But AI chatbots can also dramatically speed up the creation of batch prompts when you’re working on an image series with a specific theme (like one of these).
Today, I’ll show you a platform-agnostic process that uses one-shot prompting to generate any number of copy-paste image prompts for any text-to-image model.
I like that it combines the “inspire me” and “give me practical outcomes” steps into a single prompt, so you can both evaluate and immediately use the suggestions.
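If you prefer to script this, here’s a minimal Python sketch of the general idea, not the exact prompt from this post. It builds a one-shot instruction (one worked example, then the request) and splits a chatbot’s reply into idea/prompt pairs. The function names, the “idea | prompt” line format, and the sample theme are all my own assumptions for illustration; swap in whatever wording and format work for your model.

```python
import re


def build_one_shot_prompt(theme: str, count: int,
                          example_idea: str, example_prompt: str) -> str:
    """Assemble a one-shot instruction: one worked example, then the request.

    The 'idea | prompt' layout is a hypothetical convention chosen so the
    reply is easy to split later; it is not required by any model.
    """
    return (
        f"I'm creating an image series on the theme: {theme}.\n"
        "For each idea, give a short concept name followed by a "
        "ready-to-paste text-to-image prompt.\n\n"
        "Example:\n"
        f"1. {example_idea} | {example_prompt}\n\n"
        f"Now write {count} new ideas in exactly that format, numbered from 1."
    )


def parse_prompt_list(reply: str) -> list[tuple[str, str]]:
    """Pull (idea, prompt) pairs out of lines shaped like '1. idea | prompt'."""
    pairs = []
    for line in reply.splitlines():
        match = re.match(r"\s*\d+\.\s*(.+?)\s*\|\s*(.+)", line)
        if match:
            pairs.append((match.group(1), match.group(2).strip()))
    return pairs


if __name__ == "__main__":
    # Paste the printed text into any chatbot, then feed its reply
    # to parse_prompt_list() to get copy-paste image prompts.
    print(build_one_shot_prompt(
        theme="retro robots",
        count=5,
        example_idea="Tea party",
        example_prompt="a robot tea party, 1950s magazine illustration",
    ))
```

The idea name covers the “inspire me” half and the prompt after the pipe covers the “practical outcomes” half, so you can skim the concepts and copy only the prompts you like.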
Let’s roll!