Sunday Rundown #80: Google Goes Big & LEGO Horrors
Sunday Bonus #40: Cool Midjourney style references, chapter 4.
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + a fun AI fail.
Sunday Bonus (paid): a goodie for my paid subscribers.
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Google made a bunch of big announcements this week (see also “AI research”):
The brand-new Gemini 2.0 Flash model is twice as fast as Gemini 1.5 Pro while outperforming it on practically every benchmark.
Deep Research can create and follow a research plan, performing multiple searches to generate deep insights for a given query (available to Gemini Advanced subscribers).
You can now share your webcam feed or screen with Gemini in real time in Google AI Studio. (I wrote a quick guide and recorded a demo here.)
Leonardo has a new “Flow State” feature that creates endless image variations for a given prompt and lets you easily zero in on a style you like:
Microsoft’s newest addition to the Phi family—Phi-4—is small but highly capable, especially when it comes to math problems.
Midjourney’s new Patchwork is an endless canvas that lets you create and populate new worlds and collaborate with others. (Try the beta here.)
OpenAI is continuing its “12 Days of OpenAI” shenanigans:
Day 3: The long-awaited Sora video model is out. (But good luck trying to use it in these early days. The servers have been swamped.)
Day 4: The handy ChatGPT Canvas feature is now available to everyone and can also run Python code directly within the ChatGPT interface.
Day 5: ChatGPT now has more useful integrations with Apple Intelligence.
Day 6: The Advanced Voice mode in ChatGPT can now see your screen and camera feed in real time. (Rolling out only to paying customers for now.)
Day 7: Projects lets you organize your chats and work into folders with preset instructions, knowledge base, and so on. (Claude’s had essentially the same feature, also called Projects, out since June.)
YouTube’s auto-dubbing feature has moved out of research and is now officially live.
xAI made two releases:
The Grok chatbot is now available to everyone, including free users.
Grok’s image generation is now powered by xAI’s own model, codenamed Aurora (instead of Black Forest Labs’ FLUX, as before).
🔬 AI research
Cool stuff you might get to try one day:
Google has a few exciting projects in the pipeline:
Project Astra: a universal real-time voice assistant that can see what you see and speak multiple languages. Now with Gemini 2.0 under the hood and integrations with Google Search, Lens, and Maps:
Project Jules: an AI coding assistant that integrates with your GitHub workflow and can handle tedious, time-consuming tasks.
Project Mariner: an agent that runs in your browser and can perform tasks on your behalf.
Gemini 2.0 will soon also be able to:
Meta teased a bunch of projects from FAIR (its Fundamental AI Research team):
Meta Motivo is a behavioral foundation model that can learn from a changing environment in an unsupervised way. (Try the demo.)
Meta Video Seal is an open-source watermarking tool for AI-generated videos that makes them traceable. (Try the demo.)
Reddit is testing AI-powered Reddit Answers that synthesize community answers for a given query. (Sign up for the waitlist.)
📖 AI resources
Helpful AI tools and stuff that teaches you about AI:
“Never Browse Alone? Gemini 2 Live and ChatGPT Vision” [VIDEO] - solid summary of OpenAI’s and Google’s upcoming assistants by AI Explained.
🤦‍♂️ AI fail of the week
“Hey, Midjourney, can you show me the Predator as a baby playing with LEGO?”
“Baby! LEGO! Predator! I got you, bro.”
💰 Sunday Bonus #40: Four fun Midjourney style references (Vol. 4)
It’s been a while since I shared newly discovered --sref codes for you to use. Here are the first three issues:
If you want to learn more about --sref, you can:
Check out my “Midjourney Masterclass” workshop.
Today, I have four more style reference codes for you. To use them, paste their --sref [number] at the end of your prompt.
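For example, a full prompt with a style reference appended might look like this (the subject and the --sref code below are made-up placeholders for illustration, not one of today’s four styles):

```
/imagine prompt: a treasure chest on a misty beach --sref 1234567890
```

Everything before the parameter is your normal prompt; the --sref code at the end tells Midjourney which style to borrow.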
This time, our showcase subjects are:
Treasure chest
Bear
Ocean
I give each style below a descriptive nickname, but what you’re really after is the --sref number itself.
Have fun!