Sunday Rundown #62: Realtime Voice & Harvey "Two-Bodies"
Sunday Bonus #22: The newest text-to-image models go head-to-head.
Happy Sunday, friends!
Welcome back to the weekly look at generative AI that covers the following:
Sunday Rundown (free): this week’s AI news + an AI fail for your entertainment.
Sunday Bonus (paid): a goodie for my paid subscribers, such as a guide, a tip, or a tool walkthrough.
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
👩‍💻 AI releases
New stuff you can try right now:
Black Forest Labs—a new AI company by the team behind the original Stable Diffusion—released FLUX.1, a suite of text-to-image models in three versions:
FLUX.1 [pro]: A state-of-the-art model that the company claims beats all current competitors.
FLUX.1 [dev]: A distilled open-weight version that’s almost as good as the top one.
FLUX.1 [schnell]: A fast, less-performant model for personal use. (See the sketch after this list for a quick way to try it.)
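For the hands-on crowd, here's a minimal sketch of running the open-weight FLUX.1 [schnell] locally via Hugging Face diffusers. The repo ID and generation settings are my assumptions, not from the announcement, and you'll need a reasonably beefy GPU:

```python
# Minimal local test of FLUX.1 [schnell] via Hugging Face diffusers.
# Repo ID and settings are assumptions; check the model card before running.
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-schnell",  # assumed Hugging Face repo ID
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # offloads weights to CPU to fit smaller GPUs

image = pipe(
    "a lighthouse on a cliff at sunset, watercolor",
    num_inference_steps=4,  # schnell is distilled to work in very few steps
    guidance_scale=0.0,     # the distilled model doesn't use guidance
    max_sequence_length=256,
).images[0]
image.save("flux_schnell_test.png")
```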
Google released three additions to its Gemma 2 family:
Gemma 2 2B: a tiny but capable model that can run on local devices. (A local-inference sketch follows this list.)
ShieldGemma: a classifier model that can filter inputs and outputs of other AI models to “keep the user safe.”
Gemma Scope: a research tool that helps uncover the inner workings of Gemma 2 models.
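Since Gemma 2 2B is small enough to run on local hardware, here's a minimal inference sketch using Hugging Face transformers. The instruction-tuned checkpoint name is my assumption; the model is gated, so you'll need to accept the license on Hugging Face first:

```python
# Minimal local inference with Gemma 2 2B via transformers.
# The "-it" (instruction-tuned) repo ID is an assumption on my part.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-2b-it"  # assumed instruction-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("Explain RAG in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```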
Google has also snuck an experimental version of Gemini 1.5 Pro into its Gemini API and Google AI Studio (where you can try it for free). It soared to #1 in the LMSYS Chatbot Arena.
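If you'd rather hit it through the API than the AI Studio UI, a call might look like this sketch using the Gemini Python SDK. The exact experimental model string is my assumption based on the release timing:

```python
# Sketch of calling the experimental Gemini 1.5 Pro via the Gemini API.
# The model name is an assumption; check Google AI Studio for the current one.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # free tier available via AI Studio
model = genai.GenerativeModel("gemini-1.5-pro-exp-0801")  # assumed model string
response = model.generate_content("Summarize this week's AI news in one line.")
print(response.text)
```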
Meta has two new things you can try:
AI Studio, where you can create and share custom AIs (similar to Custom GPTs). It's available to US users for now.
A working demo version of the successor to the original Segment Anything model. It’s called—wait for it—Segment Anything 2, and it lets you track any object in a video.
Midjourney released Version 6.1, which should generate more precise, more coherent images and better-rendered text.
OpenAI had two notable “alpha” releases this week:
GPT-4o Long Output, which can spit out up to 64K tokens per request. (See the sketch after this list.)
The long-awaited Advanced Voice Mode is rolling out to select ChatGPT Plus users. (Check out these hands-on tests by Ethan Mollick.)
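For those in the Long Output alpha, a request might look like the sketch below. Both the model name and the cap value are assumptions based on the announcement; access is limited to invited testers:

```python
# Hedged sketch of a GPT-4o Long Output request via the OpenAI Python SDK.
# The alpha model name and token cap are assumptions, not confirmed values.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.chat.completions.create(
    model="gpt-4o-64k-output-alpha",  # assumed alpha model name
    max_tokens=64000,                 # the headline 64K-token output ceiling
    messages=[{"role": "user", "content": "Write a very, very long story."}],
)
print(response.choices[0].message.content)
```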
Runway now lets you upload an image as the starting frame for the latest Gen-3 version of its video model.
Stability AI dropped Stable Fast 3D, which can turn any image into a 3D asset in mere seconds. (Try the Hugging Face demo.)
🔬 AI research
Cool stuff you might get to try one day:
Google Chrome is about to get three new experimental AI features:
An easy drag-and-drop way to search using Google Lens.
A way to compare products across multiple tabs.
Natural-language search for your browsing history.
📖 AI resources
Helpful stuff that teaches you about AI:
LLM Hallucination Index compares how 22 large language models perform on 3 RAG tasks, complete with many interactive tools, charts, and tables.
🤦‍♂️ AI fail of the week
“And for my next trick, I’ll displace the top half of my body.” (The final version.)
Anything to share?
Sadly, Substack doesn’t allow free subscribers to comment on posts with paid sections, but I am always open to your feedback. You can message me here:
💰 Sunday Bonus #22: FLUX.1 vs. every other text-to-image model
It sure has been a busy week in the text-to-image space.
First, Midjourney 6.1 came out.
Then I gained access to Google Imagen 3 in the AI Test Kitchen.
Finally, FLUX.1 dropped, claiming to be the best model around.
So for today’s Sunday Bonus, I’ll do a condensed, updated version of my earlier look at text-to-image models. (Ideogram 1.0 and Firefly Image 3 have also come out since that post.)
Now let’s see how FLUX.1 compares to all the other major players!