10X AI (Issue #20): DALL-E 3, Bard Everywhere, Microsoft Copilot, and AI Alexa
PLUS: New Lexica model, GitHub Copilot Chat, Genmo Replay, DeciDiffusion, YouTube AI, and ElevenLabs Projects.
Happy Sunday, friends!
Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.
I’ve had family visiting from overseas and a bunch of celebrations happening this weekend, so this is a “Lite” version with fewer personal deep dives.
Also, thanks to a huge amount of beginner-focused news, this is yet another news-only edition!
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
1. OpenAI teases DALL-E 3
I usually tackle stuff that’s already out, but this was too big of a deal to skip.
OpenAI just announced its new text-to-image model: DALL-E 3.
Two things make DALL-E 3 stand out.
First, its natural language understanding is leaps and bounds above anything else that’s currently out there.
Check out this example (there are even more on the site):
Just for fun, I ran the same prompt through Midjourney:
Here’s the result:
We’re not even in the same league.
Midjourney only captures some of the elements in some of the images, and none of them are nearly as prompt-accurate.
(Hell, I’ve previously shown that Midjourney can’t even follow a much shorter prompt like “two red balls and one blue cube on a green table.”)
If DALL-E 3 lives up to the handpicked examples above, it’s going to open up a whole new frontier of competition for text-to-image models, which until now mostly revolved around image quality.
The second thing that makes DALL-E 3 special is that it’ll live directly inside ChatGPT. No Discord servers or clunky interfaces. You’ll simply talk to ChatGPT and work on your images together in a back-and-forth conversation.
Here’s how OpenAI envisions the process:
DALL-E 3 should roll out to ChatGPT Plus users in October.
I can’t wait to try it.
2. Google plugs Bard into its ecosystem
The other big announcement of the week is Google hooking Bard up to its entire suite of products, including YouTube, Drive, Gmail, Maps, and more.
We’ve seen tons of apps and ChatGPT plugins that do many of the above, like summarizing and drafting emails. But those were fragmented offerings from different third parties and didn’t always work or integrate smoothly.
Having Bard plugged into Google’s ecosystem takes us one step closer to having a true one-stop AI assistant.
Bard also got a few under-the-hood improvements:
It can now help you double-check its own responses by searching the web and highlighting sentences in green (confirmed, with a link to the source) or yellow (if there's reason to doubt it or no way to confirm it). Hopefully, this helps to reduce hallucinations.
You can now share Bard chats with other people, who can then continue where you left off (catching up to ChatGPT).
It runs on an improved version of PaLM 2 that’s better at coding, brainstorming, and communicating in different languages.
You can give Bard access to your Google account by going to bard.google.com and clicking the “Extensions” icon in the top-right corner:
From here, you toggle the products you want Bard to plug into:
Enjoy!
3. Microsoft Copilot
But wait, what’s this?!
Is Microsoft doing pretty much the same thing with its Copilot?
Yes, it is!
The new Copilot will live in all of your Microsoft products and help you co-write, co-design, co-code, and many other co-things.
The Copilot will make its way into Windows 11 with a free update in two days and will later expand to Bing, Edge, and Microsoft 365.
4. ElevenLabs launches an audiobook product
ElevenLabs, one of the frontrunners in AI speech, has just released a product called Projects. It’s an all-in-one solution that can help you produce an entire audiobook, complete with separate speakers:
Unfortunately, unlike some other ElevenLabs products, Projects requires you to be on at least the “Creator” pricing tier.
You can check it out at ElevenLabs.
5. GitHub Copilot Chat now available to everyone
Whether you’re a seasoned or aspiring developer, this is for you.
Individual users with a GitHub Copilot subscription now have beta access to Copilot Chat, which offers live suggestions for specific coding challenges. It can help analyze code, do basic troubleshooting, and even offer ideas for fixing security issues and vulnerabilities.
Start using GitHub Copilot Chat here.
6. Amazon grants AI superpowers to Alexa
Looks like the era of dumb voice assistants we’ve all come to mock is coming to an end.
Amazon has announced that Alexa is about to get new capabilities powered by generative AI. Here’s a cringey brand video of people asking Alexa to do something ChatGPT has been doing since late 2022:
Still, combining LLM powers with Alexa’s voice interface sets us on the path to a real-world version of Samantha from Her. Or J.A.R.V.I.S. from Iron Man, if that’s your thing.
7. Genmo’s Replay text-to-video is out
Looks like Runway is getting a bit more competition.
Genmo released the first iteration of its text-to-video tool called Replay:
It’s free to use and has a very simple interface with a single prompt box:
Here’s a 2-second video of a hiker crossing a river, generated from that prompt:
We’re not quite in Hollywood yet, but things are moving fast in this space.
8. Another fast text-to-image model: DeciDiffusion 1.0
Last week, I talked about Würstchen (Sausage), which generated images very fast.
This week, we have DeciDiffusion 1.0, which claims to match Stable Diffusion’s output quality while creating images three times faster.
In my test runs, every image took only 3 to 4 seconds:
But why listen to me? Here’s a free Hugging Face demo.
Have at it!
9. YouTube’s new AI features
YouTube is moving beyond AI-generated video summaries and unveiling a number of new AI features. Soon, YouTube creators will have access to:
AI-recommended music for their videos
Automatic dubbing for videos using Aloud
Personalized AI insights and suggestions for new videos to make
10. Lexica’s latest image model
Lexica has truly evolved from just being a reference library for Stable Diffusion into an image generation site with its own text-to-image models.
Now they’re out with Lexica Aperture 3.5.
Lexica claims the model is capable of generating high-quality photographic images and following prompts precisely.
Unfortunately, I just realized that Lexica no longer offers a free tier or free starting credits, so you’ll have to shell out at least $8 per month to try it.
Daniel, I thought I would report I finally got around to using ChatGPT. I'm honored to be the last person on the Internet to do so. :-)
As everyone else has already said, it's a quite impressive tool. And it's pretty much perfect for my current project. My blog is an image and video site, and my original intention was to not have any text content at all. But then I decided I wanted to share some background info about some of the artists I'm showcasing in the videos. But I don't want to write a bunch of bios, and on this site I'm not trying to impress anyone with my writing. I tried that already, and it didn't work. :-)
Point being, ChatGPT seems pretty close to perfect for this particular mission. There are a few puzzling quirks, like no button to copy the generated text (??) but that's a trivial quibble. If this hasn't happened already, it seems certain that Substack will be ever more populated with generated content.
From my long web development background, it seems easy to predict a coming phase for the text generators. Instead of generating articles one at a time, some future version of ChatGPT will generate entire websites. Imagine this prompt...
"Create a web site of more than 1,000 pages which offers a detailed history of the most influential figures in the hippy movement and rock music industry, between 1960 and 1980."
At some point people will be generating entire blog networks like Substack in an hour, instead of five years. Thousands of imaginary authors and their millions of auto generated articles.
Wait, don't get upset, I promise, this comment was hand typed by an actual human being. You remember those, right? Yes, it's true, no really, completely true. I mean, you know, probably. Could be? Ok, maybe not. I really don't know actually.
I’m also super excited about the DALL-E development. A “graphic novel” approach sounds enticing! But I’m a painter who shows in galleries, and I have to point out one thing. The Midjourney grid in your comparison contains images that I could imagine framed on my wall. The DALL-E image looks like a silly Disney poster. From a computer science perspective, DALL-E 3 might be a giant leap for mankind. From an artist’s perspective, it doesn’t look like Midjourney has anything to worry about.