10X AI (Issue #28): @sama Drama, Google's Music, and a Ping-Pong Bat Juggler
PLUS: Microsoft Ignite, Meta's Emu tools, more Google, Deforum on Discord, Niji Style Tuner, Meshy text-to-3D, and customizing GPTs for voice chats.
Happy Sunday, friends!
Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
1. Sam Altman is fired…then re-hired?
I usually don’t cover industry news, since my focus is on user-level product releases.
But this is such a massive ongoing story that I had to mention it.
The short of it: On Friday, OpenAI’s board of directors suddenly fired Sam Altman (co-founder and CEO) with no notice.
The main reason? Sam “was not consistently candid in his communications with the board.”
The board also asked Greg Brockman (co-founder and president) to step down as chairman but remain at OpenAI.
Greg promptly quit.
X / Twitter absolutely lost its collective shit (more so than usual).
Everyone’s feeds exploded with speculations, theories, conspiracy theories, hot takes, and memes. So many memes.
The abruptness of it all, coupled with the vague reasons for the firing, fueled by rumors of an internal rift about the direction of OpenAI, multiplied by the strange timing after what was widely considered a successful OpenAI DevDay[1] just a week earlier, turned this into a high-stakes drama nobody could turn away from.
Today, just two days later, the board of OpenAI is asking Sam Altman to return.
Sam Altman has broad support among OpenAI staffers, many of whom threatened to quit if he didn’t come back.
Remember: OpenAI is behind DALL-E 3, GPT-4, Whisper, and of course, ChatGPT, widely regarded as the product that launched the current generative AI boom.
Sam Altman is well-liked and considered a highly effective leader.[2]
What happens at OpenAI may well have repercussions for the industry’s trajectory and the widening split between AI “accelerationists” and AI “doomers.”[3]
But for now, we wait.
Update Nov 20, 2023: Sam Altman and Greg Brockman have now joined Microsoft and will not be returning to OpenAI.
2. Google releases a new music model called Lyria
What an evolution at Google!
From “we have a music model but you can’t see it” to “some of you can now sort of test it” to “we’re officially letting artists create with it.”
Google’s Lyria model appears to be an upgraded version of MusicLM.
It can create high-quality music, transform hums into instruments, continue tracks based on a few starting notes, and more:
Lyria will be used in two so-called “experiments”:
Dream Track that lets YouTube creators generate unique 30-second soundtracks for their Shorts.
Music AI Incubator that lets artists, songwriters, and producers experiment with AI tools.
Google is trying to go about this responsibly, helping the industry embrace AI while protecting creators’ rights.
3. Microsoft Ignite
Microsoft held its annual Ignite conference.
While it’s primarily aimed at professionals and developers, some news items affect everyone:
Say goodbye to Bing Chat, say hello to “Copilot”: Microsoft’s new umbrella term for its AI assistant and suite of customer AI tools.
Launch of Loop, Microsoft’s answer to Notion.
Immersive spaces for Microsoft Teams where people can create avatars, pick 3D environments, have multiple simultaneous conversations, etc.
For more, here’s Satya Nadella’s full keynote:
4. Meta teases AI video and text-based image editing
Just one year ago, Meta was the first to tease text-to-video with this unsettling demo:
Since then, AI video has made major strides, as I showcased in my recent look at 6 text-to-video tools.
Well, Meta’s back, baby!
And it’s never looked better:
That’s Emu Video, one of Meta’s two new research previews.
The second one is Emu Edit, an advanced take on text-based inpainting that lets you modify images through text descriptions:
Meta even shows how these capabilities can be combined to modify images and turn them into videos:
These are both in the research phase, but you can try a limited “demo” of the Emu Video model.
5. A few minor Google releases
Google also announced a few consumer-level features.
“Stacks,” which gather similar photos of the same event into a single group
AI-driven screenshot categorization
AI-created subcategories of products when searching for niche gift ideas
AI-generated visualizations of nonexistent products a user might want, which then help the user find real similar-looking products
The shopping options should be available to all US users on the Search Generative Experience.
6. Deforum comes to Discord
Deforum has been around since late last year.
It creates videos by evolving a sequence of frames from a single starting image, interpolating animation settings (like zoom or rotation) between keyframes.
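To make “interpolating between keyframes” concrete, here’s a minimal, hypothetical Python sketch of the idea. The “frame:(value)” schedule format is modeled on Deforum’s keyframe strings, but the code is illustrative and isn’t Deforum’s actual implementation:

```python
# Illustrative sketch of Deforum-style keyframe interpolation (not Deforum's
# actual code). A schedule maps frame numbers to a parameter value, e.g.
# zoom = "0:(1.0), 60:(1.5)", and the frames in between are interpolated.

def parse_schedule(schedule: str) -> dict[int, float]:
    """Parse a 'frame:(value)' schedule string into {frame: value}."""
    keyframes = {}
    for part in schedule.split(","):
        frame, value = part.split(":")
        keyframes[int(frame.strip())] = float(value.strip().strip("()"))
    return keyframes

def value_at(keyframes: dict[int, float], frame: int) -> float:
    """Linearly interpolate the parameter value at a given frame."""
    frames = sorted(keyframes)
    if frame <= frames[0]:
        return keyframes[frames[0]]
    if frame >= frames[-1]:
        return keyframes[frames[-1]]
    for lo, hi in zip(frames, frames[1:]):
        if lo <= frame <= hi:
            t = (frame - lo) / (hi - lo)
            return keyframes[lo] + t * (keyframes[hi] - keyframes[lo])

zoom = parse_schedule("0:(1.0), 60:(1.5), 120:(1.0)")
print(value_at(zoom, 30))  # 1.25 -- halfway between keyframes 0 and 60
```

In the real tool, values like these drive per-frame transformations (zoom, pan, rotation) as each frame is fed back through the model, which is what produces those continuously morphing sequences.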
Until now, Deforum was mostly for tech-savvy people with local installations of AUTOMATIC1111 or those using dedicated Google Colab spaces.
But now it’s coming to Discord (granted, some won’t see that as an improvement):
You can sign up for the Discord Beta waitlist. Then you too can make trippy sequences like this:
7. Midjourney brings the Style Tuner to Niji
Midjourney’s Style Tuner went live two weeks ago.
Now, it’s also available for Niji, Midjourney’s anime-focused model.
Everything should work exactly as with the Style Tuner for regular Midjourney. Here’s my recent deep dive:
🛠️ AI tools
To stay within my 10-item limit, I have just one tool today.
8. Meshy
Meshy is the new kid on the 3D block.
While Meshy runs a Discord server like most other text-to-3D tools, it also has an intuitive web interface that Discord-averse users will welcome.
Simply describe your desired object, pick your style, tweak a few optional settings, and click “Generate”:
Several minutes later, your 3D asset is ready:
From here, you can download the asset, share it, or make adjustments.
Meshy has a generous free plan that gives you 200 generation credits per month.
💡 AI tip
Here’s this week’s tip.
9. Change how your GPT acts during a voice chat
If you’re building a GPT, you might want it to behave differently when using voice chat. For instance, you’d likely want it to get to the point faster.
Fortunately, it’s easy to do this by simply telling your GPT what you expect. Here’s a Say A Little / Say A Lot GPT I made to showcase this.
All that’s behind it is this short prompt:
When using text-based chat, you give wordy, descriptive answers. When using voice chat, you keep every answer to a single short sentence, unless the user asks you to elaborate.
Here’s how that looks in practice (note the microphone icon for the second question, which is when I used my voice):
Same question, different behavior.
This is just an example. You can use the same principle to customize how your GPT acts in different circumstances.
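If you’d rather bake this channel-aware behavior into your own app instead of a GPT, the same instructions work as a system prompt over the OpenAI API. Here’s a minimal sketch: the chat.completions call is the real API (openai Python library, v1+), but the model name is just an example and the voice/text flag is a stand-in for however your app detects the input channel:

```python
# Sketch: the "adapt to the channel" idea via the OpenAI API rather than the
# GPT builder. The client calls are real (openai >= 1.0); the channel
# detection is a placeholder you'd wire up yourself.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INSTRUCTIONS = (
    "When using text-based chat, you give wordy, descriptive answers. "
    "When using voice chat, you keep every answer to a single short "
    "sentence, unless the user asks you to elaborate."
)

def ask(question: str, via_voice: bool) -> str:
    # Tell the model which channel is in use so the instructions can apply.
    channel = "voice chat" if via_voice else "text-based chat"
    response = client.chat.completions.create(
        model="gpt-4-1106-preview",  # example model name
        messages=[
            {"role": "system", "content": f"{INSTRUCTIONS} The user is on {channel}."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content

print(ask("What is a neural network?", via_voice=True))  # one short sentence
```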
🤦‍♂️ 10. AI fail of the week
I wanted a “knife juggler.” I don’t know where the table tennis bats came from.
Sunday poll time
Previous issue of 10X AI:
[1] Remember, GPTs announced at DevDay drove so many new paid users to OpenAI that their servers could not cope.
[2] Altman is so hot right now! Or cold, I'm not sure.
[3] It really was an abrupt shock, and it sort of sounds like it's being framed as capitalism/greed vs. slowing down/wisdom (although that could just as easily be a convenient ruse to cover up the fact that Sam Altman definitely peed in Ilya's ice cube trays). Alejandro Piad Morffis can vouch; he heard about it here on Notes.