10X AI (Issue #19): Stable Audio, Microsoft phi-1.5, AI Coaching, and Epic Poker
PLUS: Runway's Director Mode, Amazon's AI out of testing, Adobe Firefly out of beta, Würstchen text-to-image, and prompting ChatGPT with its own output.
Happy Sunday, friends!
Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.
Let’s get to it.
🗞️ AI news
Here are this week’s AI developments.
1. Stable Audio music model is here
Welp, it looks like text-to-music is the next area of rapid growth in generative AI.
Just last week, I wrote about Chirp v1 (which composed an epic gangsta rap about Why Try AI).
This week, Stability AI unleashed Stable Audio onto the world.
Stable Audio is a latent diffusion model that takes text prompts as input to generate music clips with a 44.1 kHz sample rate.
And it’s available to everyone right now!
You simply describe the track you want…
…and then get your track:
The free plan lets you generate 20 tracks per month of up to 45 seconds each.
The pro plan bumps this to 500 tracks of up to 90 seconds and grants you a commercial license for the resulting audio.
So head on over to Stable Audio to check it out!
2. Microsoft’s phi-1.5 beats (the smallest) Llama 2
At this point, it feels like not a week goes by without a new LLM beating some kind of record.
This week, it’s Microsoft’s phi-1.5 (resemblance to any college fraternity is purely coincidental).
The big deal with phi-1.5 is that it’s trained almost entirely on synthetic, “textbook-like” data rather than text scraped from the Internet. This also made its training run incredibly cheap and fast.
Despite having only 1.3 billion parameters, phi-1.5 beats the 7B version of Meta’s Llama 2 on several tests like Winogrande and OpenbookQA (I covered both in my recent breakdown of LLM benchmarks).
I couldn’t find an easily accessible public demo, but the code is available on Hugging Face, if you’re the type to use it (you know who you are).
3. Runway introduces Director Mode
Runway, the current leader in text-to-video, just launched Director Mode.
This feature lets you actively decide where to point the camera and the intensity of its movement.
So you’re no longer limited to a static camera angle.
Don’t you just feel like a real director now?
Head to RunwayML.com and pick Gen-2 to test it out.
4. Two callback stories
We have two developments this week related to past issues of 10X AI.
Amazon’s AI rolls out to the public
I wrote about Amazon testing AI-written product texts a month ago.
It appears the trials went pretty great, because Amazon is now officially announcing the feature. Amazon’s AI can take a short description of your product and create the first draft of the entire product listing for your review.
If you’re an Amazon seller, check if you already have “Generate Listing Content” available to you in early access.
Adobe Firefly for all
Read my lips: “No. More. Beta.”
Adobe Express moved straight outta beta a month ago, and now Adobe Firefly is following in its footsteps with a broad release.
And just as with Adobe Express, the launch is accompanied by a slick video with blood-pumping beats (not that I’m complaining):
This means you’ll now find Adobe Firefly across the entire suite of Adobe products.
You can also just head to firefly.adobe.com and enjoy the free standalone version.
5. New “Sausage” text-to-image model
We also just got a new text-to-image diffusion model called Würstchen (“sausage” in German).
What makes Würstchen special is that it operates within an extremely condensed “latent image space.” (Much like sausages are containers of extremely condensed meat-adjacent ingredients, I guess?)
For practical purposes, it means Würstchen is way cheaper and easier to train, and it generates images faster than existing models while using less memory, so it can run on low-end computers.
Here’s a Würstchen painting of itself:
The above was made with this free Hugging Face demo. Check it out!
🛠️ AI tools
Today we have three AI coaching apps that might make you a better person.
6. Pocket Hansei
Pocket Hansei lets you consult a vast library of expert insights in a conversational manner. You pick a topic you’re interested in:
Then you ask questions about it, with the bot giving you brief answers and references to credible sources that you can explore for further info.
The free plan has a limit of 15 questions, which is plenty for demo purposes.
7. Summit
Summit is an AI coach that helps you formulate a goal and chunk it down into smaller sub-goals, then holds you accountable by following up on your progress.
When you first try to describe a goal, Summit tells you how to make it more specific, realistic, and so on:
In the end, you’ll have a list of smaller goals to pursue:
From then on, you’ll be able to chat with your coach, discuss your progress, ask questions, get tips for improvement, and more.
As far as I can tell, Summit is completely free to try at this stage.
8. Poised
Poised is quite similar to the Yoodli app I showcased in May.
It helps you prepare for meetings by making suggestions based on your notes and anticipating questions you’re likely to get.
During a call, Poised listens to your side of the conversation and gives you live feedback on your pace, use of filler words, unclear statements, and so on.
Over time, you’ll get a thorough overview of your past meetings and your progress. Poised is free to use for your first five meetings.
💡 AI tip
Here’s this week’s tip.
9. How to use ChatGPT’s own output to prompt it
We all know that you can steer ChatGPT in the right direction by feeding it examples of the type of output you expect.
But what if you don’t have a good example handy?
Just get ChatGPT to generate one!
Say you want ChatGPT to give you analogies for different concepts in a specific tone of voice…but you’re not sure which tone of voice you prefer.
You can first request a few alternative analogies for a single concept:
Let’s say “Technical” is what you’re looking for.
You can start a new chat and use the “Technical” response to nudge ChatGPT:
This way, you’re sure to get a more repeatable template.
You can use this approach with any task and any other large language model, of course.
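If you’d rather script this than copy-paste between chats, here’s a minimal Python sketch of the two-step approach. It only builds the prompt text and doesn’t call any API. The function names, the example concepts (“recursion” and “caching”), and the extra tones are my own placeholders, not anything ChatGPT requires.

```python
def build_variants_prompt(concept, tones):
    """Step 1: ask for one analogy per tone so you can pick a favorite."""
    tone_list = ", ".join(tones)
    return (
        f"Give me one analogy for the concept '{concept}' "
        f"in each of these tones of voice: {tone_list}. "
        "Label each analogy with its tone."
    )


def build_example_prompt(example_concept, example_output, new_concept):
    """Step 2: reuse the model's own answer as the example in a fresh chat."""
    return (
        f"Here is an analogy for '{example_concept}':\n"
        f"{example_output}\n\n"
        f"Write an analogy for '{new_concept}' in the same tone and format."
    )


# Hypothetical walk-through: concepts and tones are placeholders.
step_one = build_variants_prompt("recursion", ["Playful", "Technical", "Poetic"])

# Paste the response you liked best from the first chat here.
chosen = "<paste ChatGPT's 'Technical' analogy here>"

step_two = build_example_prompt("recursion", chosen, "caching")

print(step_one)
print()
print(step_two)
```

Send the first prompt in one chat, paste your favorite response into `chosen`, and send the second prompt in a new chat. Starting fresh keeps the other tones out of the context.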
🤦‍♂️ 10. AI fail of the week
Midjourney apparently thinks “poker” is way more exciting than it actually is
No I’m just stuck in a glitch.
I’m going with #4.