10X AI (Issue #24): Fuyu-8B, PlayHT Turbo, Two Fun Tools, and a Multitalented Bear
PLUS: Google's language learning, Pi with Internet access, Claude worldwide, new Midjourney upscalers, and getting better images out of DALL-E 3.
Happy Sunday!
Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.
Let’s get to it.
🗞️ AI news
This week, instead of one or two blockbuster announcements from major companies, we’ve had a lot of smaller ones.
1. New Fuyu-8B multimodal model
Free ChatGPT users without access to ChatGPT Vision now have a growing number of free alternatives.
This week, a company called Adept released Fuyu-8B, the smallest of their multimodal models. The model is fast and performs well across image question-answer benchmarks:
It appears to be quite accurate in my anecdotal testing:
Want to try it for yourself? Here’s a free Hugging Face demo.
2. PlayHT’s speedy text-to-speech model
PlayHT, one of the frontrunners in text-to-speech, just released a ridiculously fast model called PlayHT Turbo.
It can convert text into speech in practically real time. Most of the latency comes from the network connection rather than the model’s intrinsic speed.
In my own test, PlayHT Turbo generated output in under 0.5 seconds:
Here’s the free demo I used above, with a bunch of voices and emotions. Enjoy!
3. Google takes on Duolingo. Sort of.
Lately, Google has been incorporating more features directly into its search environment. The latest one lets English learners start practice sessions from search results related to language queries:
The goal is to bring learning into the appropriate context. To begin with, this feature will become available to Android users in Argentina, Colombia, India, Indonesia, Mexico, and Venezuela. More countries are likely to follow soon.
4. Claude is available in more countries
Anthropic’s Claude still boasts the largest context window of any chatbot: 100K tokens, to be exact.
And now, users in 95 countries can access the model. (Denmark’s not on that list, which means you can look forward to me complaining about that for a while.)
There’s a free version anyone—except me—can try, so go check out Claude.
5. Pi can now browse
Inflection’s Pi chatbot is known for its friendly and supportive personality.
As of this week, it’s also plugged into the Internet and can access up-to-date info on any topic.
If you haven’t tried Pi yet, you now have another reason to check it out.
From my sponsor:
Recast helps you ‘read more’ without reading. Easily breeze through your reading backlog by converting articles into bite-sized audio convos. No more 'info FOMO' – stay up-to-date and discover thousands of interesting recasts within the app.
6. Midjourney releases a new built-in upscaler
We haven’t had a new Midjourney model release since Version 5.2.
But the company has been keeping us happy with additions like Vary (Region).
This week, Midjourney finally launched an upscaler for Version 5 (and above).
Under any individual image, you should now see options to upscale it by 2x or 4x:
Unlike Version 4 upscalers, which altered the image itself in the process, the new ones are supposed to stick as closely as possible to the original.
🛠️ AI tools
Today, I have space for just two tools, so I’m sharing a couple of fun ones I came across recently.
7. Upside Down Diffusion
Hey, remember the classic “princess or old lady” optical illusion? Where you can see one or the other by simply rotating the picture?
Upside Down Diffusion lets you make your own rotating illusions with any two subjects of your choice.
I tried having a skeleton turn into a mummy…
…a squirrel on a branch that becomes King Kong…
…and, of course, the classic:
What will you try?
(Thanks to a fellow writer for sharing the tool in one of her posts.)
8. Riffusion
Not to be confused with the namesake text-to-music model I tested back in June, Riffusion can create a music track in any genre that incorporates whatever lyrics you give it.
(So it’s very similar to Suno’s Chirp, which I recently covered, but with a proper interface for those who might be avoiding Discord.)
The process is simple. You feed Riffusion the lyrics and describe the sound or genre you want:
Then you get three options to pick from. Here’s my favorite:
You can save your riff as either a video or an audio file. But what’s really cool is the ability to split the track into stems with a single click:
This gives you each individual instrument or voice as a separate audio file:
Curious? It’s free to try, so go for it.
💡 AI tip
Here’s this week’s tip.
9. Nudge ChatGPT into giving better DALL-E 3 prompts
ChatGPT Plus is already pre-prompted to create its own detailed DALL-E 3 descriptions when you request an image. But YouTuber Glibatree came up with a set of detailed custom instructions that take ChatGPT’s DALL-E 3 descriptions even further:
And Glibatree was generous enough to share the exact instructions. Simply paste them into the appropriate “Custom Instructions” sections to have ChatGPT write better prompts for DALL-E 3.
Can’t use Custom Instructions because, for example, you’re already using them for something else? No problem: simply copy and paste Glibatree’s text into any new DALL-E 3 chat before asking it to generate images.
Have fun!
🤦‍♂️ 10. AI fail of the week
I asked for an “ice skating bear,” but this is so much better.
My heart goes out to you for not having access to Claude! 🥲 The huge context window makes it my favorite LLM. My favorite experiment so far was pasting in ‘The Circular Ruins’ by Jorge Luis Borges and asking for some critical analysis. Claude actually helped me understand the themes and symbols in this obscure story better! I hope Claude comes to your neck of the woods soon. Maybe try a VPN?
Thanks, Daniel! PlayHT and Riffusion both look very interesting. Looking forward to exploring both. Keep 'em coming!