10X AI (Issue #9): Code Interpreter for All, AI Voice Tools, and Synchronized Horror
Plus: Playground AI's big bet on the graphics revolution, and how to use voice input/output to turn Bing into a helpful talking tutor for a child.
Happy Sunday, friends!
Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.
Let’s get to it.
🗞️AI news
It’s been a rather uneventful week in terms of beginner-friendly releases…with one major exception.
1. Code Interpreter is rolling out to ChatGPT Plus users
Edit 02-09-2023: The Code Interpreter is now “Advanced Data Analysis”
If you ever needed one final nudge to shell out for ChatGPT Plus, this is it!
OpenAI announced that their Code Interpreter will shortly become available to everyone with a ChatGPT Plus subscription.
The name is a bit misleading: it has less to do with interpreting code and more to do with ChatGPT writing and running Python code to solve all sorts of complex problems.
This—along with Bing’s image recognition—is the most impactful development in recent months. ChatGPT can now work seamlessly with many types of uploaded data, reason at a higher level, create advanced charts and visuals, and more.
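To make "writing Python code to solve problems" a bit more concrete, here's a minimal sketch of the kind of script Code Interpreter might write when you upload a small spreadsheet and ask for a quick summary. The data, column names, and function names are all hypothetical, and this is not the actual code ChatGPT produces. It just illustrates the general pattern: parse the uploaded data, compute something, report back.

```python
import csv
import io
import statistics

# Hypothetical stand-in for an uploaded file. In ChatGPT, the data would come
# from the file you attach, not from an inline string.
UPLOADED_CSV = """month,signups
Jan,120
Feb,135
Mar,160
Apr,150
"""


def summarize(csv_text: str, column: str) -> dict:
    """Parse CSV text and return basic statistics for one numeric column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
    }


def month_over_month_change(csv_text: str, label: str, column: str) -> list:
    """Return (label, percent change vs. the previous row) pairs."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    changes = []
    # Pair each row with the one before it to compute the relative change.
    for prev, curr in zip(rows, rows[1:]):
        before, after = float(prev[column]), float(curr[column])
        changes.append((curr[label], round((after - before) / before * 100, 1)))
    return changes


if __name__ == "__main__":
    print(summarize(UPLOADED_CSV, "signups"))
    for month, pct in month_over_month_change(UPLOADED_CSV, "month", "signups"):
        print(f"{month}: {pct:+.1f}%")
```

Code Interpreter writes and runs code like this in a sandboxed Python environment, then explains the results (or draws a chart) for you in plain language, so you don't need to write any of it yourself.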
To get a taste of what Code Interpreter can do, read this article by Ethan Mollick.
Or watch this video:
2. Playground AI to take visual AI to the next level
The company just raised almost $41 million to work on making image-generating AI much more capable. Their ambition is to:
Improve the spatial intelligence of text-to-image models (nuanced understanding of complex prompts to accurately render scenes with multiple subjects)
Further develop the ability to make subtle changes to images
Enable creation of entire 3D environments
…and more.
🛠️AI tools
Today I’m sharing some voice-related AI tools I’ve collected over the past several months.
(… did an excellent deep dive on voice-generating AI tools back in May, if you’re especially curious about this topic.)
3. CloneDub
CloneDub lets you upload or paste a voice recording, which it analyzes, transcribes, and instantly dubs into another language in a voice that sounds like yours.
It currently supports 7 languages: Spanish, French, Hindi, Italian, German, Polish, and Portuguese. There’s a time-limited free trial so you can test it out.
4. Cohesive Voices
As part of Cohesive’s content platform, Cohesive Voices integrates with its editor to create voiceovers in multiple languages and a variety of voices.
Cohesive also offers a number of speech-oriented templates that help you quickly draft scripts for your voiceovers.
5. Cleanvoice
Cleanvoice is geared toward podcasters and aims to speed up the process of editing and cleaning up recorded audio.
Cleanvoice can automatically remove filler words, mouth sounds, noise, and dead air, among other things.
6. AI Voice Detector
AI Voice Detector is a sort of GPTZero for audio. It claims to be able to recognize whether voices in a recording are real or synthetic.
Note that AI-text detection tools aren’t particularly reliable, so I recommend approaching this one with caution too and running some tests of your own.
7. Adobe Speech Enhance
Adobe Speech Enhance aims to remove background noise, echo, and other ambient sounds from uploaded audio, making it sound like a studio recording.
I recorded a quick test with fake ambient noise from a white-noise app, and…it didn’t work nearly as well as I’d expected.
Original clip:
“Enhanced” clip:
It somehow managed to change my voice and cut off whole chunks of my speech. Then again, this was done using the built-in mic in my cheap headphones, with the white noise app sitting right next to it. As such, it’s likely not representative of real audio recordings with a moderate degree of ambient noise.
You’re better off testing this out for yourself.
8. MetaVoice
MetaVoice lets you change the way you sound on the fly, which is useful for voice chat, live streaming, and so on.
The idea is to convert your voice in real time while preserving the cadence and emotion of the original.
💡AI tips
Here’s this week’s tip.
9. Convert Bing into a speaking tutor for your kids
Back in April, Arvind wrote about his experience setting up a ChatGPT voice interface for his 3-year-old daughter. While he turned to Google’s text-to-speech service for a more natural voice, Bing’s own text-to-speech output is a passable alternative.
First, you’ll want to switch the input mode from keyboard to voice by clicking the microphone icon on the right:
This turns the usual text input box into a microphone you can click when speaking:
When in “microphone” mode, Bing will not only write its answers but speak them out loud as well, so your child can follow along.
Now, you can condition Bing with an appropriate prompt. Here’s an example that worked well for me when trying to mimic Arvind’s setup:
“I want you to act as a kind and compassionate tutor for my 3-year-old daughter. She will ask you questions by speaking. Please answer in a way that a 3-year-old can understand, use simple terms, and keep your responses short - maximum 3 sentences. Keep the subject matter light and age appropriate. Stay attuned to the child's feelings and anxieties. You don't have to share any background reading or cite sources.”
I tried Arvind’s first prompt as a test and found that Bing was pretty good at adjusting the conversation to the child’s level, even though it ignored the 3-sentence limit:
Of course, do keep in mind that Bing is still a large language model and is not a replacement for real-world conversations. Be wary of made-up facts and supervise your kid’s interaction with the chatbot at all times.
🤦‍♂️10. AI fail of the week
Here’s a Midjourney result for “synchronized swimming.”
These sure are synchronized, as far as terrifying disembodied heads go.