10X AI (Issue #9): Code Interpreter for All, AI Voice Tools, and Synchronized Horror

Plus Playground AI's big bet on the graphics revolution and using voice input/output to turn Bing into a helpful talking tutor for a child.

Daniel Nest

Jul 09, 2023

Happy Sunday, friends!

Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.

Let’s get to it.

🗞️AI news

It’s been a rather uneventful week in terms of beginner-friendly releases…with one major exception.

1. Code Interpreter is rolling out to ChatGPT Plus users

Edit 02-09-2023: Code Interpreter is now called “Advanced Data Analysis.”

If you ever needed one final nudge to shell out for ChatGPT Plus, this is it!

OpenAI announced that their Code Interpreter will shortly become available to everyone with a ChatGPT Plus subscription.

The name is a bit misleading: It has less to do with interpreting code and more to do with ChatGPT writing and running Python code to solve all sorts of complex problems.

This—along with Bing’s image recognition—is the most impactful development in recent months. ChatGPT can now work seamlessly with many types of uploaded data, reason at a higher level, create advanced charts and visuals, and more.
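To make that a bit more concrete, here's a rough sketch of the kind of short Python script ChatGPT might write behind the scenes when you upload a small spreadsheet and ask for a summary. The data, the column names, and the `summarize_column` helper are all made up for illustration. This is not actual Code Interpreter output.

```python
import csv
import io
import statistics

# Hypothetical stand-in for an uploaded CSV file (made-up numbers).
UPLOADED_CSV = """month,visitors
Jan,120
Feb,150
Mar,90
Apr,180
"""


def summarize_column(csv_text, column):
    """Return basic statistics for one numeric column of CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    values = [float(row[column]) for row in rows]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "min": min(values),
        "max": max(values),
    }


summary = summarize_column(UPLOADED_CSV, "visitors")
print(summary)
```

The point isn't this particular script. It's that you describe the question in plain English and ChatGPT takes care of writing and running code along these lines for you.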

To get a taste of what Code Interpreter can do, read this article by Ethan Mollick.

2. Playground AI to take visual AI to the next level

The company just raised almost $41 million to work on making image-generating AI much more capable. Their ambition is to:

Improve spatial intelligence of text-to-image models (nuanced understanding of complex prompts to accurately render scenes with multiple subjects)
Further develop the ability to make subtle changes to images
Enable creation of entire 3D environments
…and more.

🛠️AI tools

Today I’m sharing some voice-related AI tools I’ve collected over the past several months.

(Charlie Guo did an excellent deep dive on voice-generating AI tools back in May, if you’re especially curious about this topic.)

3. CloneDub

CloneDub lets you upload or paste a voice recording, which it analyzes, transcribes, and instantly dubs into another language in a voice that sounds like yours.

It currently supports 7 languages: Spanish, French, Hindi, Italian, German, Polish, and Portuguese. There’s a time-limited free trial so you can test it out.

Check out CloneDub

4. Cohesive Voices

As a part of Cohesive’s content platform, Cohesive Voices integrates with their editor to create voiceovers in multiple languages and in a variety of voices.

They also offer a number of speech-oriented templates that help you easily generate written scripts for voiceovers to read out.

Check out Cohesive Voices

5. Cleanvoice

Cleanvoice is geared toward podcasters and aims to speed up the process of editing and cleaning up recorded audio.

Cleanvoice can automatically remove filler words, mouth sounds, noise, and dead air, among other things.

Check out Cleanvoice

6. AI Voice Detector

AI Voice Detector is a sort of GPTZero for audio. It claims to be able to recognize whether voices in a recording are real or synthetic.

Note that AI-text detection tools aren’t particularly reliable, so I recommend approaching this one with caution, too, and running some tests of your own.

Check out AI Voice Detector

7. Adobe Speech Enhance

Adobe Speech Enhance aims to remove background noise, echo, and other ambient sounds from uploaded audio, making it sound like a studio-recorded clip.

I tried recording a quick test with fake ambient noises from a white noise app, and…it didn’t work nearly as well as I’d expected.

[Audio clips: original and “enhanced” versions, 6 seconds each]

It somehow managed to change my voice and cut off whole chunks of my speech. Then again, this was done using the built-in mic in my cheap headphones, with the white noise app sitting right next to it. As such, it’s likely not representative of real audio recordings with a moderate degree of ambient noise.

You’re better off testing this out for yourself.

Check out Adobe Speech Enhance

8. MetaVoice

MetaVoice lets you change the way you sound on the fly, which is useful for voice chat, live streaming, and so on.

The idea is to convert your voice in real time while preserving the cadence and emotion of the original.

Check out MetaVoice

💡AI tips

Here’s this week’s tip.

9. Convert Bing into a speaking tutor for your kids

Back in April, Arvind Narayanan wrote about his experience with setting up a ChatGPT voice interface for his 3-year-old daughter. While Arvind turned to Google’s text-to-speech service for a more natural voice, using Bing’s own text-to-speech output is a passable alternative.

First, you’ll want to switch the input mode from keyboard to voice by clicking the microphone icon on the right.

This turns the usual text input box into a microphone you can click when speaking.

When in “microphone” mode, Bing will not only write its answers but speak them out loud as well, so your child can follow along.

Now, you can condition Bing with an appropriate prompt. Here’s an example that worked well for me when trying to mimic Arvind’s setup:

“I want you to act as a kind and compassionate tutor for my 3-year-old daughter. She will ask you questions by speaking. Please answer in a way that a 3-year-old can understand, use simple terms, and keep your responses short - maximum 3 sentences. Keep the subject matter light and age appropriate. Stay attuned to the child's feelings and anxieties. You don't have to share any background reading or cite sources.”
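If you'd like to reuse this prompt for a different child, one option is to turn it into a small template. Here's a minimal Python sketch, assuming you simply paste the printed text into Bing yourself. The `build_tutor_prompt` function and its parameters are hypothetical names of my own and not part of any Bing feature.

```python
def build_tutor_prompt(age=3, child="daughter", pronoun="She", max_sentences=3):
    """Fill in the tutor prompt from this post with a child's details."""
    parts = [
        f"I want you to act as a kind and compassionate tutor for my "
        f"{age}-year-old {child}.",
        f"{pronoun} will ask you questions by speaking.",
        f"Please answer in a way that a {age}-year-old can understand, "
        f"use simple terms, and keep your responses short - "
        f"maximum {max_sentences} sentences.",
        "Keep the subject matter light and age appropriate.",
        "Stay attuned to the child's feelings and anxieties.",
        "You don't have to share any background reading or cite sources.",
    ]
    return " ".join(parts)


# The defaults reproduce the wording used above.
prompt = build_tutor_prompt()
print(prompt)

# A variation for an older child with a slightly longer answer limit.
older_prompt = build_tutor_prompt(age=5, child="son", pronoun="He", max_sentences=4)
print(older_prompt)
```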

I tried Arvind’s first prompt as a test and found that Bing was pretty good at adjusting the conversation to the child’s level, even though it ignored the 3-sentence limit:

[Image: sample conversation with Bing as a tutor for a 3-year-old child]

Of course, do keep in mind that Bing is still a large language model and is not a replacement for real-world conversations. Be wary of made-up facts and supervise your kid’s interaction with the chatbot at all times.

🤦‍♂️10. AI fail of the week

Here’s a Midjourney result for “synchronized swimming.”
These sure are synchronized, as far as terrifying disembodied heads go.

[Image: a bunch of disembodied heads floating in the water, the result of the Midjourney prompt “synchronized swimming.”] More of this madness on Reddit.

Liked the post? Help me grow Why Try AI by sharing it with others!

Previous issue of 10X AI:

10X AI (Issue #8): Multimodal Bing, Image-To-3D, AI Games, and a Bad Bungee Blooper

Daniel Nest

July 2, 2023

Margie Ramsay

Jul 10, 2023

Wonder if Clean Voice etc will work for all languages or is it really just English. As I could find it really handy for tidying up recordings in other languages. Thanks for testing these tools.

Andrew Smith

Jul 9, 2023

I'm most excited in the short-term about Bing's image recognition tool, and in the medium term about code interpreter. I want to play with an LLM with proprietary data, but I don't feel confident that the security considerations have begun to be addressed in a serious way. Any thoughts on that?
