10X AI (Issue #4): Google’s Big AI Play, AI Productivity Tools, and a Hapless Bird Watcher
Plus new open-source LLMs, Meta's ImageBind, animation tool from Stability AI, and a simple trick to make Bing more cooperative.
Happy Sunday, friends!
Welcome back to 10X AI: a weekly look at beginner-focused AI news, tools, and tips.
Let’s dive right in.
🗞️AI news
Here are this week’s AI developments.
1. Google goes AI-ll-in
Returning from the dead, Google went all-in on AI at its I/O 2023 developer conference.
Just a few quick highlights *deep breath*:
New, more powerful PaLM 2 LLM coming to Bard
Bard is now available in 180+ countries without a waitlist and should soon get multimodal features like image recognition
Duet AI brings AI-powered productivity to Gmail and Google Workspace (Sheets, Docs, and Slides)
AI-driven immersive routes in Google Maps
AI-powered Magic Editor in Google Photos
Their MusicLM text-to-music model, which I covered back in February, is now public! (There is a waitlist though.)
2. MosaicML’s new open-source LLM
Seriously, the parade of new LLMs just won’t stop lately.
MosaicML just unveiled an open-source model licensed for commercial use.
It’s even got a catchy name: MPT-7B. Just rolls off the tongue, doesn’t it?
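What sets MPT-7B apart is its ability to handle extremely long inputs: the StoryWriter variant was trained with a 65K-token context window, and MosaicML has demonstrated generations as long as 84K tokens. (Anthropic recently bumped Claude to 100K but you need to request access.)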
There are three sub-models with different purposes, which you can check out for yourself right away:
MPT-7B-StoryWriter-65k+: Read/write ultra-long stories (Demo)
MPT-7B-Instruct: Follow short-form instructions (Demo)
MPT-7B-Chat: Classic chatbot interface for dialogue (Demo)
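If you’re wondering what those token numbers mean for your own text, here’s a rough back-of-the-envelope sketch in Python. It assumes the common rule of thumb that one token is about 0.75 English words, which is only an approximation and not MPT-7B’s actual tokenizer. The function names and the 50,000-word "novella" are made up for illustration.

```python
# Rough rule of thumb (an assumption, not any model's real tokenizer):
# one token is about 0.75 English words.
WORDS_PER_TOKEN = 0.75

# Context sizes mentioned in this issue, in tokens.
CONTEXT_WINDOWS = {
    "MPT-7B-StoryWriter-65k+ (per its name)": 65_000,
    "MPT-7B (figure quoted above)": 84_000,
    "Claude": 100_000,
}


def estimate_tokens(text: str) -> int:
    """Estimate the token count of `text` from its word count."""
    words = len(text.split())
    return round(words / WORDS_PER_TOKEN)


def fits(text: str, window_tokens: int) -> bool:
    """Return True if the estimated token count fits in the window."""
    return estimate_tokens(text) <= window_tokens


if __name__ == "__main__":
    # A hypothetical 50,000-word novella.
    novella = "word " * 50_000
    print("Estimated tokens:", estimate_tokens(novella))
    for name, size in CONTEXT_WINDOWS.items():
        verdict = "fits" if fits(novella, size) else "too long"
        print(f"{name}: {verdict}")
```

Treat the result as a ballpark only. Real token counts depend on the model’s tokenizer and on the kind of text you feed it.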
3. Stability AI enters the animation game
People have been animating with Stable Diffusion for a while.
But now Stability AI has released an official Stable Animation SDK tool for designers and artists, giving them three ways to create animations:
Pure text input
Image + text input
Video input (similar to Runway Gen-1)
Unfortunately, I couldn’t find any free public demo at this point, so you’ll have to learn a bit of Python to try it for yourself.
4. Meta AI takes multimodality to the next level
When OpenAI demoed multimodality in GPT-4, it was all about understanding images.
Meta AI said “Hold my beer,” and went for a whole six modalities with its new ImageBind model:
Text
Audio
Image/video
Thermal
Motion and position (or Inertial Measurement Units, if you’re feeling fancy)
Depth
ImageBind can collate input from all of the above to understand the world and generate various types of output:
Here’s a limited public demo with a few preset examples, so you can see what all the fuss is about.
🛠️AI tools
This week’s selection of beginner-friendly AI tools. In last week’s poll, most of you asked for productivity tools. So that’s what I’ll focus on today:
5. Opus Clip
Want to give a YouTube video some social media love or just highlight its most valuable parts?
Opus Clip puts this task on autopilot. It takes a long video, automatically chunks it down to 10 shorter clips, captions them, and creates headline suggestions.
It even gives each resulting clip a predicted “AI Virality Score” based on past performance of similar clips. There’s more AI-magic under the hood that can track a speaker’s face, highlight impactful keywords in captions, and generate relevant emojis.
All it takes is literally a single click. You simply drop the YouTube URL into the relevant field and press the “Get Clips For Free” button.
Here’s a not-the-most-inspired-but-visual demo:
Opus Clip is currently free and will still have a “Free Forever” plan when they launch paid options soon.
6. Guidde
Need to make a visual “How to” walkthrough of website features, processes, and so on?
Guidde is a stupid simple way to do that.
It’s a browser extension that gives you a big red “Capture” button:
Then you fill in a few quick details about what you’re capturing:
You’re all set!
After clicking “GO,” every new click you make in the active tab will be automatically registered. Once you’re done, Guidde will create illustrated visuals of each step with accompanying text.
You’ll then get a slide deck and a video that you can export in multiple ways.
Here’s a silent GIF of the complex process of commenting on this site:
And here’s a narrated version using stilted AI voiceover.
Don’t worry, you can record your own voiceover or maybe use one of the more natural-sounding AI voices. (If you need inspiration, this roundup article is a great place to start.)
You can also edit any generated text or visual elements, but the whole point is that you mostly won’t have to!
7. Style AI
There’s been a whole slew of AI website builders released lately.
Style AI reduces the entire process to a few quick exchanges with an AI chatbot:
You answer several straightforward questions about whether the site is for business or personal use, the topics you’d like covered, and the types of imagery and theme you want. Then, voila:
You get a ready-made site with text, sections, and images put together entirely by AI. You can customize it yourself or continue asking the bot to make changes for you:
If you’ve never set up websites before, Style AI is a foolproof way to dive in.
8. Spoke
If you’re okay with an AI tool recording your meetings, you can get a lot out of Spoke.
Spoke joins your meetings as a participant.
It then records the entire meeting (or only parts of it, if you so choose) and generates a transcript. Here’s a sad and lonely call I had with Spoke all by myself:
Spoke then picks the most appropriate summary template depending on the context and populates it with relevant parts of the meeting:
Clicking on highlighted action items and meeting summary notes takes you to the relevant parts of the video. It’s really handy to keep track of discussions, actions, and more.
Spoke also has features for meeting prep and in-meeting agenda tracking. It certainly helps that it’s free to try and works with all the major video meeting platforms.
💡AI tips
Here is this week’s tip.
9. Get more out of Bing Chat…by being ultra nice
I’ve already hinted at this briefly in my post about coding with AI.
This trick seems to work for other tasks and questions that Bing initially refuses to perform or answer. Bing is fickle, and it likes to be treated with respect.
Watch Bing refuse to answer whether it’s powered by GPT-4:
Now see what happens when I’m super polite:
See if this helps you get Bing Chat to cooperate!
🤦‍♂️10. AI fail of the week
Bob dropped out of birdwatching school, but that never stopped him from pursuing his passion.