MusicLM is Google's new AI model capable of generating rich music samples from text prompts and melody input. But it won't be released to the public yet.
Great post! The text prompts certainly seem to allow a strong degree of "interpretation" (just like image AI), but I found the melody conditioning particularly fascinating. The examples had a wide array of outputs, yet the melody itself sounded pretty spot-on in each one.
It's easy to imagine models like this being incorporated into gaming. Variables like health status or combat state could be fed into the soundtrack on the fly to make things more tense, for example.
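To make the idea concrete, here's a minimal sketch of how game state might drive a text prompt for a model like this. Everything here is hypothetical: `build_prompt` and the prompt wording are made up for illustration, and no such public MusicLM API exists today.

```python
# Hypothetical sketch: turning live game variables into a text prompt
# for an imagined text-to-music generation endpoint.

def build_prompt(health: int, in_combat: bool) -> str:
    """Translate game state into a music-generation text prompt."""
    mood = "tense, fast-paced percussion" if in_combat else "calm ambient pads"
    if health < 25:
        mood += ", dissonant and urgent"
    return f"Orchestral game soundtrack, {mood}"

# Low health during combat yields an urgent prompt;
# the game loop could re-prompt the model whenever state changes.
print(build_prompt(health=20, in_combat=True))
```

The interesting design question is how often to re-prompt: per state change, or on a beat-aligned schedule so transitions stay musical.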
Yeah I like the potential this has for even the average person to "compose" entire musical arrangements based on some simple tune they have in their heads. You just need to hum it passably and then ask for the genre and instruments you want. You could technically even hum each separate instrumental track and mix them into a complete piece.
And the gaming use case sounds really cool, I didn't even think about that option! It could also work in an immersive cooperative game where players have to communicate via voice chat - music/atmospheric sounds could respond to their voice intensity or even the content, etc.
Crazy times we live in. Thanks for the comment!