Google's MusicLM generates music from text – and humming


Google Research is testing an AI model that can generate music from sounds like whistling and humming in addition to text prompts such as "Berlin 90s techno with a low bass and strong kick."

The paper describing MusicLM claims the model can generate "high fidelity" music at 24 kHz that "remains consistent over several minutes." The model "outperforms previous systems both in audio quality and adherence to the text description," it says.

In addition to text prompts, MusicLM can be conditioned to generate music based on an additional melody, such as whistling or humming.

"Since describing some aspects of music with words can be difficult or even impossible, we show how our method supports conditioning signals beyond text," the paper, published on arXiv, read.

The paper provides AI-generated music examples based on text prompts that vary in detail and richness. While the model can generate music from a short prompt such as "relaxing jazz," it can also be fed elaborate and specific descriptions.

One example provided by researchers read: "A fusion of reggaeton and electronic dance music, with a spacey, otherworldly sound. Induces the experience of being lost in space, and the music would be designed to evoke a sense of wonder and awe, while being danceable."

Another experiment included prompting the model to create music based on descriptions of famous paintings, such as The Scream by Edvard Munch. The eerie tune generated by AI matched the "hallucinatory" experience of the image.

"This is bigger than ChatGPT to me. Google almost solved music generation," AI scientist Keunwoo Choi tweeted, adding he found the model "really, really brilliant."

MusicLM can be conditioned to generate music based on instruments, places, epochs, genres, and musician experience levels. There is also a "story mode" for programming time-specific shifts in style, atmosphere, and tempo.

The researchers have publicly released the MusicCaps dataset, which contains over 5,500 music-text pairs prepared by musicians, to "support further research."

They have also acknowledged the risks associated with music generation, namely the misappropriation of creative content.

"We strongly emphasize the need for more future work in tackling these risks associated with music generation – we have no plans to release models at this point," the paper read.