Date: 2024-11-26

NVIDIA has unveiled an experimental AI model, Fugatto, capable of generating audio from text prompts and modifying existing sound files. Officially named the Foundational Generative Audio Transformer Opus 1, the model is designed to provide a versatile solution for sound creation, described by NVIDIA as “a Swiss Army knife for sound.” Built by an international team of AI researchers, Fugatto’s capabilities extend across multiple languages and accents, enhanced by the diversity of its developers.

According to Rafael Valle, NVIDIA’s manager of applied audio research, the goal was to develop a model that approaches sound generation with human-like understanding. Fugatto enables applications ranging from rapid music prototyping to creating personalized language learning tools and dynamic audio assets for video games. For example, music producers can use the model to experiment with different voices, instruments, and styles, while game developers might customize in-game soundscapes to reflect player decisions.

NVIDIA’s Fugatto represents a notable step forward in generative AI in its ability to craft complex, dynamic audio. Its practical applications have yet to be tested at scale, but the technology holds promise for creative industries such as music production and game development.

Beyond these use cases, the researchers discovered Fugatto could handle tasks outside its training scope. With minimal fine-tuning, the model can combine separate training instructions, such as generating emotionally expressive speech in specific accents or blending natural sounds like birdsong with the dynamic intensity of a thunderstorm. It can also produce audio that evolves over time, such as rainstorms traversing landscapes.

Despite its advanced capabilities, NVIDIA has yet to announce plans for public access to Fugatto. This development follows similar initiatives from tech giants like Meta, which introduced an open-source AI for sound creation, and Google, whose MusicLM tool generates music from text prompts via its AI Test Kitchen.

Featured image courtesy of Ars Technica
