DMR News

ElevenLabs’ New AI-Powered Text-to-Sound Feature

By Hilary Ong

Feb 20, 2024

ElevenLabs, an AI startup founded by former Google and Palantir employees, has announced plans to broaden its product lineup. The company, known for its machine learning (ML)-based voice cloning and synthesis, is set to introduce a text-to-sound model that lets creators generate sound effects simply by describing them in words.

The announcement, made just a few hours ago, has stirred considerable excitement in the digital content creation community. The model is not yet available for public use, but the company has released a minute-long teaser. The preview pairs videos produced by OpenAI’s latest Sora model with ElevenLabs’ AI-generated sounds, offering a glimpse of what the tool can do.

Bridging the Gap in AI-Generated Content

Founded in 2022, ElevenLabs has dedicated itself to the research and development of AI technologies with the goal of making audio and video content more accessible worldwide. The startup has introduced an array of innovative products, including text-to-speech and speech-to-speech models. These models can convert content from text, audio, or video formats into AI-generated speech in 29 different languages, replicating natural voices and emotions effectively. The widespread adoption of these tools by both individuals and enterprises underscores the growing demand for advanced AI-generated content solutions.

The emergence of tools like Runway, Pika, and OpenAI’s Sora, which generate realistic AI videos from simple text prompts, has highlighted a gap in the market: the absence of default audio in AI-generated videos. ElevenLabs’ text-to-sound model aims to fill this gap, enabling users to effortlessly add natural-sounding background noises to their creations. Whether it’s the sound of chirping birds, bustling city streets, or conversational murmurs, this new model promises to add an extra layer of realism to digital content.

Elevating AI-Generated Content with Realistic Sound Effects

Luke Harries, the head of growth at ElevenLabs, expressed the company’s excitement about the new product line in a statement shared on social media. Harries emphasized the model’s potential to complement OpenAI’s Sora videos, which, while visually impressive, lack sound.

Interested users can join a waitlist for early access through a dedicated page set up by ElevenLabs. Registration requires a name, an email address, and a brief description of the sound effects they need. The launch date of the model remains undisclosed.

The Future of AI in Sound Production

The introduction of text-to-sound technology by ElevenLabs positions the company as a potential leader in this nascent field. However, it’s important to recognize the competitive landscape, with several companies in the AI speech domain poised to enter this market.

Industry analysts predict a bright future for such tools, with the global market for AI-driven content creation tools expected to grow from $1.2 billion in 2022 to nearly $5 billion by 2032, a compound annual growth rate (CAGR) of just over 15.4%.
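For readers who want to check the arithmetic behind that forecast, here is a minimal sketch of the standard CAGR formula applied to the figures quoted above. The function names are illustrative, and the 10-year span (2022 to 2032) and the rounded end value of $5 billion are assumptions, since the article gives only "nearly $5 billion."

```python
def cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate: the constant yearly rate that
    grows start_value into end_value over the given number of years."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding start_value at `rate` for `years` years."""
    return start_value * (1 + rate) ** years


# Figures from the article, in billions of US dollars.
# Assumption: "nearly $5 billion" is rounded to 5.0 for illustration.
implied_rate = cagr(1.2, 5.0, 10)

# Going the other way: what a 15.4% rate would yield after 10 years.
projected_value = project(1.2, 0.154, 10)

print(f"Implied CAGR: {implied_rate:.2%}")
print(f"Projected 2032 market at 15.4%: ${projected_value:.2f}B")
```

Under these assumptions the implied rate comes out at roughly 15.3%, and compounding at 15.4% lands at about $5 billion, so the quoted figures are consistent with one another to within rounding.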
