
OpenAI announced on Thursday a set of new voice intelligence features for its API, expanding its real-time audio capabilities with tools designed for conversational AI, live translation, and speech transcription.
The company said the additions are intended to help developers build applications that can speak with users, transcribe conversations as they happen, and provide real-time multilingual communication.
The update introduces three new models: GPT-Realtime-2, GPT-Realtime-Translate, and GPT-Realtime-Whisper.
New Voice Model Adds GPT-5-Level Reasoning
GPT-Realtime-2 is OpenAI’s latest conversational voice model and succeeds GPT-Realtime-1.5.
According to the company, the model combines realistic voice simulation with GPT-5-class reasoning capabilities, allowing it to handle more complex user requests during spoken interactions.
The system is designed to move beyond basic voice responses and support more advanced conversational workflows.
OpenAI said the broader goal is to create voice interfaces capable of listening, reasoning, translating, transcribing, and acting during live conversations.
“Together, the models we are launching move real-time audio from simple call-and-response toward voice interfaces that can actually do work,” the company said.
Translation And Transcription Features Added
The company also introduced GPT-Realtime-Translate, a model that translates spoken conversations in real time.
The model supports more than 70 input languages and 13 output languages.
OpenAI said the translation system is designed to maintain conversational pacing so translated interactions continue naturally while users speak.
A third model, GPT-Realtime-Whisper, provides live speech-to-text transcription, converting spoken dialogue into text as the conversation takes place.
Enterprise And Customer Service Use Cases Targeted
OpenAI said the new voice tools could support applications across customer service, education, media, live events, and creator-focused platforms.
The company identified enterprise support systems as one of the primary intended use cases for the new models.
At the same time, OpenAI acknowledged the potential risks tied to advanced voice technology, including misuse involving spam, fraud, or deceptive communications.
The company said it implemented safeguards intended to detect and stop abusive behavior.
According to OpenAI, the systems include internal triggers capable of halting conversations if interactions violate the company’s harmful content policies.
All three models are available through OpenAI’s Realtime API.
GPT-Realtime-Translate and GPT-Realtime-Whisper are priced based on minutes of usage, while GPT-Realtime-2 uses token-based billing.
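For developers budgeting around the two billing schemes, the difference can be sketched roughly as follows. This is an illustration only: the function names and all rates are hypothetical placeholders supplied by the caller, and OpenAI's actual prices are not stated here.

```python
def minute_based_cost(audio_seconds: float, usd_per_minute: float) -> float:
    """Estimate cost for a model billed by minutes of usage.

    usd_per_minute is a caller-supplied placeholder, not a published price.
    """
    if audio_seconds < 0 or usd_per_minute < 0:
        raise ValueError("inputs must be non-negative")
    return (audio_seconds / 60.0) * usd_per_minute


def token_based_cost(
    input_tokens: int,
    output_tokens: int,
    usd_per_million_input: float,
    usd_per_million_output: float,
) -> float:
    """Estimate cost for a model billed by tokens.

    Both per-million rates are caller-supplied placeholders, and splitting
    the rate by input and output tokens is an assumption of this sketch.
    """
    if min(input_tokens, output_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * usd_per_million_input
        + output_tokens * usd_per_million_output
    )
    return total / 1_000_000


if __name__ == "__main__":
    # Placeholder rates chosen only to make the arithmetic easy to follow.
    print(minute_based_cost(audio_seconds=90, usd_per_minute=0.10))
    print(token_based_cost(1_000_000, 500_000, 2.0, 4.0))
```

Under minute-based billing the cost depends only on how long a session lasts, while under token-based billing it depends on how much is said and generated, so two calls of the same length can cost different amounts.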
