At OpenAI’s recent DevDay event, the company announced several significant updates, including the Realtime API. The API is designed for low-latency, AI-generated voice responses, enabling near-real-time speech-to-speech experiences in applications.
The Realtime API streams voice input and output over WebSockets, allowing developers to build applications similar to ChatGPT’s Advanced Voice Mode.
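To make the WebSocket flow more concrete, here is a minimal Python sketch of the pieces a client would need before opening a connection: the URL, the headers, and two JSON text frames. The endpoint, the `OpenAI-Beta` header, the model name, and the event types (`input_audio_buffer.append`, `response.create`) are assumptions based on the beta documentation published around the announcement and may have changed; the function names are hypothetical helpers, not part of any SDK. The sketch only builds the messages and does not open a network connection.

```python
import base64
import json

# Assumed values from the public beta; verify against current documentation.
REALTIME_URL = "wss://api.openai.com/v1/realtime"
DEFAULT_MODEL = "gpt-4o-realtime-preview"


def build_connection(api_key: str, model: str = DEFAULT_MODEL) -> tuple[str, dict[str, str]]:
    """Return the WebSocket URL and headers a client would use to open a session."""
    url = f"{REALTIME_URL}?model={model}"
    headers = {
        "Authorization": f"Bearer {api_key}",
        "OpenAI-Beta": "realtime=v1",  # assumed beta opt-in header
    }
    return url, headers


def audio_append_event(pcm_bytes: bytes) -> str:
    """Wrap a chunk of raw audio as a JSON text frame; the audio is base64-encoded."""
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    })


def response_create_event(instructions: str) -> str:
    """Build a JSON text frame asking the model to respond with audio and text."""
    return json.dumps({
        "type": "response.create",
        "response": {"modalities": ["audio", "text"], "instructions": instructions},
    })
```

Any WebSocket client that accepts custom headers could take the URL and headers from `build_connection`, send the frames produced by the other two helpers, and read the server's audio events back from the same socket.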
The Realtime API is currently in public beta and offers six distinct OpenAI voices; it does not support third-party voices, owing to copyright considerations [2]. It is priced at approximately $0.06 per minute of audio input and $0.24 per minute of audio output [2]. The API also supports function calling, which lets developers trigger actions in response to spoken commands, making applications more interactive.
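As an illustration of function calling, the following Python sketch shows one way an application might route a function-call event from the API to local code and package the result to send back. The event shape (`name`, `call_id`, and `arguments` as a JSON string) and the `conversation.item.create` reply are assumptions based on the beta documentation; `set_thermostat`, `TOOLS`, `HANDLERS`, and `handle_function_call` are hypothetical names invented for this example.

```python
import json
from typing import Any, Callable


def set_thermostat(celsius: float) -> dict[str, Any]:
    """Hypothetical local action triggered by a spoken command."""
    return {"status": "ok", "target_celsius": celsius}


# Tool description the client would register with the session (assumed schema).
TOOLS = [{
    "type": "function",
    "name": "set_thermostat",
    "description": "Set the target temperature in degrees Celsius.",
    "parameters": {
        "type": "object",
        "properties": {"celsius": {"type": "number"}},
        "required": ["celsius"],
    },
}]

# Maps function names announced in TOOLS to the local callables that implement them.
HANDLERS: dict[str, Callable[..., dict[str, Any]]] = {"set_thermostat": set_thermostat}


def handle_function_call(event: dict[str, Any]) -> dict[str, Any]:
    """Run the requested function and build the reply event carrying its output."""
    handler = HANDLERS.get(event["name"])
    if handler is None:
        result: dict[str, Any] = {"error": f"unknown function: {event['name']}"}
    else:
        # Arguments arrive as a JSON string and are passed as keyword arguments.
        result = handler(**json.loads(event["arguments"]))
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": event["call_id"],
            "output": json.dumps(result),
        },
    }
```

In this sketch, a spoken request such as "set the temperature to 21.5 degrees" would reach the application as an event naming `set_thermostat` with arguments `{"celsius": 21.5}`; the returned dictionary would then be serialized and sent back over the WebSocket so the model can confirm the action aloud.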