DeepL Unveils DeepL Voice for Real-Time, Text-Based Translation from Audio and Video

By Tanu Chahal

13/11/2024

DeepL, the German startup known for its highly accurate online text translations, has launched a new feature: DeepL Voice. The tool translates spoken language in real time by converting audio into text in another language. It adds to DeepL’s growing translation suite, which has earned the company a $2 billion valuation and over 100,000 subscribers and positioned it as a top competitor in the AI translation market.

DeepL Voice currently accepts spoken input in English, German, Japanese, Korean, Swedish, Dutch, French, Turkish, Polish, Portuguese, Russian, Spanish, and Italian. Captions can be translated into any of the 33 languages that DeepL Translator supports, so the feature can serve audiences well beyond the speakers of those input languages.

DeepL Voice focuses on live conversations and video conferencing, and it provides translated text rather than audio output. In an in-person meeting, for example, participants can place a smartphone between them to display real-time text translations for each person. In video calls, translations appear as subtitles. This subtitle feature currently works only with Microsoft Teams, which many of DeepL’s business clients use. The company has not yet announced compatibility with other platforms such as Zoom or Google Meet.

DeepL CEO Jarek Kutylowski suggested that real-time voice translation is a growing area in the AI field and that DeepL intends to continue expanding in this direction. This first voice-related feature is one step in the company’s larger strategy, as demand for voice translation has been rising since DeepL’s launch in 2017.

Competing tech companies are also advancing in this area. Google, for instance, has incorporated real-time translated captions into its Meet platform, and specialized AI companies like ElevenLabs are developing voice translation tools. ElevenLabs even uses DeepL’s technology as part of its services, reflecting the high regard for DeepL’s translation quality.

To meet industry demands for fast and effective translation, DeepL has developed its own AI language model, optimized specifically for translation. Released in July 2024, the model outperforms other large language models in translation accuracy and speed, according to DeepL, which makes it well suited to real-time use cases like DeepL Voice. Unlike many AI products that rely on third-party models, DeepL’s translation tools are built from the ground up.

While DeepL Voice has great potential for use in video conferencing and customer service, some privacy concerns remain. Voice data is transmitted to DeepL’s servers for processing, but the company assures users that it does not store this data or use it for model training. DeepL plans to work closely with clients to ensure compliance with GDPR and other data privacy laws, prioritizing transparency and user protection.

The new feature is expected to benefit a range of industries, especially those that rely on customer service, where front-line employees can use it to bridge language barriers with clients. As DeepL continues to refine its technology, the company anticipates further developments in real-time, audio-based translation to meet evolving user needs.