Translation Made Seamless
“Today, we’re launching SeamlessM4T, a multimodal AI that facilitates cross-linguistic communication,” announced Meta’s CEO, Mark Zuckerberg, via his Instagram account.
The model handles text-to-speech, speech-to-text, speech-to-speech, and text-to-text translation, as well as speech recognition.
Meta plans to integrate the technology into its suite of social platforms, including Facebook, Instagram, WhatsApp, Messenger, and Threads.
Bridging the Language Gap Through AI
“In line with our commitment to open science, we’re releasing SeamlessM4T under a research license, allowing researchers and developers to expand upon the work. We’re also releasing SeamlessAlign’s metadata, which is currently the largest open multimodal translation dataset, comprising 270,000 hours of text and speech alignments,” the company said.
SeamlessM4T builds on Meta’s earlier translation work, such as the No Language Left Behind (NLLB) initiative.
Launched last year, NLLB is a text-to-text translation model that supports 200 languages. It has since been integrated into Wikipedia as a translation provider.
Massively Multilingual Speech Support
Meta has also demonstrated its Universal Speech Translator, the first direct speech-to-speech translation system for Hokkien, a language without a widely used writing system.
Another of the company’s projects is Massively Multilingual Speech, which provides language identification, speech recognition, and speech synthesis across 1,100 languages.
Note that although the model can perform speech recognition in many languages, its text-to-speech translation capability is more limited.