January 24, 2025

Top Four Reasons Why Language Has Changed Forever

Photo Courtesy: SyncWords

Large Language Models and AI are creating a revolution in live language localization. Universal Translator is already here. Zidane Afridi explores the reasons driving this change.

I have been a linguaphile for as long as I can remember. With parents from Pakistan and Palestine, I grew up in a household where Urdu, Arabic, and English were spoken frequently. Back then, I never imagined there would come a time when machines could translate not only literal meaning but also nuances such as jokes, poetry, and puns with double meanings. Welcome to the era of Large Language Models, neural networks, and new companies pushing the boundaries. In this context, a few companies caught my eye: most notably SyncWords, a New York-based video streaming and AI company, and Speechmatics, a Cambridge, UK-based company that specializes in speech technology and automatic speech recognition.

Growing up, the philosophy of language captivated me. A more recent interest of mine, however, has been the advent of artificial intelligence in language translation, or localization, where the ultimate goal is to transfer meaning, intentionality, reference, concepts, learning, and thought across languages, and to do all of this in real time. Here are the top four compelling reasons why language will never be the same again.

  1. No More Clunky Experiences

Translating with human interpreters has always been a slow process, and a painfully expensive one too. AI-based live language translation has changed the game: latency has dropped below human interpreting speed to near real time. For example, Speechmatics recently lowered its speech-to-text latency to 700 milliseconds for Automatic Speech Recognition (ASR). SyncWords leverages this low latency from AI engines to produce highly accurate captions, subtitles, and even voice translations (audio dubbing) in real time. I was amazed to see SyncWords’ AI voice dubs in action: the original speaker talks in English while you simultaneously “hear” their audio translation in over 100 languages. Fascinating, to say the least!

Furthermore, I attended a rooftop event in New York organized by Speechmatics in July 2024, where they demoed Flow, a voice agent that combines Speechmatics’ real-time ASR with large language models (LLMs) and text-to-speech capabilities. It blew me away to watch the Speechmatics staff have human-level, intelligent conversations with Flow!

  2. Multi-Language Synchronicity

Languages can now be embedded in live video streams in real time, synchronized with whatever live stream you need translated. Growing at roughly 20% per year, the video streaming market is forecast to reach $30 billion by 2030. SyncWords, founded by Ash Shah, Sam Cartsos, and Aleksandr Dubinsky, is pioneering live translations in real-time video streams. The company has developed a patented platform that ingests live video streams and simultaneously generates subtitles and voice translations (dubs) in over 100 languages, all synchronized with the live video feed. Whether it’s live sports, gaming, news, or religious events, SyncWords is shattering the language barrier for good.

  3. Text and Voice Are Finally One

You can now have both text and voice translations in real time. Earlier translation technology was mostly file-based, serving static pre-recorded formats like MP4, MP3, and MOV: you had to upload a file and then wait for the AI to translate it. Real-time translation has evolved and now works with live streaming protocols such as HLS, SRT, and RTMP(S). By generating text (subtitles) and voice translations (dubs) practically simultaneously, SyncWords can embed both outputs into the live stream in sync with the video, enhancing the overall viewer experience.

  4. Massive Upgrade in Language Quality

The quality of language translations is better than it has ever been. AI-powered translation engines like DeepL have significantly improved translation accuracy, while LLMs like ChatGPT excel at capturing brand nuances and enhancing overall quality. These technologies make translations sound more natural, capture subtle nuances, and ensure cultural relevance. Whether you are watching a foreign show or an unlocalized anime, the ability to add translated subtitles and dubs to your media is remarkable. Pair that with the ability to do all of this with live media in real time, and we are witnessing a true revolution in language localization.

In conclusion, the Universal Translator from Star Trek, the TARDIS’s translation from Doctor Who, and the Babel Fish from The Hitchhiker’s Guide to the Galaxy have emerged from fiction into reality, aided by the mind-blowing AI that exists today.

About the Author: Zidane Afridi is a senior at Manhattan Hunter Science High School, in New York City. He has been experimenting with AI since 2021. Besides AI, he’s also into video games, art and fitness.

  Published by: Nelly Chavez

This article features branded content from a third party. Opinions in this article do not reflect the opinions and beliefs of Kivo Daily.