Machine Translation

By Adrian Pennington

With the increased demand for accessible and localized content, automatic speech recognition has become essential, with word error rates now very low for broadcast material.

Automatic Speech Recognition (ASR) systems learn from data to generate closed captioning and subtitling at close to 100 per cent accuracy. The workflow includes creating a preliminary transcript, splitting the transcript or script into time-coded captions/subtitles, automatically checking conformance with the style guides of publishing platforms, and generating machine translation to make the text localization process far more efficient.
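The splitting step described above can be illustrated with a short sketch. This is not any vendor's implementation; the cue length and duration limits are assumed values chosen purely for illustration, and a real system would follow the style guide of the target platform.

```python
from dataclasses import dataclass

# Assumed limits for illustration only; real platforms publish their own style rules.
MAX_CHARS_PER_CUE = 74   # e.g. two lines of 37 characters
MAX_CUE_SECONDS = 7.0

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

def split_into_cues(words: list[Word]) -> list[dict]:
    """Greedily group a time-aligned transcript into caption cues."""
    cues, current = [], []
    for word in words:
        candidate = current + [word]
        text = " ".join(w.text for w in candidate)
        duration = candidate[-1].end - candidate[0].start
        if current and (len(text) > MAX_CHARS_PER_CUE or duration > MAX_CUE_SECONDS):
            cues.append({"start": current[0].start, "end": current[-1].end,
                         "text": " ".join(w.text for w in current)})
            current = [word]
        else:
            current = candidate
    if current:
        cues.append({"start": current[0].start, "end": current[-1].end,
                     "text": " ".join(w.text for w in current)})
    return cues
```

Each cue can then be serialized to SRT or another delivery format, checked against the platform's style rules and passed on to machine translation.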

One variable is vocabulary: new words keep appearing, and ASR systems must have a mechanism to learn them. Accuracy also depends on how closely the audio matches the clean, single-speaker conditions the systems perform best under. “When we have deviations from these conditions there will be a drop in quality depending on the deviation (e.g. background noise, overlapping speech, several people speaking at the same time, music),” says VoiceInteraction Marketing Manager Mariana Manteiga.

“Speech recognition has come a long way and word error rates are now very low for broadcast material,” agrees Wayne Garb, CEO and Co-Founder, OOONA. “We’ve seen an increase in the demand for this technology to facilitate the creation of subtitle and caption files in the same language as the audio, so have integrated a few state-of-the-art engines into our Create Pro editor to cater for client demand. Of course, the automated output isn’t perfect, so you still need a professional to fix the errors, but it is good enough to provide significant productivity increases.” 

There are two additions the company has recently made as part of its effort to support the media localization market, both claimed to be unique to the sector. Explains Garb, “The first is our EDU platform, which is tailored to the needs of academic institutions and was built with input from world-renowned academics (including Jorge Díaz-Cintas, Professor of Translation at the Centre for Translation Studies, University College London). All our tools are browser-based, which makes them ideal for remote training scenarios and compatible with any operating system students have access to. 

“Aside from facilitating the training of new professionals, we also created The POOOL, an online directory for such professionals to connect to the hundreds of LSPs in the market.

“We wanted this to be an inclusive initiative to everyone’s benefit, so The POOOL is run by key industry stakeholders and academics and we are in the process of inviting more key players to join our community.”

The company has also embarked on what it claims is a first-of-its-kind proof of concept to add in-demand translation-assistance tools to its subtitle editor, enabling translators to get the most out of their work and boost their productivity.

ASR is on par with human captioning in many respects and is being used to provide captioning in both live and file-based situations. Accuracy varies with the speech engine, genre, language and audio quality.

“No system is 100 per cent accurate but the technology has vastly improved over the last few years,” says Chuck Jones, CMO, BroadStream. Editing file-based captions must still be done manually to bring them up to 100 per cent.

BroadStream offers two solutions: VoCaption for live broadcast and SubCaptioner for file-based captioning.

Explains Jones, “Both include easy-to-update, custom dictionaries to ensure local names, locations and terms or phrases are accurate and also provide word replacements for unacceptable language. VoCaption can be used stand-alone or integrated with Polistream or OASYS Integrated Playout for a fully integrated, ‘one-box’ solution.

“SubCaptioner provides file-based captioning and does the majority of the transcribing work humans have typically done. It delivers an SRT file in just minutes and we provide scoring to easily locate areas we think may need editing. SubCaptioner will be available in both cloud and on-premise versions. Both solutions are cost-effective and easy to use.”
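As a rough sketch of how the custom dictionaries Jones describes might work, the snippet below applies whole-phrase replacements to caption text; the entries and the replacement behaviour are invented for illustration and are not BroadStream's implementation.

```python
import re

# Hypothetical custom dictionary: likely ASR misrecognitions -> preferred local spellings.
CUSTOM_DICTIONARY = {
    "glaws ter": "Gloucester",
    "oasis integrated playout": "OASYS Integrated Playout",
}
# Hypothetical replacements for unacceptable language.
BLOCKED_TERMS = {"expletive": "[bleep]"}

def apply_dictionary(caption: str) -> str:
    """Apply case-insensitive, whole-phrase replacements to one caption line."""
    for mapping in (CUSTOM_DICTIONARY, BLOCKED_TERMS):
        for source, target in mapping.items():
            caption = re.sub(re.escape(source), target, caption, flags=re.IGNORECASE)
    return caption

print(apply_dictionary("welcome back to glaws ter"))  # -> "welcome back to Gloucester"
```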

ASR and neural machine translation, when properly trained in the source material domain, create significant efficiencies inside captioning and subtitling workflows. This is further improved with custom features such as AppTek’s Intelligent Line Segmentation. This produces line breaks in subtitles closer to how a professional would create them, on the basis of syntax and semantics, for better readability.
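AppTek has not published the internals of Intelligent Line Segmentation, so the sketch below only illustrates the general idea of preferring line breaks at natural syntactic boundaries rather than at a fixed character count; the boundary words, line limit and scoring are assumptions.

```python
# Illustrative list of words after which a break tends to read naturally.
GOOD_BREAK_AFTER = {"and", "but", "that", "which", "because"}
MAX_LINE_CHARS = 37  # assumed one-line limit

def segment_subtitle(words: list[str]) -> tuple[str, str]:
    """Split a subtitle into two lines, preferring a break at a natural boundary."""
    best_split, best_score = len(words) // 2, float("-inf")
    for i in range(1, len(words)):
        line1, line2 = " ".join(words[:i]), " ".join(words[i:])
        if len(line1) > MAX_LINE_CHARS or len(line2) > MAX_LINE_CHARS:
            continue
        score = -abs(len(line1) - len(line2))        # prefer balanced lines
        prev = words[i - 1]
        if prev.lower().strip(",.;:") in GOOD_BREAK_AFTER or prev[-1] in ",.;:":
            score += 20                              # bonus for a syntactic boundary
        if score > best_score:
            best_split, best_score = i, score
    return " ".join(words[:best_split]), " ".join(words[best_split:])

print(segment_subtitle("He said that he would be late, but the show went on".split()))
```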

“However, the industry will need to rely on manual human-in-the-loop editing, not only to guarantee 100 per cent accuracy, but also for the more creative part of the art of subtitling itself,” says Kyle Maddock, SVP Marketing, AppTek. 

“Successfully translating jokes and puns or nailing the perfect timing of when a subtitle is placed on screen to match the comedic timing of an actor’s delivery can only be done by skilled talent.”

AppTek will be launching a new machine translation technology that makes use of multi-dimensional metadata to offer deeper customization, and hence better quality output, at the project, document or even individual sentence level. Users will have more control over the output, with the ability to “toggle” results on the basis of style (formal or informal), gender, domain, topic, length, language variety, and more.
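AppTek has not detailed how these toggles will be implemented. One common way such control is exposed in neural MT is by prepending metadata tags to the source sentence for a model trained to condition on them; the sketch below is only an illustration of that general technique, and the tag names and function are hypothetical rather than AppTek's API.

```python
def build_tagged_source(sentence: str, *, style: str = "informal",
                        gender: str = "neutral", length: str = "default") -> str:
    """Prepend illustrative control tokens that a suitably trained MT model could condition on."""
    tags = f"<style:{style}> <gender:{gender}> <length:{length}>"
    return f"{tags} {sentence}"

# Hypothetical usage: the same sentence requested in two registers.
print(build_tagged_source("How are you doing?", style="formal"))
print(build_tagged_source("How are you doing?", style="informal", length="short"))
```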

“The increase in demand for accessible and localized content has not translated into increased budgets for these services,” reports Louise Tapia, CEO, Take 1. “Timelines have also become tighter now that content is launched globally and on multiple platforms at the same time.  While the industry recognises the importance of high-quality captions and subtitles to optimise viewing experiences, quality is often sacrificed in the face of these ever-reducing budgets and turnaround times.”

Take 1 believes it's possible to provide “artisan services at scale without breaking your budget”. One of the ways it does this is by producing transcripts, Post Production Scripts and As Broadcast Scripts as interchangeable data that can be repurposed into the various documents, files and reports needed throughout the global content production workflow.

“The time saved by repurposing this data throughout the content supply chain can then be spent focussing on the quality of the captions and subtitles we produce.  Our cloud platforms, automation tools and strategic use of AI create further workflow efficiencies to support a more creative process.”

VoiceInteraction is releasing a new version of its automatic closed captioning software, Audimus.Media, which will include the ability to translate into a new language. As an example, clients in the US with English and Spanish audiences can provide all their programming with closed captions in both languages using the product, all with high accuracy and low latency, says Manteiga.

“This means that live translation, accompanied by automatic punctuation, capitalisation, and denormalization is now a reality. It is the ideal solution for unanticipated situations, such as ‘Breaking News’ scenarios that still demand continuous live closed captioning. Audimus.Media will be able to assist any TV station with full media coverage while providing overall accessibility for the viewership.”

Trance is Digital Nirvana's transcription, captioning and text localization product. It can generate speech-to-text in 30 different languages or ingest an existing script, and automatically selects the most suitable engine from a cluster to deliver the best possible output.

Explains Russell Wise, Vice President, Sales & Marketing, “The transcript editor window of the application provides the option to highlight low-confidence words from the speech-to-text engine making it easier for the user to make corrections. The parameter and NLP-based presets enable intelligent splitting of transcripts or scripts into closed captions/subtitles, which is presented in a user-friendly window for further review by the user.

“Users can use the application for text localization of captions created from within Trance, or ingest an existing caption file and generate machine translation in 100 different languages. The text localization window displays the source video, the source-language captions and the machine translation generated, for easy review and correction.”
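A minimal sketch of the low-confidence highlighting Wise describes, assuming the speech-to-text engine returns a per-word confidence score; the 0.85 threshold and the bracket markers are illustrative choices, not Digital Nirvana's implementation.

```python
CONFIDENCE_THRESHOLD = 0.85  # assumed cut-off for flagging a word for review

def highlight_low_confidence(words: list[dict]) -> str:
    """Wrap low-confidence words in markers so an editor can spot them quickly."""
    tokens = []
    for word in words:
        token = word["text"]
        if word["confidence"] < CONFIDENCE_THRESHOLD:
            token = f"[[{token}]]"  # flagged for manual correction
        tokens.append(token)
    return " ".join(tokens)

asr_output = [{"text": "The", "confidence": 0.99},
              {"text": "Gloucester", "confidence": 0.61},
              {"text": "studio", "confidence": 0.97}]
print(highlight_low_confidence(asr_output))  # -> "The [[Gloucester]] studio"
```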

The company also runs a fully functional transcription, captioning/subtitling and text localization service department, which has been catering to enterprise customers for the past two decades.

“Resubmissions and rejections from the final publisher of content are proving costly, so the finished captions require special attention.

“Today, technology allows us to automatically review most of the technical and operational parameters, compare them with the guidelines provided by content publishing platforms, and alert on any non-compliance. Digital Nirvana's captioning/subtitling services have seen zero rejections/resubmissions since implementing this functionality in its internal workflow in May 2020.”
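A sketch of the kind of automated conformance check described here, comparing each caption cue against a platform style guide; the limits shown are invented examples, since the real values come from each publisher's own guidelines.

```python
# Illustrative style-guide values; actual limits vary by publishing platform.
GUIDELINES = {"max_lines": 2, "max_chars_per_line": 42, "max_chars_per_second": 20}

def check_caption(caption: dict) -> list[str]:
    """Return a list of non-compliance alerts for one caption cue."""
    issues = []
    lines = caption["text"].split("\n")
    if len(lines) > GUIDELINES["max_lines"]:
        issues.append(f"too many lines ({len(lines)})")
    for line in lines:
        if len(line) > GUIDELINES["max_chars_per_line"]:
            issues.append(f"line too long ({len(line)} chars)")
    duration = max(caption["end"] - caption["start"], 0.001)
    cps = len(caption["text"].replace("\n", " ")) / duration
    if cps > GUIDELINES["max_chars_per_second"]:
        issues.append(f"reading speed too high ({cps:.1f} cps)")
    return issues

print(check_caption({"start": 10.0, "end": 11.0,
                     "text": "This caption is far too long to be read comfortably in one second"}))
```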

“Providing efficient, accurate captions can be a huge challenge when it comes to making video content accessible and one that requires highly specialist skills which in turn is expensive and time-consuming,” says Georgina Robertson, Head of Brand and Communications, Speechmatics. “The volume of online content that needs to comply with FCC regulations has also increased in the past 18 months so broadcasters and content providers need a quick and easy way of adding captions to their content.” 

Speechmatics’ ASR engine can be used in real time or on pre-recorded video files, helping to improve the accessibility of content, as well as enhancing search and archive possibilities for live TV content. “Our ASR engine is the first step to providing accurate, seamless captioning to the masses and ensuring audiences benefit from premium video captioning whether it's broadcast or online,” she says.