Closed Captioning Update

  • With David Kirk

Contributing Editor David Kirk has a look at the latest developments for closed captioning...

Closed captioning as a broadcast technology recently celebrated its half-century. It was first deployed in 1972 by the American Broadcasting Company to support hearing-impaired television audiences. Real-time captioning of live broadcasts followed in 1982, powered by stenographers trained to transcribe at speeds of 225 words per minute or more.

Advances in speech recognition technology over the subsequent decades have allowed live captioning to be partly or even fully automated, typically using a trained 're-speaker' who repeats the running commentary in a clear, consistent voice for input to the automated text generation system. Like OCR, this works well most of the time; closing the remaining accuracy gap is ideal territory for AI developers, at least until they too are replaced by AI-based processors.

Meanwhile, the transition from software-based products to web-based services continues to blur the divide between manufacturers and customers. This update looks at new developments in the closed captioning category since the subject was last covered by InBroadcast in January 2022.

Ai-Media describes its iCap Alta encoder as the answer for any broadcaster looking to leverage the benefits of captioning in IP video environments. Built on API-driven software, it provides a resilient workflow for live multilingual captions across compressed and uncompressed IP video, and is designed to help broadcasters transition from hardware-driven SDI to software-controlled IP video workflows. iCap Alta can be configured to process MPEG transport streams, SMPTE ST 2110 or CDI uncompressed IP video. The SRT protocol is supported to improve performance when streaming over unpredictable networks such as the public internet.

iCap Alta integrates with Ai-Media's Lexi automatic live captioning solution, which is claimed to achieve up to 98% accuracy using its Topic Models feature, recognising unique phrases and vocabulary as well as observing context. Lexi can be used on a SaaS basis to perform fully automated captioning, while the hybrid Smart Lexi solution combines automation with human curation. Lexi and Smart Lexi are also available for pre-recorded or VOD content via an API workflow. Combined with iCap Translate, they can provide multilingual captions in a variety of languages.

AppTek has cooperated with Washington, DC-based Gallaudet University to develop GoVoBo, an automatic captioning tool created by and for deaf and hard-of-hearing users. The GoVoBo application works with any online meeting service without the need to configure caption services individually for each meeting, automatically transcribing spoken content in real time using AI-enabled speech recognition technology.

At the end of each session, users have an editable and exportable transcript, so they can focus on the meeting attendees and interaction rather than looking away to take notes. "Gallaudet has deep ties to the user community and a deep understanding of the scientific research behind conversational AI," comments AppTek CEO Mudar Yaghi.

Digital Nirvana's Metadata-IQ is a web-hosted application for Avid users that automates the process of metadata generation for production, preproduction and live content. It offers on-premises transcoding and intelligent conversion of audio files into text transcripts. Users no longer need to create a low-res proxy or manually import files into Avid MediaCentral. Automated generation and ingest of relevant metadata as locators helps editors identify relevant content accurately. Some 30 languages are supported.

ENCO's enCaption automatically generates captions for any spoken-word content, from video to radio and podcasts. By adding automated real-time live captioning, content creators broaden their audience to deaf and hard-of-hearing viewers, and to people watching in noisy environments where the audio cannot be heard, at any time, without advance notice and without the cost of live captioners or signers.

The latest generation of enCaption also offers an offline, file-based mode for automatically captioning pre-recorded content. Audio or video files can be ingested into the system and immediately output in multiple captioning file formats, or forwarded for optional human review. Linking enCaption to an electronic newsroom system allows it to automatically access current and historical script information to build a local dictionary. Custom packaging can be configured to recognise a station's specific set of uniquely spelled names, jargon or other special words, with separate packages for different programmes. Automated captioning can also be added to NDI workflows: once connected, enCaption automatically outputs an NDI signal with captions keyed directly on top of the video stream. ENCO's products are backed by 24/7 live emergency support and a comprehensive online support portal and knowledge base.
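File-based captioning workflows of this kind typically emit timed-text formats such as SubRip (.srt) or WebVTT. As a minimal illustration of the SubRip format (not ENCO's implementation; the segment data is hypothetical), the following Python sketch turns timed transcript segments into a SubRip document:

```python
def fmt_ts(seconds: float) -> str:
    """Format a time in seconds as the SubRip timestamp HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments) -> str:
    """segments: iterable of (start_sec, end_sec, text) tuples -> SubRip text."""
    blocks = [f"{i}\n{fmt_ts(start)} --> {fmt_ts(end)}\n{text}"
              for i, (start, end, text) in enumerate(segments, 1)]
    return "\n\n".join(blocks) + "\n"

# Hypothetical transcript segments from a speech-to-text pass:
print(to_srt([(0.0, 1.5, "Hello"), (1.5, 3.2, "and welcome back.")]))
```

Each cue is a sequence number, a timing line (note SubRip's decimal comma), and the caption text, with cues separated by blank lines.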

"Our technical support team boasts over 200 years of combined engineering experience in a wide range of broadcast professions including program directors, engineers, IT managers, and even on-air talent" adds ENCO Technical Support Manager Wes Meisnitzer.

OOONA reports that automation, integrations and audio have been its main areas of focus during the past 12 months. "2022 has seen many language service providers turn to OOONA Integrated as their platform of choice in order to best manage security and complexity and be in a position to scale," says Wayne Garb, OOONA co-founder and CEO. "Others opt to integrate OOONA Tools in their proprietary platforms, or simply license them standalone, as they understand the value of the cloud. One of the most popular integrations continues to be the OOONA Convert API, which has been updated with more formats and features and has gone live with several major new clients.

"The highlight of the year for us was a new online audio description editor. With audio description volumes growing, companies have been looking for tools that facilitate the creation of audio description scripts as well as their voicing. OOONA's AD editor provides scripting functionality in an intuitive manner similar to the rest of the OOONA Tools. It also allows users to voice their own scripts, or to use a stock or custom synthetic voice provided by Veritone. With project turnround times becoming shorter than ever, all media localization providers are looking to further automate and streamline their production workflows.

"Investing in platforms like OOONA Integrated makes this possible by taking advantage of all the inbuilt functionality the platform offers to support the work of a distributed workforce and facilitate collaboration in a secure environment. With the recent adoption of Veritone Voices, we were able to unlock more value for OOONA clients seeking optimal solutions for their subtitling, dubbing and audio description needs. Veritone provides a large range of synthetic voices in many languages, and we empower users with the necessary controls to fine-tune the output to their liking."

Take 1 specialises in transcription, access and localization services for the media and entertainment sector. The company was recently acquired by Verbit, a leading provider of voice AI transcription and captioning solutions. “We’re thrilled to be joining the Verbit family,” says Louise Tapia, CEO of Take 1. 

“With the support of Verbit’s international footprint, proven capabilities and world-leading technologies, we will be able to expand and enhance our services to provide clients with a unique end-to-end offering.”

For Take 1, being part of the Verbit group has also enabled a partnership with sister company VITAC, also previously acquired by the unicorn company. VITAC is the largest provider of closed captioning and accessible communications in North America, and the partnership means customers have all the benefits of a global group and expanded services while still enjoying Take 1's personal approach.

Telestream's Stanza captioning and subtitling software offers subscription-based captioning facilities. Editors can operate from any location via a browser-based user interface, regardless of where media files are stored. Stanza uses the Telestream GLIM engine to play back original high-res files instantly, without any need to create and transfer proxy files. "With Stanza, large media files do not need to be downloaded or moved across a local area network, nor do proxies need to be created to work on captioning projects," says Scott Matics, Senior Director of Product Management at Telestream. "Able to run on Windows, macOS and Linux, it includes optional access to the AI-powered Timed Text Speech auto-transcription service, which supports over 100 languages. Stanza also integrates with the Vantage Timed Text Flip text transcoder and processor to provide automation for captioning workflows. Companies of any size, as well as independent transcribers, can now create and edit broadcast-grade, high-quality captions and subtitles more efficiently and more cost-effectively than ever before. This represents the future of closed captioning and subtitling workflows."

Tools on Air's just:CC is an AI-based real-time closed caption generator and inserter. It generates CEA-608/708 captions and 608-embedded-in-708 closed caption data, and can also detect and pass through existing CC data. Once generated, the data is inserted into the output signal using off-the-shelf hardware. Integration with hardware from AJA, Apple and Blackmagic Design allows generation and insertion of data via a single workstation. System requirements are an Apple Mac Pro, iMac, iMac Pro or Mac mini running macOS Catalina or later.
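CEA-608 caption data of the kind just:CC generates is carried as byte pairs in which bit 7 of each byte is an odd-parity bit; the basic character codes largely coincide with ASCII, though special characters and control codes differ. A minimal Python sketch of that parity convention (an illustration of the standard, not Tools on Air's code):

```python
def odd_parity(byte7: int) -> int:
    """Set bit 7 so the full 8-bit value contains an odd number of 1 bits,
    per the CEA-608 line-21 transmission convention."""
    ones = bin(byte7 & 0x7F).count("1")
    return (byte7 & 0x7F) | (0x80 if ones % 2 == 0 else 0x00)

def encode_pair(c1: str, c2: str) -> tuple[int, int]:
    """Encode two basic printable characters as one CEA-608 byte pair."""
    return odd_parity(ord(c1)), odd_parity(ord(c2))

# 'A' (0x41) has two 1 bits, so the parity bit is set, giving 0xC1;
# 'a' (0x61) already has three 1 bits, so it is transmitted unchanged.
print(hex(odd_parity(0x41)), hex(odd_parity(0x61)))
```

Decoders strip the parity bit (masking with 0x7F) after using it to flag transmission errors, which is why corrupted line-21 data classically appears on screen as garbled punctuation rather than dropped cues.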

VoiceInteraction's Audimus.Media is designed to perform live automatic closed captioning without human interaction. "Audimus.Media is an end-to-end captioning solution that contributes automatic live speech transcription and translation to multiple workflows, enabling a cost-effective and accurate live captioning process," says CTO Renato Cassaca. "The latest product updates include contribution of live captions to SMPTE ST 2110 networks and MPEG-TS multiplexers. Supported by proprietary technology, our software generates high-accuracy captions and delivers them with low latency. It was given a Best of Show award at the April 2022 NAB Show in Las Vegas and at September's IBC2022 for its recent updates, particularly its expanded supported inputs and additional encoders. Recent updates include mute/unmute commands, GPIO integration, native caption embedding into HD-SDI, HLS restreaming with WebVTT, MPEG-TS contribution with ST-2038/RD-11, and clips with synchronized captions.

"With an intuitive web dashboard, Audimus.Media allows customised setup, control over every configured channel, access to specific features, event scheduling and the creation of new live closed captioning tasks. It supports a large array of inputs, including standard capture cards, analog sound cards and virtual devices, as well as streaming feeds. The output side is equally streamlined, with native delivery integrations supporting the majority of encoders and streaming APIs. These integrations provide a broad degree of custom vocabulary control, expediting improvement and adaptation to every captioning scenario. Constantly adapting to local pronunciations, idioms and other local speech characteristics, our language models maintain high quality even in situations such as breaking news that demand continuous captioning.

"Our AI-driven models are refined daily, sourced from the network's ongoing programming and statistical models from the web, instantly adding unusual names and terms relevant to the upcoming news cycle, with support for standard newsroom computer systems like NPS as well as MOS for added integration. The platform identifies speakers and detects transitions to increase caption readability and flow, and distinguishes multiple spoken languages. This allows live multilingual broadcasting, with automatic translation on top of produced captions, working on-premises without an internet connection. As a global company, we provide automatic closed captions for more than 35 languages, allowing any TV station to comply with the highest accessibility standards with absolute confidence."

Wowza's Streaming Engine software integrates closed captioning into streaming video, making online content accessible to a broader audience. It can ingest caption data from a variety of in-stream and file-based sources and convert it to the appropriate format. The company's supporting Streaming Cloud service offers closed captioning for live streaming video delivered via a global cloud platform. Wowza Streaming Engine accepts CEA-608 captions embedded in live streams, as Action Message Format (AMF) onTextData in input streams, or embedded as AMF onCaption and onCaptionInfo in input streams. Using a Java API, it can also connect to back-end caption providers. For outbound video, Wowza Streaming Engine directly embeds CEA-608 or WebVTT captions in live streams played back via Apple HLS. For on-demand content, Wowza Streaming Engine can convert and deliver captions that are embedded in the video content as well as captions that are delivered in external files alongside the video content.
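The kind of sidecar-file conversion described above can be illustrated with a toy example: a SubRip (.srt) file differs from WebVTT mainly in lacking the `WEBVTT` header and in using a decimal comma rather than a full stop in cue timings. A hedged Python sketch of that conversion (illustrative only, not Wowza's implementation):

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    """Convert a SubRip document to WebVTT: prepend the WEBVTT header and
    swap the decimal comma for a full stop on cue timing lines."""
    # Only touch HH:MM:SS,mmm patterns, leaving commas in caption text alone.
    body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    return "WEBVTT\n\n" + body

# Hypothetical one-cue SubRip input:
print(srt_to_vtt("1\n00:00:00,000 --> 00:00:01,500\nHello, world\n"))
```

Restricting the substitution to the full timestamp pattern is the key detail: a naive global comma replacement would corrupt commas inside the caption text itself.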