Captioning and Subtitling

  • With David Kirk

Contributing Editor David Kirk looks at the latest developments in broadcast-related captioning and subtitling...

The following survey looks at developments in broadcast-related captioning and subtitling over the past few months. Significant trends include increasingly intensive use of automation in speech-to-text conversion and translation. The rollout of web-hosted services also continues to accelerate.

AI-Media’s Lexi Tool Kit comprises six modules, each compatible with the company’s Encoder series via the AI-Media iCap Cloud Network:

  • Lexi Live Automatic Captioning: Delivers real-time captions with a claimed accuracy of over 98 per cent. With speaker identification and intelligent caption placement, it is claimed to rival human captioners at a fraction of the cost.
  • Lexi Recorded: Accelerating captioning for post-production content, this integrates into media asset management systems. Users can caption in over 30 languages, choose from multiple file formats including SRT, VTT and TXT, and make use of the API and automation features.
  • Lexi Translate: This allows users to translate live captions and subtitles to and from over 50 languages with more added every month.
  • Lexi DR (Disaster Recovery): This enables users to host fully redundant iCap and Lexi servers themselves, ensuring uninterrupted captioning in situations such as internet outages.
  • Lexi Local: Designed to deliver secure live automatic captions on-premises and off the cloud, Lexi Local is intended for organisations requiring increased security for their content.
  • Lexi Library: This archives captioned content. With customisable permissions and Single Sign-On, time-stamped live captions can be accessed securely in real time or post-session, simplifying the process of transcribing and distributing captioned sessions.
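The export formats mentioned above differ mainly in small syntactic details: SRT timecodes use a comma as the decimal separator, while WebVTT uses a period and requires a WEBVTT header line. As a minimal illustration (this is a generic sketch, not AI-Media code), converting one to the other takes only a few lines:

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    """Convert an SRT cue body to WebVTT: swap the comma decimal
    separator in timestamps for a period and prepend the required header."""
    body = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    return "WEBVTT\n\n" + body

# A single SRT cue: index, timing line, caption text.
srt = "1\n00:00:01,000 --> 00:00:02,500\nHello, world.\n"
vtt = srt_to_vtt(srt)
```

The regex touches only full `HH:MM:SS,mmm` timestamps, so commas in the caption text itself are left alone.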

Digital Nirvana's Metadata-IQ is a web-hosted application for Avid users that automates the process of metadata generation for production, preproduction and live content. It offers on-premises transcoding and intelligent conversion of audio files into text transcripts. Users no longer need to create a low-res proxy or manually import files into Avid MediaCentral. Automated generation and ingest of relevant metadata as locators helps editors identify relevant content accurately. Thirty languages are supported. 

Enco has added new features to its fifth-generation AI-enhanced enCaption automated live captioning and transcription offering for TV and radio, including an updated enTranslate engine and WebDAD 3.0 remote radio automation interface.

“EnCaption5 offers flexible live or file-based captioning and transcription that can be deployed on-premises, in the cloud, or in a hybrid on-prem/cloud workflow for customers needing SDI signal connectivity or those requiring a purely on-premises solution,” says Ken Frommert, President of Enco.

“Its containerised architecture enables exceptional scalability and flexibility while providing an easily extensible foundation for subsequent feature expansion. Its open API supports seamless integration with media asset management systems and third-party developers.”

OOONA has introduced a new API providing additional functionality to its range of online-accessible support services. The new API is designed to check caption and subtitle files quickly and accurately at the point of ingest, during post-production and prior to delivery. “We have developed the API in response to requests from clients who want to accelerate the process of content inspection when they get files from vendors or at pre-export,” says Alex Yoffe, OOONA Product Manager. “It performs critical checks and generates a text-based report which can be stored for reference or sent to a client along with the file. This is especially useful in ingest and dispatch suites where time is often at a premium. It also ensures that post-production staff accessing the files are alerted to critical information requiring their attention.”

“The API is currently geared to perform a series of critical checks and verifications which we plan to expand with time,” adds Adam Tal, OOONA Software Architect. The current list includes:

  • Reading speed - Validate reading speed constraints.
  • Profanity - Check for listed profanity words.
  • Language - Check that the file content and header match the desired language.
  • Minimum duration - Check that all subtitles are above a minimum duration.
  • Maximum duration - Check that all subtitles are below a maximum duration.
  • Start TC - Check that all subtitles are after the media start timecode.
  • End TC - Check that all subtitles are before the media end timecode.
  • Max lines - Check that all subtitles have no more than the allowed number of lines.
  • Not bottom - Flag any subtitles that are not aligned to the bottom.
  • Not bottom or top - Flag any subtitles that are not aligned to the bottom or top.
  • Inside segments - Check that all subtitles are inside segments.
  • Frames per second - Check that all SMPTE values in the file match the fps and that the header fps value is present.
  • Line length - Validate that all lines are shorter than the specified length.

“Most checks are configurable so that a user can specify the minimum/maximum length, maximum number of subtitle lines, profanity words and so on,” Tal continues. “We will also be working through a long list of lower-priority checks that our clients would like to see added.” In addition to the above, the API can perform critical checks that run on all files:

  • Check that the file is not corrupt and can be imported.
  • Check that all subtitles are timed.
  • Check that all subtitles have a duration of more than NNN frames (regardless of the ‘Minimum duration’ check).
  • Check for timecode consistency.
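Most of the per-subtitle validations listed above reduce to simple arithmetic over each cue's timing and text. The sketch below is purely illustrative (the parameter names and default limits are hypothetical, not OOONA's API):

```python
from dataclasses import dataclass

@dataclass
class Cue:
    start: float  # seconds
    end: float    # seconds
    text: str     # caption text, lines separated by "\n"

def check_cue(cue, max_lines=2, max_cps=17.0, min_dur=1.0,
              max_dur=7.0, max_line_len=42):
    """Run a few illustrative validations on one subtitle cue.
    Returns a list of error strings; an empty list means the cue passes.
    All limits are hypothetical defaults for demonstration."""
    errors = []
    dur = cue.end - cue.start
    if dur < min_dur:
        errors.append("below minimum duration")
    if dur > max_dur:
        errors.append("above maximum duration")
    lines = cue.text.splitlines()
    if len(lines) > max_lines:
        errors.append("too many lines")
    if any(len(line) > max_line_len for line in lines):
        errors.append("line too long")
    # Reading speed as characters per second across the whole cue.
    cps = len(cue.text.replace("\n", "")) / dur if dur > 0 else float("inf")
    if cps > max_cps:
        errors.append("reading speed too high")
    return errors
```

A report generator would simply run this over every cue in the file and collect the non-empty results, which is essentially what a text-based QC report amounts to.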

OOONA’s Syncheck tool automatically checks subtitle files against videos with corresponding file names, whether dropped into a folder or paired manually, and flags any errors it finds. It verifies that a given subtitle file is in the appropriate language and in sync with the corresponding video by examining a configurable number of checkpoints. Using speech recognition and machine translation, it compares the subtitle text against the audio at each checkpoint and calculates a fuzzy match score. If that score falls below a given threshold, the file is failed and tagged with an error flag. Users then have the option to open it in OOONA’s Create Tool to fix the errors identified. Create Tool automatically checks for a variety of customisable errors on the basis of presets. It also offers a bird’s-eye view of multiple language streams simultaneously, which is useful for project managers, plus a user-friendly feature for tracking file revisions.

“Quality assurance is a critical aspect of the media industry - it is a natural area of focus for us,” adds Wayne Garb, OOONA co-founder and CEO. “We are committed to providing our customers with innovative solutions that enable them to streamline their operations. Syncheck does just that.”
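Syncheck’s actual scoring is proprietary, but the underlying idea of comparing recognised speech against the displayed subtitle text and thresholding a similarity ratio can be sketched with Python’s standard difflib. Everything here, including the 0.6 threshold, is an illustrative assumption:

```python
from difflib import SequenceMatcher

def fuzzy_match_score(subtitle_text: str, recognised_text: str) -> float:
    """Similarity in [0, 1] between the on-screen subtitle text and the
    speech-recognised audio at one checkpoint (illustrative only)."""
    a = subtitle_text.lower().split()
    b = recognised_text.lower().split()
    return SequenceMatcher(None, a, b).ratio()

def file_in_sync(checkpoints, threshold=0.6) -> bool:
    """Pass the file if the mean score across all checkpoints meets the
    (hypothetical) threshold; otherwise it would be tagged with an error flag."""
    scores = [fuzzy_match_score(s, r) for s, r in checkpoints]
    return sum(scores) / len(scores) >= threshold
```

Matching on word tokens rather than raw characters makes the score tolerant of punctuation and casing differences between the subtitle file and the speech-recognition output.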

ToolsOnAir’s just:CC is a real-time live closed captioning solution that generates CEA-608/708 caption data and inserts it into the VANC data stream of SD and HD signals. just:CC allows broadcasters to generate and insert closed captions on their channel even if the live signal or pre-made content does not come with CC data. This is done independently of any internet connection. As of version 1.5, just:CC can also be deployed in the cloud and used in combination with the AWS speech-to-text engine. Other transcription engines can be integrated on request. “just:CC delivers the high-quality CC insertion you expect in a professional broadcast environment by using industry-leading speech-to-text recognition technology,” says Peter Steiner, Head of Sales.

Depending on the content, it can achieve 90 per cent accuracy and may perform more accurately than a human captioner in some cases. just:CC generates CEA-608/708 captions and 608-embedded-in-708 CC data, and can also detect and parse existing CC data. The data is inserted into the output signal using enterprise-grade hardware and is compatible with various external caption embedders. In addition to the original SDI format support, version 1.5 adds the TS and ASI formats as well as NDI, SRT, RTMP, RTP and MPEG-TS I/O capabilities. The available frame rates have been extended to include 25 fps and 50 fps in addition to the existing 29.97 fps and 59.94 fps. just:CC can optionally provide ready-to-run integration with a third-party transmission unit providing EBU and DVB Teletext, OP-47, CAVENA P31, SCTE-27, SD/HD In-Vision and ASI output to DVB systems.

VoiceInteraction’s Audimus.Media is designed to perform live automatic closed captioning without human interaction. “Audimus.Media is an end-to-end captioning solution that contributes automatic live speech transcription and translation to multiple workflows, enabling a cost-effective and accurate live captioning workflow,” says CTO Renato Cassaca.

“Latest product updates include contribution of live captions to SMPTE ST 2110 networks and MPEG-TS multiplexers. Supported by proprietary technology, our software generates high-accuracy captions and delivers them with low latency. Recent updates include mute/unmute commands, GPIO integration, native caption embedding into HD-SDI, HLS restreaming with WebVTT, MPEG-TS contribution with ST-2038/RD-11 and clips with synchronised captions.

“With an intuitive web dashboard, Audimus.Media allows customised setup, control over every configured channel, access to specific features, event scheduling and the creation of new live closed captioning tasks. It supports a large array of inputs, including standard capture cards, analogue sound cards and virtual devices, as well as streaming feeds. Output is equally streamlined, with native delivery integrations supporting the majority of encoders and streaming APIs. These integrations provide a broader degree of custom vocabulary control, expediting improvement and adaptation to every captioning scenario.

“Constantly adapting to local pronunciations, idioms and other local speaking characteristics, our language models maintain high quality even in situations, such as breaking news, that demand continuous captioning. Our AI-driven models are refined daily, sourced from the network’s ongoing programming and statistical models from the web, instantly adding unusual names and terms relevant to the upcoming news cycle, supported by standard newsroom computer systems like NPS as well as MOS for added integration. The platform identifies speakers and detects transitions to increase caption readability and flow, and distinguishes between multiple spoken languages. This allows live multilingual broadcasting, with automatic translation on top of produced captions, working on-premises without an internet connection. As a global company, we provide closed captions for 40 languages, allowing any TV station to comply with the highest accessibility standards with absolute confidence.”