The rise of audio signal processing

  • By Kevin Hilton

From its beginnings primarily as a programme-making tool, audio signal processing has developed to encompass loudness and Next Generation Audio. And as contributing editor Kevin Hilton writes, this area of technology looks likely to grow in importance...

Audio signal processing was at one time heavily associated with specific, individual functions such as equalisation and limiting. These are still carried out today using standalone hardware devices but, since the advent of digital signal processing (DSP), they have also been incorporated into more comprehensive systems offering a broad range of features, including time alignment and channel matching.
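As a rough illustration of the kind of dynamics processing involved, the sketch below shows a deliberately minimal peak limiter in Python. It is an illustrative example only, not any vendor's implementation; real broadcast limiters add look-ahead, oversampled true-peak detection and carefully tuned attack/release behaviour:

```python
def peak_limit(samples, threshold=0.9, release=0.999):
    """Minimal feed-forward peak limiter: instant attack, slow release.

    Illustrative sketch only; broadcast-grade limiters use look-ahead
    buffers and true-peak side-chains rather than this naive loop.
    """
    out = []
    gain = 1.0
    for s in samples:
        level = abs(s)
        if level * gain > threshold:
            gain = threshold / level          # clamp instantly (hard attack)
        else:
            gain = min(1.0, gain / release)   # recover gradually toward unity
        out.append(s * gain)
    return out

# A burst 6 dB over the ceiling is held at (approximately) the threshold:
burst = [0.2] * 100 + [1.8] * 100 + [0.2] * 100
limited = peak_limit(burst, threshold=0.9)
print(max(abs(x) for x in limited))  # ~0.9
```

The instant-attack/slow-release split mirrors the basic design choice in all limiters: gain must drop fast enough to catch overshoots but recover slowly enough to avoid audible pumping.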

In television broadcasting, audio signal processing is now not only necessary during programme production but also along the distribution and transmission chains. Consequently, systems have had to become more powerful and sophisticated, particularly with the explosion in the number of TV channels and the rise of streaming and video on demand (VoD). “The TV broadcast market is looking for flexibility, density and efficiency in audio signal processing,” comments Cris Garcia, Manager of Sales for Western USA and Latin America at Cobalt Digital. “Over the years we have seen an evolution from discrete to embedded audio with the ability to re-route, map and shuffle multiple audio channels.”

With what Garcia calls today’s “ever-growing diversity of applications”, there is now even greater need for processing systems to be not only flexible but also robust. This is particularly true where legacy audio formats - including AES3 (the standard for exchanging sound signals between professional audio units) and MADI (multichannel audio digital interface) - and current Audio over IP (AoIP) systems, such as Audinate’s Dante, are used alongside each other.

“The proliferation of OTT and streaming media has certainly defined a more dynamic and efficient mechanism to obtain and deliver audio channels,” Garcia says. “MADI and Dante, for example, are supported in our 9904-UDX and 9905-MPx processors, which are able to transport bulk audio channels between devices, locations and multi-deliverable packages.” The Cobalt Digital systems, like other audio processing technologies, are expected to handle multiple channels and perform functions such as shuffling, embedding, de-embedding, routing and delivery.
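Channel shuffling of the sort Garcia describes is conceptually simple: each output channel is fed from a chosen input channel. A minimal sketch (the function name and channel-map format here are invented for illustration, not Cobalt's API):

```python
def shuffle_channels(frames, channel_map):
    """Re-route/shuffle channels in multichannel audio frames.

    frames: list of per-sample tuples, one value per input channel.
    channel_map: output channel i is fed from input channel channel_map[i].
    Illustrative only; real devices do this on interleaved DSP buffers.
    """
    return [tuple(frame[src] for src in channel_map) for frame in frames]

# Swap a stereo pair and duplicate input channel 0 onto a third output:
frames = [(0.1, 0.2), (0.3, 0.4)]
print(shuffle_channels(frames, [1, 0, 0]))
# [(0.2, 0.1, 0.1), (0.4, 0.3, 0.3)]
```

The same mapping idea underlies embedding and de-embedding: audio channels are lifted out of one container (SDI, MADI, ST 2110 streams) and re-packed into another in a different order or grouping.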

Another relatively recent addition to audio signal processors is loudness monitoring and control. The disparity in perceived volume between different elements of a TV broadcast - most noticeably the transition from the wide dynamic range of a drama to a heavily compressed commercial or trailer - had been one of the biggest causes of viewer complaints since the relatively early days of the medium. Various algorithms and hardware meters were developed over the years to tackle the problem, but it was not until 2006 that the first agreed standard was introduced.

International standards ITU-R BS.1770 (Algorithms to measure audio programme loudness and true-peak audio level, 2006) and ITU-R BS.1771 (Requirements for loudness and true-peak indicating meters, 2007) were followed by more regional standardisation in the form of the EBU (European Broadcasting Union) R 128 recommendation and the US ATSC (Advanced Television Systems Committee) A/85 recommended practice. These formed the basis of regulations that have largely solved the loudness problem and are implemented in both hardware meters and software tools.
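The core measurement these standards share is compact: BS.1770 expresses loudness as mean-square signal energy in decibels with a fixed -0.691 dB offset. The sketch below implements only that final formula; a compliant meter would first apply the standard's K-weighting filters and, for integrated loudness, its gating stages:

```python
import math

def block_loudness(samples):
    """Loudness of one measurement block via the BS.1770 formula
    L = -0.691 + 10*log10(mean square).

    Simplified sketch: omits the K-weighting pre-filters and the
    gating that a compliant BS.1770 meter applies.
    """
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return -0.691 + 10.0 * math.log10(mean_square)

# One second of a full-scale 997 Hz sine at 48 kHz (mean square = 0.5):
sine = [math.sin(2 * math.pi * 997 * n / 48000) for n in range(48000)]
print(round(block_loudness(sine), 2))  # -3.7 (unweighted sketch)
```

With K-weighting applied, the same full-scale 997 Hz tone would read approximately -3.01 LKFS on a real meter; the unweighted sketch lands at -3.70 because it skips those filters.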

“Audio signal processors have a baseline requirement of maintaining consistent loudness as per the global loudness standards,” comments Jayson Tomlin, Senior Director of Marketing at Telos Alliance, “but now include advanced features such as upmixing, watermarking and encoding to immersive audio standards where personalisation is a key value add. Streaming and VoD have similar requirements for live production but can also be processed in the file-based domain for scale and efficiency.”

Two of Telos Alliance’s brands - Linear Acoustic and Minnetonka - offer audio signal processing systems with loudness capability; the group also has an exclusive global arrangement to sell, market and support products produced by loudness and processor specialist Jünger Audio. Managing Partner Friedemann Kootz observes that “the audio processing market has been transforming for many years”, although he qualifies this by saying it has been a slow process.

Kootz calls the move to loudness-based workflows “probably the biggest change in broadcast audio history”.

While watermarking - embedding an inaudible electronic identifier in a signal, typically for content monitoring or as proof of copyright ownership - and speech intelligibility have assumed equal importance in recent years, they have not made as big an impact as loudness. That could change, however, because they both play a part in Next Generation Audio (NGA) formats, including Dolby AC-4/Dolby Atmos and MPEG-H 3D Audio.

Both comprise a foundation of audio channels plus up to 128 audio objects that can either be placed anywhere in the ‘sound picture’ for immersive sound (Dolby Atmos) or provide personalisation features such as language selection and assistive options. At the moment, immersive audio is the more widely promoted aspect of object-based audio (OBA)/NGA systems, but many developers, including Fraunhofer IIS (the main force behind MPEG-H 3D Audio), see personalisation as the more compelling implementation.

Personalisation enables viewers to not only select specific commentaries and languages but also alter the balance between speech and background music/effects or crowd noise in the case of a sports broadcast. Jünger Audio worked with Fraunhofer IIS during the development of MPEG-H 3D Audio workflows, which Kootz says gave the company “a lot of experience with new audio formats”. He agrees that while immersive audio is the “more impressive” element, personalisation is the driving factor for many broadcasters.
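In object-based terms, a dialogue-balance control of this kind is essentially a viewer-controlled gain offset applied to the dialogue objects before the renderer sums the mix. The Python sketch below illustrates the idea; the object structure is invented for this example, and real MPEG-H or AC-4 metadata is far richer (positions, groups, switch sets, permitted interactivity ranges):

```python
import math

def render_mix(objects, dialogue_offset_db=0.0):
    """Sum object stems, applying a viewer's dialogue-balance offset.

    objects: list of dicts with 'kind' ('dialogue' or 'bed'),
    'gain_db' (the producer's mix gain) and 'samples'.
    Hypothetical structure for illustration only.
    """
    length = max(len(o["samples"]) for o in objects)
    out = [0.0] * length
    for obj in objects:
        gain_db = obj["gain_db"]
        if obj["kind"] == "dialogue":
            gain_db += dialogue_offset_db  # the personalisation control
        gain = 10.0 ** (gain_db / 20.0)    # dB to linear
        for i, s in enumerate(obj["samples"]):
            out[i] += gain * s
    return out

# Boost dialogue 6 dB against the crowd-noise bed:
stems = [
    {"kind": "dialogue", "gain_db": 0.0, "samples": [1.0, 0.5]},
    {"kind": "bed", "gain_db": 0.0, "samples": [1.0, 1.0]},
]
print(render_mix(stems, dialogue_offset_db=6.0))
```

Because the offset is applied at the receiver from transmitted metadata rather than baked into a fixed mix, the same bitstream can serve viewers who want clearer speech and those who want the full crowd atmosphere.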

“Accessible media - increasing speech intelligibility, changing languages or adding audio description - has become mandatory in some countries,” he explains. “OBA is different from channel-based audio and we need to adapt existing concepts and even standards.”

At the moment Jünger Audio is focusing on real-time processing, with “the biggest challenge” being to make low-latency processor operations available in cloud environments. “We can see a growing interest in shifting large-scale live production onto cloud platforms,” Kootz says. “But managing real-time systems in cloud environments is complicated and that is why many prefer investing in traditional hardware-based processing. We still see demand for both standalone and software systems. Our flexAI infrastructure can run on any x86 CPU-based computer and we can provide the same concept either in virtualised form or as hardware on servers or audio processors. We’re also working on making our processes accessible as time-limited services.”

It is understandable that changes in how ‘TV’ is made and distributed have called for new audio processing techniques, but the diversification in how media is experienced has probably had an even more profound influence. “Media is no longer consumed in a stationary, controllable and predictable setting,” says Harald Fuchs, Head of Media Systems and Applications at Fraunhofer IIS. “Accessibility features, various language versions and other customisation options are also a must-have for modern productions. This means today’s media and audio production and transmission require a new audio codec that adapts flexibly to all production situations, platform requirements and playback environments. Current codecs such as xHE-AAC include loudness management and dynamic range control. With MPEG-H Audio, metadata has enabled us to [develop] Universal Delivery, which allows the content to be automatically optimised for the respective platform.”

In terms of the type of processors being used, Cris Garcia at Cobalt Digital describes standalone units as “still key in our industry” even with the growing move into the cloud. “Many users continue to seek standalone solutions for everyday processing,” he says. “But many will also argue that software-based in the cloud is the way to go because users can easily access resources from any location through edge devices. However, with this so-called flexibility you get a series of hurdles, including security and bandwidth. Standalone provides ease of use, full control, no security concerns, native format support and perpetual ownership.”

As for the cloud, Garcia comments that it is “playing a key role” as content owners commercially exploit and re-purpose existing programmes for multi-deliverable versions. “Working in the cloud certainly demands instant access to material whether it be audio or video,” he says. “This means the cloud has directly or indirectly established mechanism requirements for getting different source materials from base band, digital, discrete [and] embedded [sources]. Technologies such as MADI, Dante and now [the] SMPTE ST 2110 [standard] - which are all available in Cobalt solutions - allow a flexible and robust way to deliver a multitude of audio channels to any cloud destination, where a compositor or streaming platform can then define or repurpose those audio channels into new mixes or copies of the original source material.”

It does appear that hardware is not quite out of the game just yet, with, as Telos Alliance’s Jayson Tomlin explains, some areas of broadcasting still relying on good old physical equipment. “Remote trucks are a great example of where a live hardware processor makes sense,” he says. “While there are inherent advantages to using software, both on and off prem, the real advantage of software is to virtualise the functionality of a hardware processor and use it to scale up or down as production requirements change.” Tomlin adds that immersive audio and NextGen TV (ATSC 3.0) have created “new opportunities” to deliver audio processors that enable new experiences such as personalisation that can be dropped into existing workflows through support for ST 2110 and NDI (Network Device Interface).

The basics of audio signal processing remain the same in today’s multi-channel, multi-device world, but it is an area that has grown considerably in capability and reach. The signs are that it will only continue to do so.