The audience that listens to your media controls the volume: they can turn it up or down and set it at their desired listening level. What platforms should avoid is forcing consumers to readjust their volume every time there is a transition from one piece of media to the next, for example:
- from one song to another
- from one sitcom episode to a commercial break
- from a podcast into a sponsor pitch
- from social media updates to multimedia ads in the timeline
Loudness is the subjective perception of the level of sound. The loudness level should be kept consistent for the best listener experience. The reliable control of audio loudness has long been an issue for content creators, producers, and broadcasters. There have been various approaches to the normalization of loudness used in broadcasting which resulted in inconsistencies.
The fundamental problem comes down to the lack of a standardized method of measuring the loudness of an audio program. There have historically been some objective measures, and in the early 2000s the International Telecommunication Union (ITU) started work on a loudness metric for use in broadcasting. The published recommendation ITU-R BS.1770 "Algorithms to measure audio programme loudness and true-peak level" describes a measurement algorithm that has become the "standard" for loudness of media in broadcast and online streaming services.
Media APIs can help with Loudness
The algorithm specified in Recommendation ITU-R BS.1770 provides an objective measure that estimates the subjective loudness of an audio signal. This algorithm can be used by content providers, distributors, and hardware devices to reproduce audio programs at a similarly perceived loudness (with or without video). The objective loudness measure can provide a single loudness value for the program. This value can be used to calculate a global offset that can be applied to a program to align media with a similar loudness target.
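The global offset described above is a subtraction on the decibel scale. The following is a minimal sketch, not part of any Dolby.io API: the function names and the example values (-18.5 LKFS measured, -24 LKFS target) are assumptions chosen for illustration.

```python
def normalization_gain_db(measured_lkfs: float, target_lkfs: float) -> float:
    """Global offset, in dB, that moves a program from its measured
    loudness to the target loudness."""
    return target_lkfs - measured_lkfs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude scale factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db: float):
    """Scale every sample by the same factor (a single global offset)."""
    scale = db_to_linear(gain_db)
    return [s * scale for s in samples]


# Hypothetical program measured at -18.5 LKFS, aligned to a -24 LKFS target.
gain = normalization_gain_db(measured_lkfs=-18.5, target_lkfs=-24.0)
quieter = apply_gain([0.5, -0.25, 0.1], gain)
```

When the offset is positive, the scaled signal can exceed full scale, so the peak level should be checked after applying it.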
The ITU-R BS.1770 loudness algorithm integrates the frequency-weighted (K-weighted) power of all audio channels over a finite time period, typically from the beginning to the end of the audio media or program. The weighted power of all channels is windowed and summed, using a block size of 400 ms with a 75% overlap. An absolute level gate, set at -70 LKFS, removes all the silent or low-level blocks, and then a relative level gate, set 10 LU below the loudness of the remaining blocks, is applied, keeping only the "loudest" blocks. The result is a single loudness value in units of LKFS (loudness, K-weighted, relative to full scale). The LKFS unit is equivalent to a decibel scale: a 1 dB increase in signal level raises the loudness reading by 1 LKFS. This is often referred to as a 1 LU increase in loudness. The EBU uses LUFS in place of LKFS; the two are identical measures.
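The two gating stages can be sketched in a few lines. This is a simplified illustration, not a compliant meter: it assumes the K-weighting filter, the 400 ms blocking, and the channel summation have already been done, so that `block_powers` (a hypothetical input) holds one weighted power value per block. The -0.691 offset is the constant used in BS.1770 to convert weighted power to LKFS.

```python
import math

ABSOLUTE_GATE_LKFS = -70.0  # blocks at or below this level are discarded
RELATIVE_GATE_LU = -10.0    # second gate, relative to the first-pass loudness


def power_to_lkfs(power: float) -> float:
    """Convert a K-weighted, channel-summed mean-square power to LKFS."""
    return -0.691 + 10.0 * math.log10(power)


def lkfs_to_power(lkfs: float) -> float:
    """Inverse of power_to_lkfs, used here only to build example data."""
    return 10.0 ** ((lkfs + 0.691) / 10.0)


def integrated_loudness(block_powers) -> float:
    """Gated loudness from per-block powers (assumed precomputed)."""
    # Stage 1: absolute gate removes silent and very low-level blocks.
    kept = [p for p in block_powers
            if p > 0 and power_to_lkfs(p) > ABSOLUTE_GATE_LKFS]
    if not kept:
        return float("-inf")
    # Stage 2: relative gate, 10 LU below the loudness of the kept blocks.
    threshold = power_to_lkfs(sum(kept) / len(kept)) + RELATIVE_GATE_LU
    kept = [p for p in kept if power_to_lkfs(p) > threshold]
    return power_to_lkfs(sum(kept) / len(kept))


# Example data: loud passages, quiet passages, and near-silence.
blocks = [lkfs_to_power(-20.0)] * 10 + [lkfs_to_power(-40.0)] * 10 + [1e-12] * 10
program_loudness = integrated_loudness(blocks)

# A 1 dB level increase multiplies every block power by 10 ** 0.1.
louder = integrated_loudness([p * 10 ** 0.1 for p in blocks])
```

In this example the near-silent blocks fall to the absolute gate and the -40 LKFS blocks fall to the relative gate, so only the loud passages determine the result, and the 1 dB boost moves the reading by 1 LU.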
When the entire program, i.e. all channels from beginning to end, is measured using the ITU-R BS.1770 algorithm, the result represents the loudness of the entire program and is often referred to as full program mix or program loudness, in units of LKFS. The loudness of the dialog or speech can also be measured, in which case only the parts of the media or program that contain speech/dialog are measured using ITU-R BS.1770. This is sometimes referred to as anchor-based loudness. Speech/dialog is the most common anchor, and the resulting value is referred to as the Dialog Loudness.
Aligning media/content using Dialog Loudness ensures that the level of the dialog is consistent across programs and/or channels. This is commonly used in broadcast, especially with wide-dynamic-range content such as movies and premium episodic content. Having access to the isolated dialog of a program is not always possible, so a method to measure the loudness of the speech/dialog in already mixed media is required.
In loudness measurement, gating controls which portions of a signal contribute to the result: portions that fall on the wrong side of a measured threshold are excluded. This improves the reliability of loudness estimation by keeping silence and low-level passages from skewing the measured value.
Dolby Dialog Intelligence is a technology that detects speech/dialog, extracts it, and measures the extracted speech using the ITU-R BS.1770 algorithm. It is an industry-proven method for dialog/speech loudness measurements. The process of using an algorithm to extract the speech and measure its loudness is commonly referred to as a speech-gated loudness measurement. This contrasts with the plain BS.1770 measurement, which employs a relative-level gated (or level-gated) loudness measurement that only measures the "loudest parts" of the program.
The program loudness and dialog loudness measurements are based on the entire program/media and are meant to provide an estimate of the loudness of the program/media as a whole. Short-term loudness is another loudness metric based on BS.1770, but it uses a 3-second window instead of the entire program, and no relative-level gate is applied. It provides a more localized measure of loudness around the time at which it is measured, and it is defined in the loudness recommendations.
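A sliding-window version of the measurement can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: `subblock_powers` is a hypothetical input holding K-weighted, channel-summed power for consecutive 100 ms segments, and the 100 ms update interval is a choice made for the example.

```python
import math


def short_term_loudness(subblock_powers, subblock_s=0.1, window_s=3.0):
    """Ungated loudness over a sliding 3-second window.

    Returns one value per window position, advancing one sub-block at a time.
    """
    n = round(window_s / subblock_s)  # sub-blocks per window (30 by default)
    values = []
    for end in range(n, len(subblock_powers) + 1):
        mean_power = sum(subblock_powers[end - n:end]) / n
        if mean_power > 0:
            values.append(-0.691 + 10.0 * math.log10(mean_power))
        else:
            values.append(float("-inf"))  # digital silence
    return values


# Five seconds of a steady signal at -23 LKFS (hypothetical data).
steady_power = 10.0 ** ((-23.0 + 0.691) / 10.0)
readings = short_term_loudness([steady_power] * 50)
```

The first reading is only available once a full 3 seconds of audio has been seen, which is why five seconds of input yields 21 readings here.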
Loudness Range (LRA) is often used to gauge the dynamics of a program or media. It is a statistical metric based on the short-term loudness values of the program and is meant to indicate the range of loudness a program covers, measured in LU.
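As a rough sketch of the statistics involved: the EBU method (Tech 3342) gates the short-term values with an absolute threshold at -70 LUFS and a relative threshold 20 LU below the gated mean, then takes the spread between the 10th and 95th percentiles. The code below follows that outline but simplifies the percentile selection to a nearest-rank lookup, so treat it as an approximation. The input list is hypothetical.

```python
import math


def loudness_range(short_term_lufs) -> float:
    """Approximate Loudness Range (LU) from short-term loudness values."""
    # Absolute gate: ignore silence and very quiet passages.
    kept = [l for l in short_term_lufs if l > -70.0]
    if not kept:
        return 0.0
    # Relative gate: 20 LU below the power-domain mean of what remains.
    mean_power = sum(10.0 ** (l / 10.0) for l in kept) / len(kept)
    threshold = 10.0 * math.log10(mean_power) - 20.0
    kept = sorted(l for l in kept if l > threshold)

    def percentile(p):
        # Nearest-rank lookup: a simplification assumed for this sketch.
        return kept[round((len(kept) - 1) * p / 100.0)]

    return percentile(95) - percentile(10)


# Hypothetical program: quiet scenes, loud scenes, and some near-silence.
program = [-30.0] * 50 + [-20.0] * 50 + [-80.0] * 20
lra = loudness_range(program)
```

Using percentiles rather than the minimum and maximum keeps a single very loud or very quiet moment from dominating the reported range.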
Anchoring on dialog is useful in cases where media has multiple components, say background music and dialog. It's reasonable that a human listener would expect the loudness of the dialog to be the anchor for how loud the media is judged to be.
Audio signals can be represented in the analog domain by a continuous waveform, or in the digital domain by a discretely sampled sequence of values. Knowing the maximum level of the audio signal is important to avoid clipping in downstream devices, which would compromise the user experience. The typical way of measuring the maximum level of a digital signal is the sample peak, which is the maximum absolute value of the sampled audio. This value may not be the maximum peak of the signal in the analog domain, where the true peak may exceed the sample peak. Estimating the true peak of the audio signal in the digital domain is therefore useful, and the details of how to measure it are contained in the second part of Recommendation ITU-R BS.1770. The units for sample peak are dBFS, whereas for true peak they are dBTP.
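The gap between sample peak and true peak can be demonstrated with a tone whose samples all miss the crest of the waveform. The sketch below estimates the true peak by 4x oversampling with a Hann-windowed sinc interpolator. That interpolator is a stand-in chosen for brevity, not the filter specified in BS.1770, and the function names and test tone are assumptions for illustration.

```python
import math


def sample_peak_dbfs(x) -> float:
    """Maximum absolute sample value, in dBFS."""
    peak = max(abs(v) for v in x)
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


def _kernel(d: float, half_width: int) -> float:
    """Hann-windowed sinc evaluated at offset d (in samples)."""
    if abs(d) >= half_width:
        return 0.0
    sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
    hann = 0.5 * (1.0 + math.cos(math.pi * d / half_width))
    return sinc * hann


def true_peak_dbtp(x, oversample: int = 4, half_width: int = 16) -> float:
    """Estimate the inter-sample peak by interpolating between samples."""
    peak = 0.0
    for n in range(len(x)):
        lo = max(0, n - half_width)
        hi = min(len(x), n + half_width + 1)
        for k in range(oversample):
            t = n + k / oversample  # position on the oversampled grid
            v = sum(x[m] * _kernel(t - m, half_width) for m in range(lo, hi))
            peak = max(peak, abs(v))
    return 20.0 * math.log10(peak) if peak > 0 else float("-inf")


# Full-scale tone at a quarter of the sample rate, phased so that every
# sample lands at about 0.707 instead of on the waveform's crest.
tone = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(64)]
sp = sample_peak_dbfs(tone)
tp = true_peak_dbtp(tone)
```

For this tone the sample peak reads about 3 dB below full scale, while the interpolated estimate lands close to 0 dBTP, which is the level a downstream converter would actually have to reproduce.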
These are some of the standards that affect loudness measurement and conformance.
As stated above, ITU-R BS.1770 is the basis for most of the broadcasting and streaming loudness recommendations around the world. It has been revised a few times since its inception:
BS.1770-0/1: The original recommendation which did not describe any form of gating mechanism to remove quiet or silent passages from the loudness measurement.
BS.1770-2: Addition of an absolute-level gate and relative-level gate to the loudness algorithm.
BS.1770-3: Optional emphasis and DC blocking of the true peak algorithm were removed. They were rarely used in industry implementations.
BS.1770-4: Annex 3 was added, which extends the original loudness algorithm to the measurement of immersive audio channel configurations, such as 5.1.2. Note that the extended algorithm in Annex 3 is backward compatible with Annex 1 for stereo and 5.1 channel configurations.
For loudness measurements, meters compliant with BS.1770-2, BS.1770-3, and BS.1770-4 should produce identical results for mono/stereo and 5.1 channel content.
Advanced Television Systems Committee (ATSC) A/85.
See the Specification for more details.
The European Broadcasting Union (EBU) R.128. Note that in R.128, the unit of loudness used is LUFS.
See the Specification for more details.
A metering function is used to measure loudness and can be used to check for compliance as required or recommended by various global and regional loudness recommendations.
- Dialog/Speech loudness based on Dolby Dialog Intelligence using ITU-R BS.1770
- Full-program mix/integrated loudness using ITU-R BS.1770-4
- True peak as defined in ITU-R BS.1770-4
- Short-term and momentary loudness defined in ITU-R BS.1771
- Loudness Range as defined in EBU R.128
- Sample peak
The following profiles are defined for identifying the constraints used in recommendations for different platforms and standards.
|Loudness profile||Maximum loudness (LKFS)||Minimum loudness (LU)||Maximum True Peak (dBTP)|
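A profile of this shape can be checked mechanically once the measurements are available. The sketch below is hypothetical throughout: the `LoudnessProfile` class, the `check_conformance` function, and the numeric limits are placeholders invented for illustration and are not taken from any specification or from the Dolby.io APIs. It also assumes the minimum loudness is expressed as an absolute level in LKFS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoudnessProfile:
    """Constraints for one profile (field names are assumptions)."""
    name: str
    max_loudness_lkfs: float
    min_loudness_lkfs: float
    max_true_peak_dbtp: float


def check_conformance(profile, loudness_lkfs, true_peak_dbtp):
    """Return a list of human-readable violations (empty if conformant)."""
    issues = []
    if loudness_lkfs > profile.max_loudness_lkfs:
        issues.append(f"loudness {loudness_lkfs} LKFS is above the maximum "
                      f"{profile.max_loudness_lkfs} LKFS")
    if loudness_lkfs < profile.min_loudness_lkfs:
        issues.append(f"loudness {loudness_lkfs} LKFS is below the minimum "
                      f"{profile.min_loudness_lkfs} LKFS")
    if true_peak_dbtp > profile.max_true_peak_dbtp:
        issues.append(f"true peak {true_peak_dbtp} dBTP is above the maximum "
                      f"{profile.max_true_peak_dbtp} dBTP")
    return issues


# Placeholder limits only; consult the relevant specification for real values.
example = LoudnessProfile("example", max_loudness_lkfs=-22.0,
                          min_loudness_lkfs=-26.0, max_true_peak_dbtp=-2.0)
ok = check_conformance(example, loudness_lkfs=-24.0, true_peak_dbtp=-3.0)
too_hot = check_conformance(example, loudness_lkfs=-18.0, true_peak_dbtp=-0.5)
```

Returning a list of violations rather than a single boolean makes it possible to report every failed constraint at once.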
Support for loudness recommendations and standards ensures consumers have a pleasant end-user experience. You can use Dolby.io Media APIs to learn about and correct the loudness of your media.
With these APIs you can answer questions like:
- Do I conform to a specific broadcasting standard recommendation?
- Will my media be adjusted by a platform once uploaded?
- What percentage of media is attributable to dialog?
- What is the peak loudness level (in decibels) for user-generated content?
- How do I fix the loudness so that content will be accepted?