Audio Quality

Guide to better audio quality

Audio quality directly shapes how media is experienced. Good audio can immerse an audience in the content, while poor audio can distract them from the message.

The guides in this section explain some of the common quality issues found in media. You'll also find how-to articles that show where a Media API can help identify or fix them.

Quality Score

The quality_score is a discrete measure of the quality of speech within an audio stream. This automatic speech quality assessment is powered by semi-supervised machine learning algorithms developed at Dolby. It quantifies multiple aspects of speech quality at once, taking into account the type of signal and the sources of degradation.

Factors that can lower the quality score include codec artifacts, additive noise, clipping, reverberation, sibilance, packet loss, plosives, and phase distortion. Conversely, a high quality score should correlate with the subjective judgement of a human listener. The result is an objective measure that can make assessments faster and more consistent.

This can be helpful for problems such as:

  • audio analysis of user-generated content
  • monitoring the quality of service over time
  • automated decision-making as part of a media workflow

Depending on the usage requirements, you can retrieve a quality score:

  1. for a piece of media overall using the Diagnose API
  2. for an individual talker within a recording using the Speech Analytics API

How to Interpret the Quality Score

When looking at the quality score, the relative value may be a better indicator than the absolute value. The value ranges from 0 to 10, with a higher score indicating a higher-quality speech assessment. Useful relative comparisons include:

  • did the score improve after being processed with the Enhance API or manual editing
  • is the score of a piece of media above or below the average of the other media in its collection
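
The two comparisons above can be sketched in a few lines of Python. This is a minimal illustration, not part of any Media API: the function names and the sample scores are hypothetical, and the scores are assumed to have been retrieved already.

```python
from statistics import mean
from typing import Sequence


def score_change(before: float, after: float) -> float:
    """Return how far the quality score moved after processing or editing."""
    return after - before


def offset_from_collection(score: float, collection: Sequence[float]) -> float:
    """Return the score's distance from the average of a collection.

    A negative result means the media scores below the collection average.
    """
    return score - mean(collection)


# Hypothetical scores, for illustration only.
print(score_change(before=4.5, after=6.0))
print(offset_from_collection(4.0, [5.0, 6.0, 7.0]))
```

A positive change suggests the processing helped, and a negative offset flags media that may deserve a closer look than the rest of its collection.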

The average quality score takes the entire file into account but excludes silent segments. Because the quality score may change over the duration of a piece of media, you can also retrieve its distribution. The distribution is reported as intervals, each of which includes a lower_bound, upper_bound, duration, and percentage relative to the file overall.
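
As a rough sketch of how such intervals might be used, the Python below works with a hypothetical list of intervals carrying the lower_bound, upper_bound, and duration fields named above. The surrounding response structure, the sample values, and the choice to compute percentage against the sum of the interval durations are all assumptions made for illustration.

```python
def add_percentages(intervals):
    """Return copies of the intervals with a percentage of total duration."""
    total = sum(item["duration"] for item in intervals)
    if total == 0:
        return [{**item, "percentage": 0.0} for item in intervals]
    return [
        {**item, "percentage": round(100 * item["duration"] / total, 1)}
        for item in intervals
    ]


def percent_at_or_below(intervals, score):
    """Return the share of time spent in intervals that top out at `score`."""
    total = sum(item["duration"] for item in intervals)
    if total == 0:
        return 0.0
    low = sum(i["duration"] for i in intervals if i["upper_bound"] <= score)
    return 100 * low / total


# Hypothetical distribution: durations are in seconds.
distribution = [
    {"lower_bound": 0, "upper_bound": 4, "duration": 10.0},
    {"lower_bound": 4, "upper_bound": 7, "duration": 30.0},
    {"lower_bound": 7, "upper_bound": 10, "duration": 60.0},
]

for item in add_percentages(distribution):
    print(item)
print(percent_at_or_below(distribution, 5))
```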

The Media Diagnose API can also provide the worst_segment, which is a good starting point for a manual, subjective review of what may be contributing to poor quality. The start time, end time, and score are provided for that segment.

Bandwidth

The bandwidth metric indicates a file's audio frequency range. This can identify files with reduced high-frequency content, such as audio that was incorrectly transcoded or a downmix made from the wrong source file. As part of a quality control process, it can help determine whether a piece of media is suitable for a particular use case.

Silence

Silence is the absence of any content such as music or speech. Some silence is natural, such as pauses in speech, but too much silence may indicate a problem where the intended signal was lost. For many applications, it may not be desirable to remove silence completely without a better understanding of its cause. The Dolby.io Analyze API can help identify how much of the total file is silent and the number of sections containing silence.

The recognition of silence can be controlled using the threshold and duration parameters of the Analyze API. The minimum value for duration is 0.5 seconds, which means the API must detect at least 0.5 seconds of silence before classifying a section as silent.
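
To illustrate how a threshold and a minimum duration interact, here is a minimal Python sketch. It is not the Analyze API's algorithm: it assumes the audio has already been reduced to one loudness value per fixed-length frame, that the threshold is expressed in dB, and that a frame counts as silent when it falls below that threshold. The function name and sample levels are hypothetical.

```python
def find_silence(levels_db, frame_seconds, threshold_db, min_duration=0.5):
    """Return (start, duration) pairs, in seconds, for silent sections.

    A section is a run of consecutive frames quieter than threshold_db
    that lasts at least min_duration seconds.
    """
    sections = []
    run_start = None
    # A loud sentinel frame at the end closes any run that is still open.
    for index, level in enumerate(list(levels_db) + [float("inf")]):
        if level < threshold_db:
            if run_start is None:
                run_start = index
        elif run_start is not None:
            duration = (index - run_start) * frame_seconds
            if duration >= min_duration:
                sections.append((run_start * frame_seconds, duration))
            run_start = None
    return sections


# Hypothetical per-frame levels, one value every 0.25 seconds.
levels = [-20, -70, -70, -20, -70, -20, -70, -70, -70]
print(find_silence(levels, frame_seconds=0.25, threshold_db=-60))
```

In this sketch the single quiet frame in the middle is ignored because it lasts only 0.25 seconds, which is the effect a minimum duration is meant to have on short pauses.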

Silence Example

When looking at your own media, you may see results like this:

{
  "processed_region": {
    "audio": {
      "silence": {
        "percentage": 5.3,
        "num_sections": 2,
        "sections": [
          {
            "section_id": "si_1",
            "start": 3.08,
            "duration": 2.08,
            "channels": [
              "ch_0"
            ]
          },
          {
            "section_id": "si_2",
            "start": 8.3,
            "duration": 3.2,
            "channels": [
              "ch_0",
              "ch_1"
            ]
          }
        ]
      }
    }
  }
}
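
A result like the one above can be summarized with a short Python sketch. The field names come from the example itself. The function name is hypothetical, and the start and duration values are assumed to be in seconds.

```python
import json

# The same values as the example result, compacted.
RESULT = json.loads("""
{"processed_region": {"audio": {"silence": {
  "percentage": 5.3,
  "num_sections": 2,
  "sections": [
    {"section_id": "si_1", "start": 3.08, "duration": 2.08,
     "channels": ["ch_0"]},
    {"section_id": "si_2", "start": 8.3, "duration": 3.2,
     "channels": ["ch_0", "ch_1"]}
  ]}}}}
""")


def summarize_silence(result):
    """Return the silent percentage and one (id, start, end, channels) row per section."""
    silence = result["processed_region"]["audio"]["silence"]
    rows = []
    for section in silence["sections"]:
        # Round to hide floating-point noise from the addition.
        end = round(section["start"] + section["duration"], 2)
        rows.append(
            (section["section_id"], section["start"], end, section["channels"])
        )
    return silence["percentage"], rows


percentage, rows = summarize_silence(RESULT)
print(f"{percentage}% of the file is silent")
for section_id, start, end, channels in rows:
    print(f"{section_id}: {start}s to {end}s on {', '.join(channels)}")
```

Listing the channels per section makes it easy to spot cases like si_1, where only one channel is silent, which may point to a different cause than silence across all channels.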