Audio Diagnose API

Guide to Using the Media Analyze Audio Diagnose API

The Media Analyze Audio Diagnose API returns data about your media so you can learn more about it. For each input file, the results include:

✓ General media info
✓ Audio quality score
✓ Noise score
✓ Clipping
✓ Loudness
✓ Content classification

Why use the Audio Diagnose API?

The Audio Diagnose API is a fast and automated way to assess audio quality based on state-of-the-art signal processing and machine learning. Compared to manual subjective listening, it's faster and more consistent.

  • Faster: returns results faster than real-time and can scale up to process multiple files in parallel
  • Consistent: provides a consistent assessment so you can assess relative quality across media collections and over time with confidence

Audio Diagnose opens up new ways to assess audio quality automatically.

Automated quality control

Suppose your company collects content from several creators and provides that content to your customers.

Audio Diagnose can be used for automatic quality control within the media processing flow. You can decide whether the content quality is good enough based on the outputs and your own quality requirements. For example, the logic might check for a sufficient quality score, loudness, and bitrate.
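As a minimal sketch of such a check, the Python function below tests the three values mentioned above. It assumes the Diagnose output has already been parsed into a dict that holds the quality_score, loudness, and media_info sections from the example output side by side. The function name and every threshold are hypothetical and should be replaced with your own requirements.

```python
def passes_qc(result, min_quality=7.0, loudness_range=(-18.0, -14.0),
              min_bitrate=64000):
    """Return (ok, reasons) for one parsed Diagnose result.

    Assumption: `result` is a dict with "quality_score", "loudness" and
    "media_info" keys. All default thresholds are illustrative only.
    """
    reasons = []

    quality = result["quality_score"]["average"]
    if quality < min_quality:
        reasons.append(f"quality score {quality} is below {min_quality}")

    measured = result["loudness"]["measured"]
    low, high = loudness_range
    if not low <= measured <= high:
        reasons.append(f"loudness {measured} is outside [{low}, {high}]")

    bitrate = result["media_info"]["audio"]["bitrate"]
    if bitrate < min_bitrate:
        reasons.append(f"audio bitrate {bitrate} is below {min_bitrate}")

    # The file passes only when no check added a reason.
    return (not reasons, reasons)
```

Returning the list of reasons alongside the pass/fail flag makes it easier to report why a file was rejected.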


Media Enhance API

If the media is not good enough, you can call the Media Enhance API to improve the audio. After enhancement, the file can be re-diagnosed to confirm whether the quality is sufficient.

If you have audio engineers performing Quality Control (QC), the Audio Diagnose API can assist in screening files and flagging problems first, allowing audio engineers to focus their valuable time and skills on the problem files.

The diagram below shows a potential QC flow to detect quality issues followed by either an automated or manual fix to the problematic file.


Example workflow for Quality Control
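The flow described above could be sketched in Python as follows. The diagnose, enhance, and is_good_enough callables are hypothetical stand-ins supplied by the caller (for example, thin wrappers around your own API client); nothing here reflects the actual request format of either API.

```python
def run_qc(media, diagnose, enhance, is_good_enough, max_attempts=1):
    """Diagnose a file, try an automated fix, and report the outcome.

    Assumptions (all caller-supplied, hypothetical):
      diagnose(media)        -> parsed Diagnose result
      enhance(media)         -> reference to the enhanced media
      is_good_enough(result) -> bool
    Returns (status, media, result), where status is "pass", "fixed",
    or "needs_manual_review".
    """
    result = diagnose(media)
    if is_good_enough(result):
        return "pass", media, result

    for _ in range(max_attempts):
        media = enhance(media)       # automated fix
        result = diagnose(media)     # re-diagnose to confirm
        if is_good_enough(result):
            return "fixed", media, result

    # Still failing: hand over to an audio engineer.
    return "needs_manual_review", media, result
```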

Automated corrective advice

Audio Diagnose results can inform the content creator with specific corrective advice. For example, if the content is consistently failing due to sibilance events, the QC automation could email advice to the creator regarding their microphone setup.
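One way to turn this into automation is sketched below in Python. It assumes each result is a parsed dict containing the speech section shown in the example output; the function name, both thresholds, and the advice text are hypothetical.

```python
def sibilance_advice(results, max_events=100, min_share=0.8):
    """Return an advice message when most files show heavy sibilance.

    Assumption: each item in `results` has result["speech"]["events"]
    ["sibilance"]. `max_events` and `min_share` are illustrative values.
    Returns None when no advice is needed.
    """
    if not results:
        return None

    flagged = sum(
        1 for r in results
        if r["speech"]["events"]["sibilance"] > max_events
    )
    if flagged / len(results) >= min_share:
        return (
            f"Sibilance was flagged in {flagged} of {len(results)} recent "
            "files. Please review your microphone setup."
        )
    return None
```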

Systemic failure alerts

A monitoring system based on Audio Diagnose results could trigger an alert if there were a systemic problem causing a decrease in quality across all content. For example, this might occur if there was a bug in the recording or encoding of the audio.
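A simple version of such a monitor might compare recent quality scores against a baseline, as in the Python sketch below. The function name and the max_drop threshold are assumptions; a production monitor would likely need a more robust statistic than a plain mean.

```python
from statistics import mean


def systemic_drop(baseline_scores, recent_scores, max_drop=1.0):
    """Return True when the recent mean quality score has fallen more
    than `max_drop` below the baseline mean.

    Both arguments are lists of average quality scores taken from
    Diagnose results. `max_drop` is an illustrative threshold.
    """
    if not baseline_scores or not recent_scores:
        return False  # not enough data to decide
    return mean(baseline_scores) - mean(recent_scores) > max_drop
```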

Understanding a media collection or data set

It's common to have a media collection without much detail to indicate what's included. Audio Diagnose provides media information, content classification, and an overall quality score to help understand the media.

Let's consider a specific example where you are developing an audio machine learning algorithm. These algorithms generally require large amounts of data, and their performance is heavily dependent on the training data. Audio Diagnose can be used to:

  • Check that files are correctly labelled. Clean speech should be diagnosed with a high quality score, a high speech percentage, and zero music percentage.
  • Check the diversity of the data. Does the training data have a good balance of audio quality and loudness?
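The label check in the first bullet could look like the Python sketch below. It assumes parsed results that contain the quality_score, speech, and music sections from the example output; the function names, the "clean_speech" label, and the thresholds are hypothetical.

```python
def looks_like_clean_speech(result, min_quality=7.0, min_speech_pct=90.0,
                            max_music_pct=0.0):
    """Check one parsed Diagnose result against the clean-speech criteria.

    Thresholds are illustrative; tune them for your own data set.
    """
    return (
        result["quality_score"]["average"] >= min_quality
        and result["speech"]["percentage"] >= min_speech_pct
        and result["music"]["percentage"] <= max_music_pct
    )


def find_mislabelled(labelled_results):
    """Return names of files labelled "clean_speech" that fail the check.

    `labelled_results` is an iterable of (name, label, result) tuples.
    """
    return [
        name
        for name, label, result in labelled_results
        if label == "clean_speech" and not looks_like_clean_speech(result)
    ]
```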

Comparative analysis

Audio Diagnose can be used to compare quality across several files, showing relative performance and looking at trends over time. The plot below shows the performance of the audio quality score over a range of files.


Comparison of audio quality for a collection
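For a quick comparison without plotting, the files can simply be ranked by score. The Python sketch below assumes a dict that maps file names to parsed results containing the quality_score section; the function name is hypothetical.

```python
def rank_by_quality(results_by_file):
    """Return (filename, average quality score) pairs, lowest score first.

    Assumption: `results_by_file` maps a file name to a parsed Diagnose
    result that contains result["quality_score"]["average"].
    """
    scores = [
        (name, result["quality_score"]["average"])
        for name, result in results_by_file.items()
    ]
    return sorted(scores, key=lambda pair: pair[1])
```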

Testing and tuning media workflows

Suppose your company is changing its media workflow, perhaps to adopt new audio processing software, a new codec, or a new media pipeline.

Before a new release, you might rely on automated testing together with subjective listening to ensure that performance is not inferior. Automated tests may not be sensitive enough to detect some audible problems, and subjective listening tests are so time-consuming that they are limited in scope or happen too late, right before the release. As a result, audible problems may reach customers and damage your reputation.

Audio Diagnose can find a wide range of problems. Since it's been trained to predict subjective scores, it's sensitive to noise, reverb, transcoding losses and more. And because Audio Diagnose is automated, it can run as part of continuous integration testing (and it won't get bored listening to the same reference test content over and over again).
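In a continuous integration job, a check along these lines could compare quality scores from the current workflow with those from the candidate workflow. This Python sketch assumes both sets of scores have already been collected into dicts keyed by reference file name; the function name and tolerance are hypothetical.

```python
def find_regressions(baseline, candidate, tolerance=0.5):
    """Return {file: (old_score, new_score)} for files whose quality
    score dropped by more than `tolerance`.

    Assumption: `baseline` and `candidate` map reference file names to
    average quality scores. `tolerance` is an illustrative value.
    Files missing from `candidate` are skipped.
    """
    regressions = {}
    for name, old in baseline.items():
        new = candidate.get(name)
        if new is not None and old - new > tolerance:
            regressions[name] = (old, new)
    return regressions
```

A test can then fail the build whenever the returned dict is not empty, and the dict itself points to the files that need further investigation.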


Automated testing

When testing the Media Enhance API, our team uses the Audio Diagnose API to find problems and identify specific media files that need further investigation. We also use the audio quality score to help tune parameters, ensuring that the parameter range has a useful and smooth perceptual effect.

Example output

See the Diagnose API reference for more detailed explanations of these values.

General media info

The media_info section gives you details about the container and codec.

        "media_info": {
            "container": {
                "kind": "mp4",
                "duration": 10801.645,
                "bitrate": 79674,
                "size": 107575636
            },
            "audio": {
                "codec": "aac",
                "channels": 2,
                "sample_rate": 44100,
                "duration": 10801.621223993765,
                "bitrate": 78286
            }
        }

Audio quality score

The quality_score section gives you quick insight into the overall audio quality of a piece of media. See the Audio Quality guide for more explanation on how to interpret these results.

        "quality_score": {
            "average": 6,
            "distribution": [
                {
                    "lower_bound": 0,
                    "upper_bound": 1,
                    "duration": 63,
                    "percentage": 0.6
                }
            ],
            "worst_segment": {
                "start": 9845.5,
                "end": 9850.5,
                "score": 0.6
            }
        }
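To point a reviewer at the worst segment, the start and end values can be turned into readable timestamps. The Python sketch below assumes those values are in seconds from the start of the file, which is consistent with the durations in this example but is not stated here; the function names are hypothetical.

```python
def format_timestamp(seconds):
    """Format a number of seconds as HH:MM:SS.s"""
    hours, remainder = divmod(seconds, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{int(hours):02d}:{int(minutes):02d}:{secs:04.1f}"


def describe_worst_segment(quality_score):
    """Summarize the worst_segment entry of a quality_score section.

    Assumption: "start" and "end" are offsets in seconds.
    """
    segment = quality_score["worst_segment"]
    start = format_timestamp(segment["start"])
    end = format_timestamp(segment["end"])
    return f"{start} to {end} (score {segment['score']})"
```

For the example values above, this yields "02:44:05.5 to 02:44:10.5 (score 0.6)".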

Noise score

The noise_score section gives you an assessment of how much noise is in the file. See the Noise audio guide for more detail on how to interpret these results.

            "noise_score": {
                "average": 6.5,
                "distribution": [
                    {
                        "lower_bound": 0,
                        "upper_bound": 1,
                        "duration": 1770,
                        "percentage": 16.4
                    }
                ]
            }


Clipping

The clipping section alerts you to any clipping in the file. See the Clipping audio guide for more detail on how to interpret these results.

            "clipping": {
                "events": 0
            }


Loudness

The loudness section gives you details about the loudness of the media. See the Loudness audio guide for more detail on how to interpret these results.

            "loudness": {
                "measured": -15.28,
                "range": 4.31,
                "gating_mode": "speech",
                "sample_peak": 0,
                "true_peak": 0.04
            }
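As a rough illustration of how these values might feed a QC decision, the Python sketch below computes the gain needed to reach a target loudness and checks whether the peak would stay under a ceiling after that gain. It assumes that measured is on a decibel-based loudness scale and that true_peak is in decibels, so a gain of one dB shifts both by one unit; the units are not stated here. The function name, target, and ceiling are hypothetical.

```python
def gain_to_target(loudness, target=-16.0, peak_ceiling=-1.0):
    """Return (gain, peak_ok) for a parsed loudness section.

    Assumptions: loudness["measured"] and loudness["true_peak"] are on
    decibel-based scales. `target` and `peak_ceiling` are illustrative.
    """
    gain = target - loudness["measured"]
    peak_after_gain = loudness["true_peak"] + gain
    return gain, peak_after_gain <= peak_ceiling
```

With the example values above, the gain is about -0.72 and the peak check fails, because the peak after gain (about -0.68) is still above the assumed ceiling of -1.0.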

Content classification

The music, silence, and speech blocks help give context to the media file and the type of media it is.

            "music": {
                "percentage": 34.8
            },
            "silence": {
                "percentage": 1.6,
                "at_beginning": 0,
                "at_end": 0,
                "num_sections": 54,
                "silent_channels": []
            },
            "speech": {
                "percentage": 94,
                "events": {
                    "plosive": 3739,
                    "sibilance": 675
                }
            }