Audio Diagnose API
Guide to Using the Media Analyze Audio Diagnose API
The Dolby.io Media Analyze Audio Diagnose API returns data about your media so you can learn more about it. For each input file, the results include:
✓ General media info
✓ Audio quality score
✓ Noise score
✓ Clipping
✓ Loudness
✓ Content classification
Why use the Audio Diagnose API?
The Audio Diagnose API is a fast and automated way to assess audio quality based on state-of-the-art signal processing and machine learning. Compared to manual subjective listening, it's faster and more consistent.
- Faster: returns results faster than real-time and can scale up to process multiple files in parallel
- Consistent: provides a consistent assessment so you can assess relative quality across media collections and over time with confidence
Audio Diagnose opens up new ways to assess audio quality automatically.
Automated quality control
Suppose you are a company that collects content from several creators and provides that content to your customers.
Audio Diagnose can be used for automatic quality control within the media processing flow. You can decide whether the content quality is good enough based on the outputs and your own quality requirements. For example, the logic might check for a sufficient quality score, loudness, and bitrate.
If the media is not good enough, you can call the Media Enhance API to improve the audio. After enhancement, the file can be re-diagnosed to confirm whether the quality is sufficient.
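The quality gate described above could be sketched as follows. This is a minimal illustration, assuming the diagnose result has already been retrieved and parsed into a dictionary shaped like the example output later in this guide. The function name `qc_failures` and all three thresholds are hypothetical, so choose values that match your own requirements.

```python
from typing import Dict, List

# Hypothetical thresholds; tune them to your own quality requirements.
MIN_QUALITY_SCORE = 5.0           # minimum acceptable quality_score.average
LOUDNESS_RANGE = (-24.0, -14.0)   # acceptable range for loudness.measured
MIN_AUDIO_BITRATE = 64000         # minimum media_info.audio.bitrate


def qc_failures(result: Dict) -> List[str]:
    """Return the names of the checks that a diagnose result fails."""
    failures = []
    if result["quality_score"]["average"] < MIN_QUALITY_SCORE:
        failures.append("quality_score")
    low, high = LOUDNESS_RANGE
    if not low <= result["loudness"]["measured"] <= high:
        failures.append("loudness")
    if result["media_info"]["audio"]["bitrate"] < MIN_AUDIO_BITRATE:
        failures.append("bitrate")
    return failures


# Values taken from the example output in this guide.
example = {
    "quality_score": {"average": 6},
    "loudness": {"measured": -15.28},
    "media_info": {"audio": {"bitrate": 78286}},
}
print(qc_failures(example))  # an empty list means every check passed
```

A file with a non-empty failure list could then be routed to enhancement or to manual review, depending on which checks failed.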
If you have audio engineers performing Quality Control (QC), the Audio Diagnose API can assist in screening files and flagging problems first, allowing audio engineers to focus their valuable time and skills on the problem files.
The diagram below shows a potential QC flow to detect quality issues followed by either an automated or manual fix to the problematic file.
Automated corrective advice
Audio Diagnose results can inform the content creator with specific corrective advice. For example, if the content is consistently failing due to sibilance events, the QC automation could email advice to the creator regarding their microphone setup.
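One way to decide when to send such advice is sketched below. It assumes that `media_info.audio.duration` is in seconds and that results are dictionaries shaped like the example output in this guide. The limit of 10 events per minute, the "more than half of the files" rule, and the wording of the message are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical limit on sibilance events per minute of audio.
MAX_SIBILANCE_PER_MINUTE = 10.0


def sibilance_advice(results: List[Dict]) -> Optional[str]:
    """Return advice text if most of a creator's files exceed the sibilance limit."""
    if not results:
        return None
    flagged = 0
    for result in results:
        # Assumes the duration is reported in seconds.
        minutes = result["media_info"]["audio"]["duration"] / 60.0
        events = result["speech"]["events"]["sibilance"]
        if minutes > 0 and events / minutes > MAX_SIBILANCE_PER_MINUTE:
            flagged += 1
    if flagged > len(results) / 2:
        return ("Sibilance was detected in most of your recent uploads. "
                "Please review your microphone setup.")
    return None


def make_result(duration_s: float, sibilance: int) -> Dict:
    """Build a minimal result dictionary for illustration."""
    return {
        "media_info": {"audio": {"duration": duration_s}},
        "speech": {"events": {"sibilance": sibilance}},
    }


recent = [make_result(600, 200), make_result(600, 150), make_result(600, 20)]
print(sibilance_advice(recent))
```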
Systemic failure alerts
A monitoring system based on Audio Diagnose results could trigger an alert if there were a systemic problem causing a decrease in quality across all content. For example, this might occur if there was a bug in the recording or encoding of the audio.
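A simple form of this check compares the mean quality score of recent files against a baseline. The sketch below assumes you store the `quality_score.average` value of each diagnosed file. The function name `systemic_drop` and the default `max_drop` of 1.0 are hypothetical.

```python
from statistics import mean
from typing import Sequence


def systemic_drop(
    baseline_scores: Sequence[float],
    recent_scores: Sequence[float],
    max_drop: float = 1.0,
) -> bool:
    """Return True when the recent mean score is more than max_drop below the baseline mean."""
    if not baseline_scores or not recent_scores:
        return False
    return mean(baseline_scores) - mean(recent_scores) > max_drop


# Invented scores for illustration.
baseline = [7.0, 6.5, 7.2]
recent = [4.9, 5.1, 5.0]
if systemic_drop(baseline, recent):
    print("Alert: average quality has dropped across recent content")
```

Comparing means across many files, rather than individual scores, keeps a single poor recording from triggering the alert.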
Understanding a media collection or data set
It's common to have a media collection without much detail to indicate what's included. Audio Diagnose provides media information, content classification, and an overall quality score to help understand the media.
Let's consider a specific example where you are developing an audio machine learning algorithm. These algorithms generally require large amounts of data, and their performance is heavily dependent on the training data. Audio Diagnose can be used to:
- Check that files are correctly labelled. Clean speech should be diagnosed with a high quality score, a high speech percentage, and zero music percentage.
- Check the diversity of the data. Does the training data have a good balance of audio quality and loudness?
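The label check could look like the sketch below, assuming each result is a dictionary shaped like the example output in this guide. The thresholds for a "high" quality score and speech percentage are hypothetical, as are the function and file names.

```python
from typing import Dict, List


def is_clean_speech(result: Dict, min_quality: float = 7.0, min_speech: float = 90.0) -> bool:
    """Check a result against the clean-speech criteria (hypothetical thresholds)."""
    return (
        result["quality_score"]["average"] >= min_quality
        and result["speech"]["percentage"] >= min_speech
        and result["music"]["percentage"] == 0
    )


def suspect_labels(results: Dict[str, Dict]) -> List[str]:
    """Return the files labelled as clean speech whose results disagree with the label."""
    return sorted(name for name, r in results.items() if not is_clean_speech(r))


# Results for files that are all labelled "clean speech" in the data set.
results = {
    "clean_001.wav": {
        "quality_score": {"average": 8.2},
        "speech": {"percentage": 97},
        "music": {"percentage": 0},
    },
    "clean_002.wav": {
        "quality_score": {"average": 6},
        "speech": {"percentage": 94},
        "music": {"percentage": 34.8},
    },
}
print(suspect_labels(results))
```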
Comparative analysis
Audio Diagnose can be used to compare quality across several files, showing relative performance and looking at trends over time. The plot below shows the performance of the audio quality score over a range of files.
Testing and tuning media workflows
Suppose your company is changing its media workflow. It might be due to new audio processing software, a new codec, or a new media pipeline.
Before a new release, you might rely on automated testing together with subjective listening to confirm that quality has not regressed. Automated tests may not be sensitive enough to detect some audible problems, and subjective listening tests are so time-consuming that they are limited in scope or happen too late, just before the release. As a result, audible problems can ship and damage your reputation.
Audio Diagnose can find a wide range of problems. Because it has been trained to predict subjective scores, it's sensitive to noise, reverb, transcoding losses, and more. And because Audio Diagnose is automated, it can run as part of continuous integration testing (and it won't get bored listening to the same reference test content over and over again).
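In a continuous integration job, one possible check compares the quality score of each reference file before and after the workflow change. The sketch below assumes you have collected the `quality_score.average` per file for both pipelines. The function name `regressions` and the tolerance of 0.5 are hypothetical.

```python
from typing import Dict


def regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.5,
) -> Dict[str, float]:
    """Map each file whose score dropped by more than tolerance to the size of the drop."""
    drops = {}
    for name, old_score in baseline.items():
        new_score = candidate.get(name)
        if new_score is not None and old_score - new_score > tolerance:
            drops[name] = round(old_score - new_score, 2)
    return drops


# Invented scores for illustration.
baseline = {"a.wav": 7.0, "b.wav": 6.0}
candidate = {"a.wav": 6.8, "b.wav": 4.5}
print(regressions(baseline, candidate))
```

A test runner could fail the build whenever the returned dictionary is not empty and list the affected files for closer listening.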
Automated testing
When testing the Media Enhance API, the Dolby.io team uses the Audio Diagnose API to find problems and identify specific media files that need further investigation. We also use the audio quality score to help tune parameters, ensuring that each parameter range has a useful and smooth perceptual effect.
Example output
See the Diagnose API reference for more detailed explanations of these values.
General Media Info
The media_info section gives you details about the container and codec.
"media_info": {
  "container": {
    "kind": "mp4",
    "duration": 10801.645,
    "bitrate": 79674,
    "size": 107575636
  },
  "audio": {
    "codec": "aac",
    "channels": 2,
    "sample_rate": 44100,
    "duration": 10801.621223993765,
    "bitrate": 78286
  }
}
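As one example of using these values, the sketch below checks that the container and audio durations roughly agree, which may help catch truncated or mismatched streams. It assumes both durations use the same unit. The function name `durations_agree` and the tolerance of 0.1 are hypothetical.

```python
from typing import Dict


def durations_agree(media_info: Dict, tolerance: float = 0.1) -> bool:
    """Check that container and audio durations differ by at most `tolerance`."""
    container = media_info["container"]["duration"]
    audio = media_info["audio"]["duration"]
    return abs(container - audio) <= tolerance


# Values copied from the example above.
media_info = {
    "container": {"duration": 10801.645},
    "audio": {"duration": 10801.621223993765},
}
print(durations_agree(media_info))
```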
Audio quality score
The quality_score section gives you quick insight into the overall audio quality of a piece of media. See the Audio Quality guide for more explanation on how to interpret these results.
"quality_score": {
  "average": 6,
  "distribution": [
    {
      "lower_bound": 0,
      "upper_bound": 1,
      "duration": 63,
      "percentage": 0.6
    },
    ...
  ],
  "worst_segment": {
    "start": 9845.5,
    "end": 9850.5,
    "score": 0.6
  }
}
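The distribution can be summarized into a single figure, such as the share of the file that scores at or below a chosen level. The sketch below is one way to do that. The function name `percent_below` is hypothetical, and only the first bin comes from the example above; the other two bins are invented placeholders because the example elides the rest of the distribution.

```python
from typing import Dict


def percent_below(quality_score: Dict, threshold: float) -> float:
    """Sum the percentage of every bin whose upper bound does not exceed `threshold`."""
    return sum(
        bin_["percentage"]
        for bin_ in quality_score["distribution"]
        if bin_["upper_bound"] <= threshold
    )


# First bin from the example above; the remaining bins are invented placeholders.
quality_score = {
    "distribution": [
        {"lower_bound": 0, "upper_bound": 1, "duration": 63, "percentage": 0.6},
        {"lower_bound": 1, "upper_bound": 2, "duration": 250, "percentage": 2.4},
        {"lower_bound": 2, "upper_bound": 3, "duration": 500, "percentage": 4.8},
    ]
}
print(percent_below(quality_score, threshold=2))
```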
Noise score
The noise_score section gives you an assessment of how much noise is in the file. See the Noise audio guide for more detail on how to interpret these results.
"noise_score": {
  "average": 6.5,
  "distribution": [
    {
      "lower_bound": 0,
      "upper_bound": 1,
      "duration": 1770,
      "percentage": 16.4
    },
    ...
  ]
}
Clipping
The clipping section alerts you to any clipping in the file. See the Clipping audio guide for more detail on how to interpret these results.
"clipping": {
  "events": 0
}
Loudness
The loudness section gives you details about the loudness of the media. See the Loudness audio guide for more detail on how to interpret these results.
"loudness": {
  "measured": -15.28,
  "range": 4.31,
  "gating_mode": "speech",
  "sample_peak": 0,
  "true_peak": 0.04
}
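These values can be used to estimate how much gain a file needs to reach a target loudness. The sketch below assumes that `measured` is an integrated loudness in LKFS/LUFS and `true_peak` is in dBTP, which this guide does not state, so confirm the units in the API reference. The function name, the target of -16, and the peak ceiling of -1 are hypothetical.

```python
from typing import Dict, Tuple


def gain_to_target(
    loudness: Dict,
    target: float = -16.0,
    peak_ceiling: float = -1.0,
) -> Tuple[float, bool]:
    """Return the gain (in dB) needed to reach `target` and whether the
    resulting true peak would stay at or below `peak_ceiling`."""
    gain = target - loudness["measured"]
    peak_after_gain = loudness["true_peak"] + gain
    return round(gain, 2), peak_after_gain <= peak_ceiling


# Values copied from the example above.
loudness = {"measured": -15.28, "true_peak": 0.04}
gain, peak_ok = gain_to_target(loudness)
print(gain, peak_ok)
```

When the peak check fails, a plain gain change is not enough on its own, and some form of peak limiting would be needed to meet both targets.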
Content classification
The music, silence, and speech blocks help give context to the media file and the type of media it is.
"music": {
  "percentage": 34.8
},
"silence": {
  "percentage": 1.6,
  "at_beginning": 0,
  "at_end": 0,
  "num_sections": 54,
  "silent_channels": []
},
"speech": {
  "percentage": 94,
  "events": {
    "plosive": 3739,
    "sibilance": 675
  }
}
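The percentages in the example add up to more than 100, so the categories appear to overlap (for example, speech over background music). One way to turn them into simple tags is sketched below. The function name `content_tags` and the 25 percent cut-off are hypothetical.

```python
from typing import Dict, List


def content_tags(result: Dict, min_percentage: float = 25.0) -> List[str]:
    """Return every content type whose percentage reaches `min_percentage`."""
    return [
        kind
        for kind in ("speech", "music", "silence")
        if result[kind]["percentage"] >= min_percentage
    ]


# Percentages copied from the example above.
result = {
    "music": {"percentage": 34.8},
    "silence": {"percentage": 1.6},
    "speech": {"percentage": 94},
}
print(content_tags(result))
```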