Media Enhance API

The Media Enhance API improves your media by correcting common audio quality issues. If your users expect clear, high-quality sound, this API is a valuable integration for web and mobile applications.

Key features:

✓ Content tuning
✓ Noise reduction
✓ Speech leveling
✓ Speech isolation
✓ Loudness correction
✓ Sibilance reduction
✓ Plosive reduction
✓ Dynamic equalization
✓ Hum reduction
✓ Mouth click reduction


Not a Developer?

To get started without writing any code, you can use the Media Demo to upload a file and view the results.

Why use Media Enhance API?

Do you need to:

  • make your audio sound better without being an audio expert?
  • make listening easier by removing background noise and boosting speech?
  • improve speech quality by balancing tone and removing harshness?
  • conform your media to meet streaming platform recommendations and standards?

The Media Enhance API can do all of this for you, with no need to edit the audio manually. Process your audio or video file through the API and use the available options to tailor the result to your specific needs.

Speech enhancement for content QC

For user-generated content, podcasts, audiobooks, recorded interviews, lectures, and other production workflows, it helps to automate audio quality improvements with tools and apps. Capture environments and equipment vary, so a recording may not sound professional without additional manual editing by a highly skilled specialist.

The need to correct these sorts of audio issues is common, but doing so can be a complex and time-consuming process. A specialist must pre-process, edit, and master the captured audio, applying loudness correction, noise reduction, dynamics processing, and equalization. This often requires sophisticated tools, specialized knowledge, and a complex collection of plugins to create content that your audience appreciates.

As an alternative to a series of manual steps, the Enhance API runs a speech enhancement chain to intelligently provide these types of improvements in a way that can be run from mobile devices, web applications, or any back-end service.

With Dolby.io APIs, some of these processing steps can run in parallel, which saves valuable time and helps you deliver high-quality content quickly. This has a number of benefits:

  • increased output for massive amounts of content
  • consistent sound quality across a collection
  • a more pleasant result for the audience, which increases their satisfaction

Removing speech impurities with deep learning

We've created a number of audio guides that explain various types of speech impurities. One example is hum: a constant noise with undesirable tonal components. It can come from a range of sources, such as 50 Hz harmonic noise from electrical lines or acoustic noise from air blowing through HVAC pipes.

The De-Hum step estimates peak frequencies in the noise spectrum and then subtracts them frame-by-frame, effectively removing the hum. This is one of several noise removal steps applied.
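To make the idea of frame-by-frame subtraction concrete, here is a minimal sketch in Python. It is not the De-Hum implementation used by the API. It assumes a single hum tone at a known frequency (50 Hz) rather than estimating peaks from the noise spectrum, and the function name `remove_hum` and its parameters are hypothetical. For each frame, it estimates the amplitude and phase of the tone with a one-bin DFT and subtracts the reconstructed tone.

```python
import math

def remove_hum(samples, sample_rate, hum_hz=50.0, frame_len=None):
    """Subtract one hum tone from the signal, frame by frame.

    Assumption: the hum frequency is known in advance. A real de-hum
    step would estimate the peak frequencies from the noise spectrum.
    """
    if frame_len is None:
        # Assumption: frames span 5 whole hum cycles, so the tone
        # estimate is not biased by a partial cycle.
        frame_len = int(round(5 * sample_rate / hum_hz))
    w = 2.0 * math.pi * hum_hz / sample_rate
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        n = len(frame)
        cos_t = [math.cos(w * (start + i)) for i in range(n)]
        sin_t = [math.sin(w * (start + i)) for i in range(n)]
        # One-bin DFT: project the frame onto a cosine and a sine at hum_hz.
        a = 2.0 / n * sum(x * c for x, c in zip(frame, cos_t))
        b = 2.0 / n * sum(x * s for x, s in zip(frame, sin_t))
        # Subtract the reconstructed hum tone from the frame.
        out.extend(x - (a * c + b * s)
                   for x, c, s in zip(frame, cos_t, sin_t))
    return out
```

This sketch leaks some speech energy into the hum estimate whenever the speech itself has content near 50 Hz, which is one reason a production chain combines several noise removal steps.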

The heavy lifting is done by neural networks that have been trained to distinguish speech from noise and then suppress the noise. We combine these advanced detection techniques with some simple, well-known industry practices. A high-pass filter is a good example. The filter removes low-frequency components in the audio, in this case anything below 80 Hz, where vocal frequencies are not present. This is typical in most manual workflows to reduce noise, rumble, etc. Applying it early in the chain gives the deep learning algorithms that follow an easier job, and the result is great-sounding audio.
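As an illustration of the high-pass step, the sketch below implements a generic second-order (biquad) high-pass filter with an 80 Hz cutoff in plain Python. The filter order, the Q value, and the names `high_pass` and `rms` are assumptions chosen for the example, not details of the filter the Enhance API actually applies.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz=80.0, q=1 / math.sqrt(2)):
    """Second-order high-pass filter (biquad, Direct Form I).

    Assumption: q = 1/sqrt(2) gives a Butterworth-style response.
    """
    w0 = 2.0 * math.pi * cutoff_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    # Coefficients normalized so the leading feedback term is 1.
    b0 = (1.0 + cos_w0) / 2.0 / a0
    b1 = -(1.0 + cos_w0) / a0
    b2 = b0
    a1 = -2.0 * cos_w0 / a0
    a2 = (1.0 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out

def rms(samples):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

# Usage: compare how a 30 Hz rumble and a 1 kHz tone pass through the filter.
rate = 8000
rumble = [math.sin(2 * math.pi * 30 * n / rate) for n in range(rate)]
tone = [math.sin(2 * math.pi * 1000 * n / rate) for n in range(rate)]
print("rumble gain:", rms(high_pass(rumble, rate)) / rms(rumble))
print("tone gain:", rms(high_pass(tone, rate)) / rms(tone))
```

The rumble is strongly attenuated while the 1 kHz tone passes almost unchanged, which is why this kind of filter is a cheap first step before heavier noise suppression.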



How-to Articles

Use our series of How-To guides to explore some frequently encountered scenarios and functionality.

Enhancing Media
A simple "getting started" example that walks through your first /media/enhance API call.

How to Improve Audio Quality by Content Type
Use presets based on characteristics of the original audio to get output optimized for that scenario.

How to Improve Dynamic Range
The dynamics settings let you set range control or disable dynamic EQ, which may be necessary for certain types of content.

How to Correct Loudness
Explains customization of loudness settings that may be necessary to meet the requirements of various distribution partners.

Video Tutorial

Review the Enhancing Media getting started guide for code samples or watch this video tutorial.
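For orientation, here is a rough sketch of how a /media/enhance request could be assembled in Python. The base URL, the bearer-token header, the JSON field names (`input`, `output`, `content.type`), and the helper `build_enhance_request` are assumptions made for illustration, so check the getting started guide and the API reference for the exact request format. The sketch only builds the request and does not send it.

```python
import json
import urllib.request

# Assumption: illustrative endpoint URL; confirm it in the API reference.
API_URL = "https://api.dolby.com/media/enhance"

def build_enhance_request(api_token, input_url, output_url, content_type=None):
    """Build (but do not send) a POST request for an enhance job.

    Assumption: the body uses "input", "output", and an optional
    "content": {"type": ...} preset; these names are not confirmed here.
    """
    body = {"input": input_url, "output": output_url}
    if content_type is not None:
        body["content"] = {"type": content_type}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Usage with placeholder values (hypothetical token and URLs).
request = build_enhance_request(
    "YOUR_API_TOKEN",
    "https://example.com/input.wav",
    "https://example.com/output.wav",
    content_type="podcast",
)
print(request.get_method(), request.full_url)
```

To submit the job, you would pass the request to `urllib.request.urlopen` and read the JSON response.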


Enhance while transcoding

The Transcode API is a powerful service that allows you to trim, stitch, or output multiple formats. You can apply speech enhancement while transcoding as a single operation.

This can be helpful for use cases such as:

  • Adding an introduction to your conference and converting it to HLS for replays.
  • Taking user mobile device recordings and normalizing the format.

Take a look at the How to Enhance your Audio while Transcoding guide for instructions on how to use both services together.

Enhancing music

The Enhance API is intended for speech enhancement. To improve the sound of music, try the Music Mastering API.