How to Improve Audio Quality by Content Type

The Dolby.io Enhance API and Analyze API work well for a wide variety of content. If you give the algorithms a hint about the type of content you are processing, or about how it was recorded, the results can improve on what the default settings produce.

Content Types

The following table gives an overview of the content types, with a description of the scenario each one is best suited for, including characteristics of the room and the recording equipment. Set the content type with the content.type parameter on both the Media Enhance and Media Analyze APIs.

Content Type

Description

conference

The conference type is useful for media that is primarily speech content captured in a room where the microphone is far away from the talker.

To improve this type of content, dynamic range control is boosted, since the talkers may be at very different positions in the room. Noise reduction and speech isolation are also tuned, since this type of content often has many people in the audience and picks up chair squeaks, overhead heating and cooling, people shuffling around, etc.

interview

The interview type is useful for media that is primarily speech content with the microphone placed close to the talker.

To improve this type of content, dynamic range control is boosted, since the interviewer and interviewee(s) may be at very different positions but should have the same loudness level. Noise reduction and speech isolation are also tuned, since this type of content is often captured in a small room where chair movement, HVAC systems, etc. can be suppressed.

lecture

The lecture type is useful for media that is primarily speech content captured in a very large room where the microphone is far away from the talker.

To improve this type of content, dynamic range control is boosted, since the primary talker may be at a very different position in the room from anybody asking questions. Noise reduction and speech isolation are also tuned, since this type of content often has many people in the audience and picks up chair squeaks, overhead heating and cooling, people shuffling around, etc.

meeting

The meeting type is useful for media that is primarily speech content with the microphone placed close to the talkers.

To improve this type of content, dynamic range control is set to its maximum, since the meeting participants may be at very different positions around a room but should have the same loudness level regardless of where they are seated. Noise reduction and speech isolation are also tuned, since this type of content is often captured in a small room where chair movement, HVAC systems, etc. can be suppressed.

mobile_phone

The mobile_phone type is useful for media captured with a variable microphone position, because the phone may move closer to and farther from the talker over the course of the recording.

To improve this type of content, noise reduction and speech isolation are set to their maximum, because the media may have been recorded in varied environments, both indoors and outdoors. Since a mobile phone is often held at a distance from the talker's mouth, plosive reduction is set to low.

music

The music type is valuable if your media is primarily musical in nature. The Media Enhance API is intended to optimize the sound of speech and dialog, so it may cause issues with some music.

If your content mixes music and speech, this setting disables many of the speech isolation and noise suppression features that would otherwise have unintended consequences on the music. It still helps with leveling and loudness issues, which are an important consideration for many streaming services.

podcast

The podcast type is useful for balancing music and speech in media where music is frequently used during transitions, such as the beginning or end of a session.

Dynamic range control is maximized because many podcasts record talkers separately, with different microphones and room environments, and then mix the recordings together, so they must be leveled. Speech isolation is tuned lower because many talkers record themselves in a controlled environment.

studio

The studio content type is typical of close-mic'd speech content with limited background noise, as you might find in a professional studio.

voice_over

The voice_over type is very similar to the podcast type in that music detection is enabled, since the main talker may be speaking over commercials, advertisements, jingles, or just musical accompaniment.

Music and Speech

If your content contains both speech and music, you might need to enable music detection, which helps the algorithms preserve music in the media instead of treating it as background noise that distracts from speech.

This would be set in an API request like this example:

  "music": {
      "detection": {
          "enable": true
      }
  }

As an alternative to setting individual tuning preferences such as music detection and speech isolation, the single content type setting adjusts all of those parameters together. It acts as a hint about the type of media and the creative intentions for it.

For example, the podcast or music type would be ideally suited for this mixed media.

  "content": {
      "type": "podcast"
  }

This is in addition to the regular input and output values required for each request.
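To show how the content fragment sits alongside the input and output values, here is a minimal Python sketch that assembles a request body and checks the content type against the values listed in the table on this page. The helper name build_enhance_body and the two locations are assumptions made for illustration; substitute the input and output values your requests already use, and check the Enhance API Reference for the authoritative request shape.

```python
import json

# Content types taken from the table on this page.
CONTENT_TYPES = {
    "conference", "interview", "lecture", "meeting", "mobile_phone",
    "music", "podcast", "studio", "voice_over",
}


def build_enhance_body(input_url, output_url, content_type):
    """Hypothetical helper: return a request body with a content.type hint."""
    if content_type not in CONTENT_TYPES:
        raise ValueError(f"unknown content type: {content_type!r}")
    return {
        "input": input_url,
        "output": output_url,
        "content": {"type": content_type},
    }


if __name__ == "__main__":
    body = build_enhance_body(
        "dlb://in/example.mp4",   # placeholder location (assumption)
        "dlb://out/example.mp4",  # placeholder location (assumption)
        "podcast",
    )
    print(json.dumps(body, indent=2))
```

Validating the value locally is a design choice for catching typos such as "mobile-phone" before a request is sent; the sketch itself makes no network calls.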

Sampling Content Types

To evaluate different content type settings with your media, you may want to process just a shorter segment of the overall media file. See How to Process a Region of an Input File for an example of how to accomplish that.
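One way to organize such a comparison is to prepare one request body per candidate content type, each writing to its own output so the results can be compared side by side. The Python sketch below only builds the bodies; the function name bodies_for_comparison, the output naming scheme, and the example locations are assumptions for illustration, and selecting the region itself is left to the guide referenced above.

```python
def bodies_for_comparison(input_url, output_prefix, content_types, extension="mp4"):
    """Hypothetical helper: build one request body per content type.

    Each body gets a distinct output location so the processed files
    do not overwrite one another.
    """
    bodies = {}
    for content_type in content_types:
        bodies[content_type] = {
            "input": input_url,
            "output": f"{output_prefix}-{content_type}.{extension}",
            "content": {"type": content_type},
        }
    return bodies


if __name__ == "__main__":
    candidates = ["podcast", "interview", "studio"]
    # Placeholder locations (assumptions), not real files.
    plans = bodies_for_comparison("dlb://in/sample.mp4", "dlb://out/sample", candidates)
    for content_type, body in plans.items():
        print(content_type, "->", body["output"])
```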

Learn More

See the Enhance API Reference for additional details.

