Voice and Speech

A guide to speech and voice in media

Sibilance

Sibilance is the harsh, hissing quality of consonant sounds such as "s", "sh", "x", "ch", "t", and "th", and it originates in how a talker pronounces words. It is typically found in an upper frequency range (5 kHz to 8 kHz), although the exact range varies with the voice of the talker. A sibilance reduction algorithm detects these sounds by analyzing frequency regions for onsets of energy. Once identified, the sounds can be attenuated so that audio captured on non-professional recording equipment sounds closer to a studio recording. Sibilance is a high-frequency speech artifact, in contrast to plosives, which occur in lower frequency ranges.
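The detection step described above, watching a frequency region for a sudden rise in energy, can be sketched in a few lines of Python. This is only an illustration and not the algorithm used by the Enhance API: the function names band_energy and detect_sibilance_onsets, the frame size, the 4x rise ratio, and the energy floor are all assumptions chosen for the example. It uses a plain DFT over the 5 kHz to 8 kHz bins so that it needs nothing beyond the standard library.

```python
import math


def band_energy(frame, sample_rate, low_hz, high_hz):
    """Sum the squared DFT magnitudes of the bins between low_hz and high_hz.

    high_hz is expected to be at or below half the sample rate.
    """
    n = len(frame)
    first_bin = math.ceil(low_hz * n / sample_rate)
    last_bin = math.floor(high_hz * n / sample_rate)
    energy = 0.0
    for k in range(first_bin, last_bin + 1):
        step = 2.0 * math.pi * k / n
        real = sum(x * math.cos(step * i) for i, x in enumerate(frame))
        imag = sum(x * math.sin(step * i) for i, x in enumerate(frame))
        energy += real * real + imag * imag
    return energy


def detect_sibilance_onsets(samples, sample_rate, frame_size=256,
                            low_hz=5000.0, high_hz=8000.0,
                            rise_ratio=4.0, energy_floor=1e-6):
    """Return the start index of each frame whose band energy jumps.

    A frame counts as an onset when its energy in the band exceeds
    rise_ratio times the energy of the previous frame. The floor keeps
    near-silent frames from triggering on numerical noise.
    All default values are hypothetical, picked for illustration only.
    """
    onsets = []
    previous = None
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energy = band_energy(frame, sample_rate, low_hz, high_hz)
        if previous is not None and energy > rise_ratio * max(previous, energy_floor):
            onsets.append(start)
        previous = energy
    return onsets
```

A real de-esser would follow detection with band-limited gain reduction and would adapt the band to the talker. The sketch stops at detection to keep the idea visible.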

Maximize Sibilance Attenuation

The Enhance API includes a sibilance reduction algorithm. You can adjust the amount of attenuation it applies with the audio.speech.sibilance.reduction.amount parameter, as in this example.

{
  "input": "s3://dolbyio/public/shelby/airplane.original.mp4",
  "output": "dlb://out/airplane.max-sibilance-attenuation.mp4",
  "content": {
    "type": "podcast"
  },
  "audio": {
    "speech": {
      "sibilance": {
        "reduction": {
          "amount": "high"
        }
      }
    }
  }
}

Mouth Clicks

Mouth clicks are small transients produced during speech articulation as the tongue, teeth, and lips interact with saliva. They tend to occur when the talker is close to the microphone and are often audible in low-noise recordings when listening through headphones or earphones.

Enabling Mouth Click Reduction

Mouth click reduction is not enabled by default in the Enhance API. You can enable it by setting content.type to studio or by setting the audio.speech.click.reduction.enable parameter to true, as in this example.

{
  "audio": {
    "speech": {
      "click": {
        "reduction": {
          "enable": true
        }
      }
    }
  }
}

Plosives

Plosives are "pops" caused by sounds like "p" and "b" spoken too close to the microphone. Studios often place a pop filter in front of the microphone to minimize them. A plosive reduction algorithm detects these low-frequency speech artifacts and applies dynamic processing to suppress the pop sound while preserving speech clarity. Unmanaged plosives can be very distracting to the listener and can also negatively affect other processing.
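As a rough illustration of that detect-and-suppress idea, the sketch below splits the signal into a low band and a remainder with a one-pole low-pass filter, tracks the level of the low band, and turns down only the low band while that level exceeds a threshold. It is a toy built on assumptions, not the algorithm used by the Enhance API: the function name reduce_plosives and the cutoff, threshold, and release values are hypothetical choices for the example.

```python
import math


def reduce_plosives(samples, sample_rate, cutoff_hz=150.0,
                    threshold=0.2, release_ms=50.0):
    """Attenuate low-frequency bursts while leaving the rest of the signal alone.

    All parameter defaults are hypothetical values for illustration.
    """
    # One-pole low-pass coefficient for the chosen cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    # Per-sample decay factor for the envelope follower.
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))

    low = 0.0
    envelope = 0.0
    output = []
    for x in samples:
        low += alpha * (x - low)   # low band, where plosive energy sits
        high = x - low             # everything else, passed through untouched
        # Peak envelope: jumps up instantly, decays slowly.
        envelope = max(abs(low), envelope * release)
        # Dynamic gain: unity below the threshold, limiting above it.
        gain = 1.0 if envelope <= threshold else threshold / envelope
        output.append(high + gain * low)
    return output
```

Because the gain is applied to the low band only, speech content above the cutoff is unchanged, which is one simple way to suppress the pop without dulling the voice.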

Maximize Plosive Attenuation

The plosive reduction algorithm is applied automatically in the Enhance API. You can adjust the amount of attenuation it applies with the audio.speech.plosive.reduction.amount parameter, as in this example.

{
  "input": "s3://dolbyio/public/shelby/airplane.original.mp4",
  "output": "dlb://out/airplane.max-plosive-attenuation.mp4",
  "audio": {
    "speech": {
      "plosive": {
        "reduction": {
          "amount": "max"
        }
      }
    }
  }
}
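
The request bodies in this guide all share the same shape, so they can be assembled programmatically. The Python sketch below is one way to do that under stated assumptions: build_enhance_body is a hypothetical helper, not part of any Dolby SDK, and it only builds and prints the JSON body using the parameter names shown in this guide. It does not submit the request or validate values against the API.

```python
import json


def build_enhance_body(input_url, output_url, content_type=None,
                       sibilance_amount=None, plosive_amount=None,
                       click_reduction=None):
    """Assemble an Enhance request body from the speech settings in this guide.

    Hypothetical helper: any setting left as None is omitted from the body.
    """
    body = {"input": input_url, "output": output_url}
    if content_type is not None:
        body["content"] = {"type": content_type}

    speech = {}
    if sibilance_amount is not None:
        speech["sibilance"] = {"reduction": {"amount": sibilance_amount}}
    if plosive_amount is not None:
        speech["plosive"] = {"reduction": {"amount": plosive_amount}}
    if click_reduction is not None:
        speech["click"] = {"reduction": {"enable": bool(click_reduction)}}
    if speech:
        body["audio"] = {"speech": speech}
    return body


# Combine maximum plosive attenuation with mouth click reduction.
body = build_enhance_body(
    "s3://dolbyio/public/shelby/airplane.original.mp4",
    "dlb://out/airplane.max-plosive-attenuation.mp4",
    plosive_amount="max",
    click_reduction=True,
)
print(json.dumps(body, indent=2))
```

Serializing with json.dumps also avoids hand-written syntax slips such as trailing commas, which strict JSON parsers reject.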