Voice and Speech
A guide to speech and voice in media
Sibilance
Sibilance is the harsh hiss produced by consonant sounds such as "s", "sh", "x", "ch", "t", and "th", and it originates from how a talker pronounces words. A sibilance reduction algorithm detects these sounds by analyzing frequency regions for onsets of energy. Once identified, the sounds can be attenuated so that audio captured on non-professional recording equipment sounds closer to a studio recording. Sibilance is typically found in an upper frequency range (5 kHz to 8 kHz), although the exact range varies with the vocal range of the talker. This contrasts with plosives, a speech artifact that occurs in lower frequency ranges.
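To make the idea of analyzing a frequency region for energy more concrete, here is a minimal Python sketch. It is not the Enhance API's algorithm, which this guide does not describe in detail. It measures what fraction of a frame's energy falls between 5 kHz and 8 kHz and flags frames where that fraction is high. The function names, the 480-sample frame length, and the 0.5 threshold are all assumptions chosen for illustration.

```python
import math


def bin_power(frame, k):
    """Squared DFT magnitude of bin k, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * k / len(frame))
    s1 = s2 = 0.0
    for x in frame:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def band_energy_fraction(frame, sample_rate, low_hz=5000.0, high_hz=8000.0):
    """Fraction of the frame's energy that lies between low_hz and high_hz."""
    n = len(frame)
    total = sum(x * x for x in frame)
    if n == 0 or total == 0.0:
        return 0.0
    k_low = math.ceil(low_hz * n / sample_rate)
    # Stay strictly below the Nyquist bin so the factor of 2 below is valid.
    k_high = min(math.floor(high_hz * n / sample_rate), (n - 1) // 2)
    band = sum(bin_power(frame, k) for k in range(k_low, k_high + 1))
    # Parseval: the sum of |X[k]|^2 over all bins equals n * sum(x^2).
    # The factor 2 accounts for the mirrored negative-frequency bins.
    return 2.0 * band / (n * total)


def sibilant_frames(samples, sample_rate, frame_len=480, threshold=0.5):
    """Indices of non-overlapping frames dominated by 5 kHz to 8 kHz energy.

    frame_len and threshold are hypothetical tuning values.
    """
    flagged = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        if band_energy_fraction(frame, sample_rate) > threshold:
            flagged.append(start // frame_len)
    return flagged


# Usage: one frame of a 300 Hz tone followed by one frame of a 6.5 kHz tone.
sample_rate = 48000
vowel_like = [math.sin(2 * math.pi * 300 * t / sample_rate) for t in range(480)]
hiss_like = [math.sin(2 * math.pi * 6500 * t / sample_rate) for t in range(480)]
print(sibilant_frames(vowel_like + hiss_like, sample_rate))
```

A real detector would also track how the band energy changes from frame to frame (the onsets mentioned above) and adapt the band to the talker. This sketch only shows the band-energy measurement.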
Maximize Sibilance Attenuation
The Enhance API includes an algorithm for sibilance reduction. You can adjust the amount of attenuation applied to suppress sibilance, which could be necessary in certain circumstances.
{
  "input": "s3://dolbyio/public/shelby/airplane.original.mp4",
  "output": "dlb://out/airplane.max-sibilance-attenuation.mp4",
  "content": {
    "type": "podcast"
  },
  "audio": {
    "speech": {
      "sibilance": {
        "reduction": {
          "amount": "max"
        }
      }
    }
  }
}
Mouth Clicks
Mouth clicks are small transients produced during speech articulation when the tongue, teeth, and lips interact with saliva. They can occur when the talker is close to the microphone and are often audible on low-noise recordings when listening through headphones or earphones.
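As a rough illustration of what a "small transient" looks like in sample data, the following Python sketch flags positions where the sample-to-sample jump is far larger than the typical jump in the recording. It is a simplified stand-in, not the method used by the Enhance API. The function name and the ratio, min_gap, and floor parameters are hypothetical tuning values.

```python
import math
import statistics


def find_clicks(samples, ratio=8.0, min_gap=48, floor=1e-9):
    """Return sample indices where an abrupt jump suggests a click.

    A jump counts as a click when it exceeds `ratio` times the median
    absolute sample-to-sample difference. Detections closer together
    than `min_gap` samples are merged into the first one.
    """
    diffs = [abs(b - a) for a, b in zip(samples, samples[1:])]
    if not diffs:
        return []
    # The median is barely affected by a handful of clicks, so it serves
    # as a baseline for "normal" movement in the waveform.
    baseline = max(statistics.median(diffs), floor)
    clicks = []
    for i, d in enumerate(diffs):
        index = i + 1  # the jump lands on the later of the two samples
        if d > ratio * baseline and (not clicks or index - clicks[-1] >= min_gap):
            clicks.append(index)
    return clicks


# Usage: a quiet 200 Hz tone with one artificial click added at sample 1000.
sample_rate = 48000
signal = [0.01 * math.sin(2 * math.pi * 200 * t / sample_rate) for t in range(4800)]
signal[1000] += 0.5
print(find_clicks(signal))
```

The sketch works best on quiet, low-noise material, which matches the conditions under which mouth clicks are most audible.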
Enabling Mouth Click Reduction
Mouth click reduction is not enabled by default in the Enhance API. You can enable it by setting content.type to studio or by using the audio.speech.click.reduction.enable parameter, as in this example.
{
  "audio": {
    "speech": {
      "click": {
        "reduction": {
          "enable": true
        }
      }
    }
  }
}
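The dotted name audio.speech.click.reduction.enable maps onto the nested JSON shown above. As a small sketch of that mapping, the Python helper below expands a dotted parameter name into nested dictionaries. The helper name set_by_path is hypothetical and is not part of any Dolby SDK.

```python
import json


def set_by_path(body, dotted_path, value):
    """Set a nested value such as body["a"]["b"]["c"] from the path "a.b.c".

    Intermediate dictionaries are created when they do not exist yet.
    """
    keys = dotted_path.split(".")
    node = body
    for key in keys[:-1]:
        node = node.setdefault(key, {})
    node[keys[-1]] = value
    return body


# Usage: rebuild the request fragment from the dotted parameter name.
body = {}
set_by_path(body, "audio.speech.click.reduction.enable", True)
print(json.dumps(body, indent=2))
```

The same helper can set several parameters on one body, since existing intermediate dictionaries are reused rather than overwritten.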
Plosives
Plosives are "pops" caused by sounds like "p" and "b" spoken too close to the microphone. In a studio, a pop filter is often placed in front of the microphone to minimize plosives. A plosive reduction algorithm detects these low-frequency speech artifacts and applies dynamic processing to suppress the pop while preserving speech clarity. Unmanaged plosives can be very distracting to the listener and can also negatively affect other processing.
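One simple way to picture dynamic processing that suppresses low-frequency pops while leaving speech intact is a level-dependent low-band attenuator. The Python sketch below splits the signal into a low band and a remainder, and turns the low band down only while its level exceeds a threshold. This is an illustrative technique under stated assumptions, not the Enhance API's implementation. The 150 Hz cutoff, 0.1 threshold, and 50 ms release are hypothetical values.

```python
import math


def suppress_low_bursts(samples, sample_rate, cutoff_hz=150.0,
                        threshold=0.1, release_ms=50.0):
    """Attenuate the low band only while its envelope exceeds `threshold`."""
    # One-pole low-pass coefficient for the chosen cutoff.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    # Per-sample decay of the peak envelope follower.
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    low = 0.0
    env = 0.0
    out = []
    for x in samples:
        low += alpha * (x - low)             # low band (where pops live)
        high = x - low                       # everything above it
        env = max(abs(low), env * release)   # instant attack, slow release
        gain = 1.0 if env <= threshold else threshold / env
        out.append(high + gain * low)        # the high band is never scaled
    return out


# Usage: a loud 50 Hz burst stands in for a pop.
sample_rate = 48000
pop = [0.8 * math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(4800)]
processed = suppress_low_bursts(pop, sample_rate)
print(max(abs(v) for v in pop), max(abs(v) for v in processed))
```

Because only the low band is scaled, content above the cutoff passes through unchanged, which is the sense in which speech clarity is preserved in this sketch.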
Maximize Plosive Attenuation
The plosive reduction algorithm is applied automatically in the Enhance API. You can adjust the amount of attenuation applied to suppress plosives, which could be necessary in certain circumstances.
{
  "input": "s3://dolbyio/public/shelby/airplane.original.mp4",
  "output": "dlb://out/airplane.max-plosive-attenuation.mp4",
  "audio": {
    "speech": {
      "plosive": {
        "reduction": {
          "amount": "max"
        }
      }
    }
  }
}
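The sibilance and plosive request bodies in this guide share the same shape, so they can be generated from one function. The Python sketch below builds such a body. The function name reduction_request is hypothetical, and the sketch only constructs the JSON. It does not submit a job, and it does not validate the amount value against the API.

```python
import json


def reduction_request(input_url, output_url, artifact, amount):
    """Build a request body with an amount-based speech reduction setting.

    `artifact` is limited to the two amount-based reductions covered in
    this guide: "sibilance" and "plosive".
    """
    if artifact not in ("sibilance", "plosive"):
        raise ValueError(f"unsupported artifact: {artifact!r}")
    return {
        "input": input_url,
        "output": output_url,
        "audio": {
            "speech": {
                artifact: {"reduction": {"amount": amount}},
            },
        },
    }


# Usage: reproduce the plosive example from this section.
body = reduction_request(
    "s3://dolbyio/public/shelby/airplane.original.mp4",
    "dlb://out/airplane.max-plosive-attenuation.mp4",
    "plosive",
    "max",
)
print(json.dumps(body, indent=2))
```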