Analyze Speech API

Guide to Using the Media Analyze Speech API

Media Speech Analytics API

The Media Speech Analytics API takes your media and delivers insights into the audio quality for individual talkers. This is useful for content that is primarily speech-based.

Key features:

✓ General media info
✓ Speaker diarization
✓ Number of talkers
✓ Talk / listen ratio
✓ Quality scoring (by talker)
✓ Quality events (by talker)
✓ Loudness (by talker)

❗️ Beta API

This API is being made available as an early preview. If you have feedback on how you'd like to use the API, please reach out to our team.

https://dolby.io/contact

Why use the Media Speech Analytics API?

Do you need to know:

  • the number of talkers in your media and when they are talking?
  • the loudness of each talker, so that you can apply loudness correction if needed?
  • the quality score of each talker, to identify whether a talker's setup has a systemic problem?
  • useful talker metrics such as talk/listen ratio?

Example output

See the Analyze Speech API reference for more detailed descriptions of the values.

Media info

The media_info section gives you details about the container and codec. See the Media File Formats guide for more detailed explanations of what this data means.

        "media_info": {
            "container": {
                "kind": "mp4",
                "duration": 10801.645,
                "bitrate": 79674,
                "size": 107575636
            },
            "audio": {
                "codec": "aac",
                "channels": 2,
                "channel_order": "L R",
                "sample_rate": 44100,
                "duration": 10801.621223993765,
                "bitrate": 78286
            }
        }
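
As a rough illustration of reading these fields, the sketch below loads the sample above and prints a one-line summary. The helper names `format_duration` and `describe_media` are hypothetical and not part of the API. It also assumes that `duration` is in seconds and `sample_rate` is in Hz; check the API reference before relying on either.

```python
import json

# Sample values copied from the media_info example above.
SAMPLE = json.loads("""
{
    "media_info": {
        "container": {"kind": "mp4", "duration": 10801.645,
                      "bitrate": 79674, "size": 107575636},
        "audio": {"codec": "aac", "channels": 2, "channel_order": "L R",
                  "sample_rate": 44100, "duration": 10801.621223993765,
                  "bitrate": 78286}
    }
}
""")


def format_duration(seconds: float) -> str:
    """Format a duration as HH:MM:SS (assumes the value is in seconds)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def describe_media(media_info: dict) -> str:
    """Build a short, human-readable summary of container and audio details."""
    container = media_info["container"]
    audio = media_info["audio"]
    return (f"{container['kind']} / {audio['codec']}, "
            f"{audio['channels']} channels at {audio['sample_rate']} Hz, "
            f"{format_duration(container['duration'])}")


if __name__ == "__main__":
    print(describe_media(SAMPLE["media_info"]))
```

In a real integration, the dictionary would come from the parsed API response rather than an inline string.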

Speaker diarization

The speech section identifies each individual talker, and then each section in which that talker was recognized as speaking. See the Voice and Speech guide for more detailed explanations of what this data means.

"speech": {
    "num_talkers": 47,
    "percentage": 94.0,
    "details": [
        {
            "talker_id": 1,
            "talk_listen_ratio": 0.01,
            "loudness": {
                "measured": -16.86,
                "range": 9.07
            },
            "quality_score": 4.94,
            "longest_monologue": 29.41,
            "num_sections": 12,
            "sections": [
                {
                    "section_id": "tk_1_1",
                    "start": 148.39,
                    "duration": 1.8,
                    "confidence": 0.22,
                    "loudness": -20.06,
                    "quality_score": 4.8
                },
                ...
            ]
        },
        ...
    ]
}
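
One way to work with this structure is to rank talkers by quality score and to flag sections where diarization confidence is low. The sketch below uses a trimmed copy of the speech example above, with one talker and one section; the elided entries are left out rather than guessed. The function names and the 0.5 confidence cutoff are hypothetical choices for illustration, not values defined by the API.

```python
# Trimmed copy of the speech example above: one talker, one section.
SPEECH = {
    "num_talkers": 47,
    "percentage": 94.0,
    "details": [
        {
            "talker_id": 1,
            "talk_listen_ratio": 0.01,
            "loudness": {"measured": -16.86, "range": 9.07},
            "quality_score": 4.94,
            "longest_monologue": 29.41,
            "num_sections": 12,
            "sections": [
                {"section_id": "tk_1_1", "start": 148.39, "duration": 1.8,
                 "confidence": 0.22, "loudness": -20.06, "quality_score": 4.8},
            ],
        },
    ],
}


def talkers_by_quality(speech: dict) -> list[tuple[int, float]]:
    """Return (talker_id, quality_score) pairs, lowest score first."""
    pairs = [(t["talker_id"], t["quality_score"]) for t in speech["details"]]
    return sorted(pairs, key=lambda pair: pair[1])


def low_confidence_sections(talker: dict, min_confidence: float = 0.5) -> list[str]:
    """Return IDs of sections whose confidence falls below min_confidence.

    The default cutoff is an arbitrary example value.
    """
    return [s["section_id"] for s in talker["sections"]
            if s["confidence"] < min_confidence]


if __name__ == "__main__":
    print(talkers_by_quality(SPEECH))
    print(low_confidence_sections(SPEECH["details"][0]))
```

Sorting lowest score first puts the talkers most likely to need attention at the top of the list.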
