Analyze Speech API

Guide to Using the Media Analyze Speech API

Media Speech Analytics API

The Media Speech Analytics API takes your media and delivers insights into the audio quality for individual talkers. This is useful for content that is primarily speech-based.

Key features:

✓ General media info
✓ Speaker diarization
✓ Number of talkers
✓ Talk / listen ratio
✓ Quality scoring (by talker)
✓ Quality events (by talker)
✓ Loudness (by talker)


Why use the Media Speech Analytics API?

Do you need to know:

  • the number of talkers in your media and when they are talking?
  • the loudness of each talker, so that you can apply loudness correction if needed?
  • the quality score of each talker, to identify whether a talker's setup has a systemic problem?
  • useful talker metrics such as the talk/listen ratio?

Example output

See the Analyze Speech API reference for more detailed descriptions of the values.

Media info

The media_info section gives you details about the container and codec. See the Media File Formats guide for more detailed explanations of what this data means.

        "media_info": {
            "container": {
                "kind": "mp4",
                "duration": 10801.645,
                "bitrate": 79674,
                "size": 107575636
            },
            "audio": {
                "codec": "aac",
                "channels": 2,
                "channel_order": "L R",
                "sample_rate": 44100,
                "duration": 10801.621223993765,
                "bitrate": 78286
            }
        }
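Any JSON parser can read these fields. As a quick consistency check, the container bitrate should be close to size × 8 / duration. This assumes that `size` is in bytes, `duration` in seconds and `bitrate` in bits per second; this guide does not state those units. The following minimal Python sketch applies the check to the values above. The wrapper object and the `estimated_bitrate` helper are illustrative only and are not part of the API.

```python
import json

# The media_info values shown above, wrapped in an object so the text parses as JSON.
MEDIA_INFO_JSON = """
{
    "media_info": {
        "container": {"kind": "mp4", "duration": 10801.645,
                      "bitrate": 79674, "size": 107575636},
        "audio": {"codec": "aac", "channels": 2, "channel_order": "L R",
                  "sample_rate": 44100, "duration": 10801.621223993765,
                  "bitrate": 78286}
    }
}
"""


def estimated_bitrate(container: dict) -> float:
    """Hypothetical helper: average bitrate in bits per second,
    assuming size is in bytes and duration is in seconds."""
    return container["size"] * 8 / container["duration"]


media_info = json.loads(MEDIA_INFO_JSON)["media_info"]
container = media_info["container"]
audio = media_info["audio"]

estimate = estimated_bitrate(container)
print(f"{container['kind']}: {container['duration'] / 3600:.2f} h, "
      f"reported {container['bitrate']} bps, estimated {estimate:.0f} bps")
print(f"audio: {audio['codec']}, {audio['channels']} ch "
      f"({audio['channel_order']}), {audio['sample_rate']} Hz")
```

If the reported and estimated values differ widely, the file may use a variable bitrate or the units may differ from the ones assumed here.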

Speaker diarization

The speech section identifies each individual talker, and then each section in which that talker was recognized as speaking. See the Voice and Speech guide for more detailed explanations of what this data means.

        "speech": {
            "num_talkers": 47,
            "percentage": 94.0,
            "details": [
                {
                    "talker_id": 1,
                    "talk_listen_ratio": 0.01,
                    "loudness": {
                        "measured": -16.86,
                        "range": 9.07
                    },
                    "quality_score": 4.94,
                    "longest_monologue": 29.41,
                    "num_sections": 12,
                    "sections": [
                        {
                            "section_id": "tk_1_1",
                            "start": 148.39,
                            "duration": 1.8,
                            "confidence": 0.22,
                            "loudness": -20.06,
                            "quality_score": 4.8
                        }
                    ]
                }
            ]
        }
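Each section has a `start` and a `duration`, which are presumably in seconds. You can use these two values to build a timeline of who spoke when. The following Python sketch assumes the structure shown above. It lists each talker's sections as start/end ranges, with an optional confidence filter, and flags talkers whose overall `quality_score` is below a threshold. The helper names and the `MIN_QUALITY` and `MIN_CONFIDENCE` values are hypothetical choices for illustration. The API does not recommend them.

```python
import json

# The speech excerpt shown above (only the first of talker 1's 12 sections),
# wrapped in an object so the text parses as JSON.
SPEECH_JSON = """
{
    "speech": {
        "num_talkers": 47,
        "percentage": 94.0,
        "details": [
            {
                "talker_id": 1,
                "talk_listen_ratio": 0.01,
                "loudness": {"measured": -16.86, "range": 9.07},
                "quality_score": 4.94,
                "longest_monologue": 29.41,
                "num_sections": 12,
                "sections": [
                    {"section_id": "tk_1_1", "start": 148.39, "duration": 1.8,
                     "confidence": 0.22, "loudness": -20.06, "quality_score": 4.8}
                ]
            }
        ]
    }
}
"""

# Hypothetical thresholds for illustration only; tune them for your content.
MIN_QUALITY = 4.0
MIN_CONFIDENCE = 0.5


def talker_timeline(talker: dict, min_confidence: float = 0.0) -> list:
    """Return (section_id, start, end) for sections at or above min_confidence."""
    return [
        (s["section_id"], s["start"], s["start"] + s["duration"])
        for s in talker["sections"]
        if s["confidence"] >= min_confidence
    ]


def low_quality_talkers(details: list, min_quality: float = MIN_QUALITY) -> list:
    """Return the IDs of talkers whose overall quality_score is below min_quality."""
    return [t["talker_id"] for t in details if t["quality_score"] < min_quality]


speech = json.loads(SPEECH_JSON)["speech"]
for talker in speech["details"]:
    loud = talker["loudness"]
    print(f"talker {talker['talker_id']}: loudness {loud['measured']} "
          f"(range {loud['range']}), quality {talker['quality_score']}")
    for section_id, start, end in talker_timeline(talker):
        print(f"  {section_id}: {start:.2f}s to {end:.2f}s")
print("low-quality talkers:", low_quality_talkers(speech["details"]))
```

Use `talker_timeline(talker, MIN_CONFIDENCE)` to drop low-confidence sections. The sample section has a confidence of 0.22, so this example would drop it.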
