Analyzing Speech

Getting Started with the Media Analyze Speech API

❗️

Beta API

This API is available as an early preview. If you have feedback on how you'd like to use the API, please reach out to our team.

https://dolby.io/contact

The Dolby.io Analyze Speech API is designed to give insight into the speech portions of your media. The output includes information like the number of talkers and details about each talker.

You can build applications that analyze speech in a range of content such as podcasts, online learning, healthcare conferencing, or video sharing. You'll be able to analyze the talker's behavior and get useful insights to create more engaging content.

Getting started

To get started, follow these steps:

  1. Get your API token
  2. Prepare your media
  3. Make an Analyze Speech API request
  4. Check the job status
  5. Download results

1. Get your API token

To use the Analyze Speech API, you need an API token. To learn more about how to get an API token, see API Authentication.

2. Prepare your media

Make your media available for analysis:

a. Use your own cloud storage provider.
b. Use our Dolby Media Input API.

a. Use your own cloud storage provider

You will want to consider this option when you move your applications into production. Our services are able to work with many popular cloud storage services such as AWS S3, Azure Blob Storage, GCP Cloud Storage, or your own services with HTTP(s) and basic authentication.

Please see the Media Input and Output guide for more details on the various options.

b. Use our Dolby Media Input API (optional)

The Media Input API was designed to give you a quick way to upload media while evaluating Dolby.io Media API services. We securely store your media temporarily. Any media you upload will be removed regularly, so this storage shouldn't be used as a permanent location.

Call Start Media Input to declare a shortcut URL. The URL must begin with dlb://, but the rest is a unique identifier of your choosing. Some valid examples:

  • dlb://example.mp4
  • dlb://input/your-favorite-podcast.mp4
  • dlb://usr/home/me/voice-memo.wav

You can think of this like an object key that identifies a file for your account. When you call POST /media/input, the response contains a new url value. This is a pre-signed URL to a cloud storage location where you upload the file by making a PUT request with your media.
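Before calling POST /media/input, it can help to check that a shortcut follows the dlb:// rule described above. The following is a minimal sketch, not part of the Dolby.io API: the helper name is_dlb_url is hypothetical, and requiring a non-empty identifier after the prefix is an assumption rather than a documented rule.

```python
DLB_PREFIX = "dlb://"


def is_dlb_url(url):
    """Return True if url starts with dlb:// and has an identifier after it.

    Hypothetical helper: the non-empty identifier check is an assumption.
    """
    return url.startswith(DLB_PREFIX) and len(url) > len(DLB_PREFIX)


# The valid examples listed above all pass the check.
for candidate in [
    "dlb://example.mp4",
    "dlb://input/your-favorite-podcast.mp4",
    "dlb://usr/home/me/voice-memo.wav",
]:
    print(candidate, is_dlb_url(candidate))
```

A check like this only catches typos locally; the service itself remains the authority on which shortcuts it accepts.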

import os
import requests

# Set or replace these values

api_token = os.environ["API_TOKEN"]
file_path = os.environ["INPUT_MEDIA_LOCAL_PATH"]

# Declare your dlb:// location

url = "https://api.dolby.com/media/input"
headers = {
    "Authorization": "Bearer {0}".format(api_token),
    "Content-Type": "application/json",
    "Accept": "application/json"
}

body = {
    "url": "dlb://in/example.mp4",
}

response = requests.post(url, json=body, headers=headers)
response.raise_for_status()
data = response.json()
presigned_url = data["url"]

# Upload your media to the pre-signed url response

print("Uploading {0} to {1}".format(file_path, presigned_url))
with open(file_path, "rb") as input_file:
    upload_response = requests.put(presigned_url, data=input_file)
    upload_response.raise_for_status()

const fs = require("fs")
const axios = require("axios").default

// Set or replace these values

const api_token = process.env.API_TOKEN
const file_path = process.env.INPUT_MEDIA_LOCAL_PATH

// Declare your dlb:// location

const config = {
  method: "post",
  url: "https://api.dolby.com/media/input",
  headers: {
    "Authorization": `Bearer ${api_token}`,
    "Content-Type": "application/json",
    "Accept": "application/json"
  },
  data: {
    url: "dlb://in/example.mp4",
  },
}

axios(config)
  .then(function(response) {
    // Upload your media to the pre-signed url response

    const upload_config = {
      method: "put",
      url: response.data.url,
      data: fs.createReadStream(file_path),
      headers: {
        "Content-Type": "application/octet-stream",
        "Content-Length": fs.statSync(file_path).size,
      },
    }
    axios(upload_config)
      .then(function() {
        console.log("File uploaded")
      })
      .catch(function(error) {
        console.log(error)
      })
  })
  .catch(function(error) {
    console.log(error)
  })

curl -X POST https://api.dolby.com/media/input \
    --header "Authorization: Bearer $API_TOKEN" \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json' \
    --data '{
      "url": "dlb://in/example.mp4"
    }'

# Use the result in a second command, replacing `$PRE_SIGNED_URL` with the url value from the previous response:

curl -X PUT "$PRE_SIGNED_URL" -T ./your-local-media.mp4

Once the upload is complete, you'll be able to refer to this media with the dlb://in/example.mp4 shortcut.

3. Make an Analyze Speech API request

The Analyze Speech API requires both the input and output parameters to begin analyzing speech in your media. The Analyze Speech API does not provide any additional parameters to control the analysis.
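Because the request takes only these two parameters, the body is small enough to build and check in one place. The sketch below assumes a hypothetical helper named build_analyze_speech_body; it is not part of any Dolby.io SDK and only mirrors the input and output fields described above.

```python
def build_analyze_speech_body(input_url, output_url):
    """Return the JSON body for an Analyze Speech request.

    Hypothetical helper: it only enforces that both locations are present.
    """
    if not input_url or not output_url:
        raise ValueError("Both input and output locations are required")
    return {"input": input_url, "output": output_url}


body = build_analyze_speech_body(
    "dlb://in/example.mp4",
    "dlb://out/example-metadata.json",
)
print(body)
```

Failing early on a missing location avoids sending a request that the API would reject anyway.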

Regardless of whether you chose to use your own cloud storage or the Dolby.io /media/input service, the Dolby.io API will need to be able to read the media. See the Media Input and Output guide for a more detailed explanation of the various ways you can provide authentication details.

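For the input, you can use the dlb:// shortcut you uploaded to in the previous step, such as dlb://in/example.mp4, or a URL to media in your own cloud storage.

Similarly, for the output, you can use any location that our APIs are able to write to. Specifying a dlb:// output location, such as dlb://out/example-metadata.json, will create one on the fly. Here are some examples: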

import os
import requests

api_token = os.environ["API_TOKEN"]

url = "https://api.dolby.com/media/analyze/speech"
headers = {
    "Authorization": "Bearer {0}".format(api_token),
    "Content-Type": "application/json",
    "Accept": "application/json"
}

body = {
    "input": os.environ["DOLBYIO_INPUT"],
    "output": "dlb://out/example-metadata.json",
}

response = requests.post(url, json=body, headers=headers)
response.raise_for_status()
print(response.json()["job_id"])

const axios = require('axios').default;

const api_token = process.env.API_TOKEN;

const config = {
    method: 'post',
    url: 'https://api.dolby.com/media/analyze/speech',
    headers: {
        "Authorization": `Bearer ${api_token}`,
        "Content-Type": "application/json",
        "Accept": "application/json"
    },
    data: {
        input: process.env.DOLBYIO_INPUT,
        output: 'dlb://out/example-metadata.json'
    }
};

axios(config)
    .then(function (response) {
        console.log(response.data.job_id);
    })
    .catch(function (error) {
        console.log(error);
    });

curl -X POST "https://api.dolby.com/media/analyze/speech" \
    --header "Authorization: Bearer $API_TOKEN" \
    --header 'Content-Type: application/json' \
    --header 'Accept: application/json' \
    --data '{
        "input": "'"$DOLBYIO_INPUT"'",
        "output": "dlb://out/example-metadata.json"
    }'

The JSON response will include a unique job_id that you'll need to use to check on the status of media analysis.

{"job_id":"a495424-a525-9a9b-34c6-d2a45037aa36"}
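As a small illustration of carrying the job_id forward, the sketch below parses a response like the one above and builds the query URL for the status check. The function name status_url_from_response is hypothetical; the endpoint is the one used by the GET request in the next step.

```python
import json
from urllib.parse import urlencode

STATUS_ENDPOINT = "https://api.dolby.com/media/analyze/speech"


def status_url_from_response(response_text):
    """Extract job_id from the JSON response text and build the status URL."""
    job_id = json.loads(response_text)["job_id"]
    return "{0}?{1}".format(STATUS_ENDPOINT, urlencode({"job_id": job_id}))


print(status_url_from_response('{"job_id":"a495424-a525-9a9b-34c6-d2a45037aa36"}'))
```

When you use the requests library, passing params={"job_id": job_id} achieves the same result without building the URL by hand.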

4. Check the job status

It will take a few moments for the API to analyze the speech in your file, so you'll need to check the status of the job. You can learn more about this in the How It Works section of the Introduction.

For this GET /media/analyze/speech request you'll need to use the job id returned from the previous step. In these examples, it is specified as an environment variable that you'll need to set or replace in the code samples.

import os
import requests
from pprint import pprint

api_token = os.environ["API_TOKEN"]
job_id = os.environ["DOLBYIO_JOB_ID"]

url = "https://api.dolby.com/media/analyze/speech"
headers = {
    "Authorization": "Bearer {0}".format(api_token),
    "Content-Type": "application/json",
    "Accept": "application/json"
}

# TODO: You must replace this value with the job ID returned from the previous step.

params = {
    "job_id": job_id,
}

response = requests.get(url, params=params, headers=headers)
response.raise_for_status()
pprint(response.json())

const axios = require('axios').default;

const api_token = process.env.API_TOKEN;

const config = {
    method: 'get',
    url: 'https://api.dolby.com/media/analyze/speech',
    headers: {
        "Authorization": `Bearer ${api_token}`,
        "Content-Type": "application/json",
        "Accept": "application/json"
    },
  
    // TODO: You must replace this value with the job ID returned from the previous step.
    params: {
        job_id: process.env.DOLBYIO_JOB_ID
    }
};

axios(config)
    .then(function (response) {
        console.log(JSON.stringify(response.data, null, 4));
    })
    .catch(function (error) {
        console.log(error);
    });

# TODO: You must replace $DOLBYIO_JOB_ID with the job ID returned from the previous step.

curl -X GET "https://api.dolby.com/media/analyze/speech?job_id=$DOLBYIO_JOB_ID" \
    --header "Authorization: Bearer $API_TOKEN"

While the job is still in progress, you will be able to see the status and progress values returned.

{
  "path": "/media/analyze/speech",
  "status": "Running",
  "progress": 42
}

If you call the API again after a period of time, you'll see the status change, and the output you originally specified will be ready for download.

{
  "path": "/media/analyze/speech",
  "progress": 100,
  "result": {},
  "status": "Success"
}
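Rather than re-running the status request by hand, you could poll until the job finishes. The following is a minimal sketch under stated assumptions: wait_for_job, fetch_status, and the interval and attempt limits are all hypothetical names and values, and the loop treats only the "Success" status shown above as complete. Any other status is simply polled again, which may not suit jobs that end in a failure state.

```python
import time


def wait_for_job(fetch_status, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Call fetch_status() until it reports "Success" or attempts run out.

    fetch_status is any callable returning the status response as a dict.
    """
    last = None
    for attempt in range(max_attempts):
        last = fetch_status()
        if last.get("status") == "Success":
            return last
        # Pause between attempts, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    raise TimeoutError("Job did not reach Success; last response: {0}".format(last))


# Simulated responses standing in for real GET requests.
simulated = iter([
    {"path": "/media/analyze/speech", "status": "Running", "progress": 42},
    {"path": "/media/analyze/speech", "progress": 100, "result": {}, "status": "Success"},
])
final = wait_for_job(lambda: next(simulated), sleep=lambda seconds: None)
print(final["status"], final["progress"])
```

In real use, fetch_status would wrap the GET request from this step and return response.json().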

5. Download results

Once media analysis is complete, the analysis result file will be PUT in the output location specified when the job was started. If you used the optional Dolby.io temporary storage, you will need to follow a couple of steps to download your media. For more information, see the Dolby.io Media Temporary Cloud Storage guide.

import os
import shutil
import requests

api_token = os.environ["API_TOKEN"]
output_path = os.environ["OUTPUT_MEDIA_LOCAL_PATH"]

url = "https://api.dolby.com/media/output"
headers = {
    "Authorization": "Bearer {0}".format(api_token),
    "Content-Type": "application/json",
    "Accept": "application/json"
}

args = {
    "url": "dlb://out/example-metadata.json",
}

with requests.get(url, params=args, headers=headers, stream=True) as response:
    response.raise_for_status()
    response.raw.decode_content = True
    print("Downloading from {0} into {1}".format(response.url, output_path))
    with open(output_path, "wb") as output_file:
        shutil.copyfileobj(response.raw, output_file)

const fs = require("fs")
const axios = require("axios").default

const api_token = process.env.API_TOKEN
const output_path = process.env.OUTPUT_MEDIA_LOCAL_PATH

const config = {
  method: "get",
  url: "https://api.dolby.com/media/output",
  headers: {
    "Authorization": `Bearer ${api_token}`,
    "Content-Type": "application/json",
    "Accept": "application/json"
  },
  responseType: "stream",
  params: {
    url: "dlb://out/example-metadata.json",
  },
}

axios(config)
  .then(function(response) {
    response.data.pipe(fs.createWriteStream(output_path))
    response.data.on("error", function(error) {
      console.log(error)
    })
    response.data.on("end", function() {
      console.log("File downloaded!")
    })
  })
  .catch(function(error) {
    console.log(error)
  })

# You specify `-L` because the service will redirect you to a cloud storage location.
# You shouldn't try to retrieve directly from the cloud storage as the location may change,
# so use /media/output with the shortcut instead.
# The `-O` option writes the file to your local system with the same filename.

curl -X GET "https://api.dolby.com/media/output?url=dlb://out/example-metadata.json" \
    -O -L \
    --header "Authorization: Bearer $API_TOKEN"

Learn more

Continue to learn about all this API can do by reviewing the Analyze Speech API Guide. Pricing and other details are available from the product page.