Analyzing Speech

Getting Started with the Media Analyze Speech API

The Dolby.io Media Analyze Speech API is designed to give insight into the speech portions of your media. The output includes information like the number of talkers and details about each talker.

You can build applications that analyze speech in a range of content such as podcasts, online learning, healthcare conferencing, or video sharing. You'll be able to analyze the talker's behavior and get useful insights to create more engaging content.

Getting Started

To get started, follow these steps:

  1. Get your API key
  2. Prepare your media
  3. Make an Analyze Speech API request
  4. Check the results
  5. Download the results

1. Get your API key

To use the Analyze Speech API you need an API key. After you sign up for an account, you can retrieve your API key from the Media API section of the dashboard. It is a globally unique identifier (GUID) that you pass with every API request in a header called x-api-key.
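As a minimal sketch, the headers can be built once and reused for every request. The helper name build_headers is hypothetical and not part of any Dolby.io SDK; it assumes the key is stored in the DOLBYIO_API_KEY environment variable, as in the samples later in this guide.

```python
import os


def build_headers(api_key=None):
    """Return the HTTP headers sent with each Media API request.

    build_headers is a hypothetical helper, not part of a Dolby.io SDK.
    """
    # Fall back to the environment variable when no key is passed in.
    key = api_key or os.environ.get("DOLBYIO_API_KEY")
    if not key:
        raise ValueError("Set DOLBYIO_API_KEY or pass api_key explicitly")
    return {
        "x-api-key": key,
        "Content-Type": "application/json",
        "Accept": "application/json",
    }
```

Failing early when the key is missing tends to be easier to debug than an authentication error returned by the service.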

2. Prepare your media

You have a choice for how to make your media available to the Analyze Speech API:

a. Use your own cloud storage provider.
b. Use our Dolby Media Input API.

a. Use your own cloud storage provider

Consider this option when you move your applications into production. Our services work with many popular cloud storage services such as AWS S3 and Google Cloud Storage, as well as with your own services that use basic or token-based authentication.

Please see the Media Input and Output guide for more details on the various options.

b. Use our Dolby Media Input API (optional)

The Media Input API was designed to give you a quick way to upload media while evaluating Dolby.io Media API services. We store your media securely but only temporarily: anything you upload is removed regularly, so the service shouldn't be used for permanent storage.

Call Start Media Input to declare a shortcut URL. It must begin with dlb:// but is otherwise your own unique identifier. Some valid examples:

  • dlb://example.mp4
  • dlb://input/your-favorite-podcast.mp4
  • dlb://usr/home/me/voice-memo.wav
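A quick client-side check can catch a malformed shortcut before any request is made. This is only a sketch: is_dlb_url is a hypothetical helper, and the only rule it takes from this guide is the dlb:// prefix. Requiring a non-empty remainder is an extra assumption.

```python
def is_dlb_url(url):
    """Return True if url looks like a dlb:// shortcut with a non-empty key."""
    prefix = "dlb://"
    # The dlb:// prefix is the rule stated in this guide; rejecting an empty
    # remainder is an assumption made by this sketch.
    return url.startswith(prefix) and len(url) > len(prefix)


for candidate in ["dlb://example.mp4", "dlb://usr/home/me/voice-memo.wav", "s3://bucket/a.mp4"]:
    print(candidate, is_dlb_url(candidate))
```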

You can think of the dlb:// URL as an object key that identifies a file within your account. When you call POST /media/input, the response contains a new url. This is a pre-signed URL to a cloud storage location that you use to upload the file, which you do by making a PUT request with your media.

The following examples use environment variables for the API key (DOLBYIO_API_KEY) and the path to your local file (INPUT_MEDIA_LOCAL_PATH). Either set those variables in your environment or replace them in the code samples with your own values.

import os
import requests

# Set or replace these values

file_path = os.environ["INPUT_MEDIA_LOCAL_PATH"]
api_key = os.environ["DOLBYIO_API_KEY"]

# Declare your dlb:// location

url = "https://api.dolby.com/media/input"
headers = {
    "x-api-key": api_key,
    "Content-Type": "application/json",
    "Accept": "application/json",
}

body = {
    "url": "dlb://in/example.mp4",
}

response = requests.post(url, json=body, headers=headers)
response.raise_for_status()
data = response.json()
presigned_url = data["url"]

# Upload your media to the pre-signed url response

print("Uploading {0} to {1}".format(file_path, presigned_url))
with open(file_path, "rb") as input_file:
    upload_response = requests.put(presigned_url, data=input_file)
    # Fail loudly if the upload was rejected
    upload_response.raise_for_status()

const fs = require("fs")
const axios = require("axios").default

// Set or replace these values

const file_path = process.env.INPUT_MEDIA_LOCAL_PATH
const api_key = process.env.DOLBYIO_API_KEY

// Declare your dlb:// location

const config = {
  method: "post",
  url: "https://api.dolby.com/media/input",
  headers: {
    "x-api-key": api_key,
    "Content-Type": "application/json",
    Accept: "application/json",
  },
  data: {
    url: "dlb://in/example.mp4",
  },
}

axios(config)
  .then(function(response) {
    // Upload your media to the pre-signed url response

    const upload_config = {
      method: "put",
      url: response.data.url,
      data: fs.createReadStream(file_path),
      headers: {
        "Content-Type": "application/octet-stream",
        "Content-Length": fs.statSync(file_path).size,
      },
    }
    axios(upload_config)
      .then(function() {
        console.log("File uploaded")
      })
      .catch(function(error) {
        console.log(error)
      })
  })
  .catch(function(error) {
    console.log(error)
  })

curl -X POST https://api.dolby.com/media/input \
  --header "x-api-key: $DOLBYIO_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
      "url": "dlb://in/example.mp4"
  }'

# Then upload the file, replacing $PRE_SIGNED_URL with the url value returned by the previous command:

curl -X PUT "$PRE_SIGNED_URL" -T ./your-local-media.mp4

Once the upload is complete, you'll be able to refer to this media with the dlb://in/example.mp4 shortcut.

3. Make an Analyze Speech API Request

The Analyze Speech API requires both the input and output parameters to begin analyzing speech in your media. It does not provide any additional parameters to control the analysis.
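Because the request body has only these two fields, it can be assembled and checked in one place. The following is a sketch, not an official client: build_analyze_body is a hypothetical helper, and the two locations are the example dlb:// shortcuts used elsewhere in this guide.

```python
import json


def build_analyze_body(input_url, output_url):
    """Build the JSON body for POST /media/analyze/speech.

    build_analyze_body is a hypothetical helper for illustration.
    """
    # Both locations are required, so reject empty values early.
    if not input_url or not output_url:
        raise ValueError("Both input and output locations are required")
    return {"input": input_url, "output": output_url}


body = build_analyze_body("dlb://in/example.mp4", "dlb://out/example-metadata.json")
print(json.dumps(body, indent=2))
```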

Regardless of whether you chose to use your own cloud storage or the Dolby.io /media/input service, the Dolby.io API will need to be able to read the media. See the Media Input and Output guide for a more detailed explanation of the various ways you can provide authentication details.

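For example, the dlb://in/example.mp4 shortcut uploaded in the previous step is a valid input value.

Similarly, for the output you can use any location that our APIs are able to write to. Specifying a dlb:// output location creates one on the fly. The examples below read the input location from the DOLBYIO_INPUT environment variable and write to dlb://out/example-metadata.json.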

import os
import requests

url = "https://api.dolby.com/media/analyze/speech"
headers = {
    "x-api-key": os.environ["DOLBYIO_API_KEY"],
    "Content-Type": "application/json",
    "Accept": "application/json",
}

body = {
    "input": os.environ["DOLBYIO_INPUT"],
    "output": "dlb://out/example-metadata.json",
}

response = requests.post(url, json=body, headers=headers)
response.raise_for_status()
print(response.json()["job_id"])

const axios = require('axios').default;

const config = {
    method: 'post',
    url: 'https://api.dolby.com/media/analyze/speech',
    headers: {
        'x-api-key': process.env.DOLBYIO_API_KEY,
        'Content-Type': 'application/json',
        'Accept': 'application/json'
    },
    data: {
        input: process.env.DOLBYIO_INPUT,
        output: 'dlb://out/example-metadata.json'
    }
};

axios(config)
    .then(function (response) {
        console.log(response.data.job_id);
    })
    .catch(function (error) {
        console.log(error);
    });

curl -X POST "https://api.dolby.com/media/analyze/speech" \
    --header "x-api-key: $DOLBYIO_API_KEY" \
    --header "Content-Type: application/json" \
    --data '{
        "input": "'"$DOLBYIO_INPUT"'",
        "output": "dlb://out/example-metadata.json"
    }'

The JSON response will include a unique job_id that you'll need to use to check on the status of media analysis.

{"job_id":"a495424-a525-9a9b-34c6-d2a45037aa36"}

4. Check the Results

It will take a few moments for the API to analyze the speech in your file, so you'll need to check the status of the job. You can learn more about this in the How It Works section of the Introduction.

For this GET /media/analyze/speech request you'll need to use the job id returned from the previous step. In these examples, it is specified as an environment variable that you'll need to set or replace in the code samples.

import os
import requests
from pprint import pprint

job_id = os.environ["DOLBYIO_JOB_ID"]

url = "https://api.dolby.com/media/analyze/speech"
headers = {
    "x-api-key": os.environ["DOLBYIO_API_KEY"],
    "Content-Type": "application/json",
    "Accept": "application/json",
}

params = {
    "job_id": job_id,
}

response = requests.get(url, params=params, headers=headers)
response.raise_for_status()
pprint(response.json())

const axios = require('axios').default;

const config = {
    method: 'get',
    url: 'https://api.dolby.com/media/analyze/speech',
    headers: {
        'x-api-key': process.env.DOLBYIO_API_KEY,
        'Content-Type': 'application/json',
        'Accept': 'application/json'
    },
    params: {
        job_id: process.env.DOLBYIO_JOB_ID
    }
};

axios(config)
    .then(function (response) {
        console.log(JSON.stringify(response.data, null, 4));
    })
    .catch(function (error) {
        console.log(error);
    });

curl -X GET "https://api.dolby.com/media/analyze/speech?job_id=$DOLBYIO_JOB_ID" \
    --header "x-api-key: $DOLBYIO_API_KEY"

While the job is still in progress, you will be able to see the status and progress values returned.

{
  "path": "/media/analyze/speech",
  "status": "Running",
  "progress": 42
}

If you make the same request again after a short wait, you'll see the status change. Once it reports Success, the output you originally specified is ready for download.

{
  "path": "/media/analyze/speech",
  "progress": 100,
  "result": {},
  "status": "Success"
}
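Rather than re-running the request by hand, the status check can be wrapped in a small polling loop. This is a minimal sketch under stated assumptions: wait_for_job and its parameters are hypothetical names, fetch_status is any callable that returns the decoded JSON status document, and "Pending" is an assumed queued state (only "Running" and "Success" appear in this guide).

```python
import time


def wait_for_job(fetch_status, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Poll fetch_status() until the job leaves an in-progress state.

    fetch_status is any callable returning the decoded JSON status document.
    """
    # "Running" appears in this guide; "Pending" is an assumed queued state.
    in_progress = {"Pending", "Running"}
    for _ in range(max_attempts):
        status = fetch_status()
        if status.get("status") not in in_progress:
            # Finished, successfully or not; the caller inspects the result.
            return status
        sleep(interval_seconds)
    raise TimeoutError("Job did not finish after {0} checks".format(max_attempts))
```

With the requests sample above, fetch_status could be `lambda: requests.get(url, params=params, headers=headers).json()`. Passing the fetch step in as a callable keeps the loop independent of any particular HTTP library.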

5. Download Results

Once media analysis is complete, the analysis result file is written to the output location you specified when the job was started. If you used the optional dlb://out/example-metadata.json location, you can use the /media/output API to retrieve the file. It takes a url as the only parameter.

import os
import shutil
import requests

output_path = os.environ["OUTPUT_MEDIA_LOCAL_PATH"]

url = "https://api.dolby.com/media/output"
headers = {
    "x-api-key": os.environ["DOLBYIO_API_KEY"],
    "Content-Type": "application/json",
    "Accept": "application/json",
}

args = {
    "url": "dlb://out/example-metadata.json",
}

with requests.get(url, params=args, headers=headers, stream=True) as response:
    response.raise_for_status()
    response.raw.decode_content = True
    print("Downloading from {0} into {1}".format(response.url, output_path))
    with open(output_path, "wb") as output_file:
        shutil.copyfileobj(response.raw, output_file)

const fs = require("fs")
const axios = require("axios").default

const output_path = process.env.OUTPUT_MEDIA_LOCAL_PATH

const config = {
  method: "get",
  url: "https://api.dolby.com/media/output",
  headers: {
    "x-api-key": process.env.DOLBYIO_API_KEY,
    "Content-Type": "application/json",
    "Accept": "application/json",
  },
  responseType: "stream",
  params: {
    url: "dlb://out/example-metadata.json",
  },
}

axios(config)
  .then(function(response) {
    response.data.pipe(fs.createWriteStream(output_path))
    response.data.on("error", function(error) {
      console.log(error)
    })
    response.data.on("end", function() {
      console.log("File downloaded!")
    })
  })
  .catch(function(error) {
    console.log(error)
  })

# The -L flag is needed because the service redirects you to a cloud storage location.
# Don't retrieve the file directly from cloud storage, as the location may change;
# use /media/output with the shortcut instead.
# The -o flag names the local file to write.

curl -X GET "https://api.dolby.com/media/output?url=dlb://out/example-metadata.json" \
      -L -o example-metadata.json \
      --header "x-api-key: $DOLBYIO_API_KEY"
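Once the file is on disk, a first look at its structure can help before writing any parsing code. The sketch below makes no assumption about the result schema, which this guide does not describe; summarize_result is a hypothetical helper that only reports the top-level keys of the downloaded JSON.

```python
import json


def summarize_result(path):
    """Load the downloaded analysis file and return its top-level keys."""
    with open(path, "r", encoding="utf-8") as result_file:
        data = json.load(result_file)
    # The schema is not described in this guide, so only report the shape.
    if isinstance(data, dict):
        return sorted(data.keys())
    return []
```

For example, `summarize_result("example-metadata.json")` lists the keys of the file saved by the samples above.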

Learn More

That's just the start of what you can do with the Analyze Speech API. For more information, these resources may be helpful:

Analyze Speech API Reference

