The Media Processing Developer Hub

Welcome to the Media Processing developer hub. You'll find comprehensive guides and documentation to help you start working with Media Processing as quickly as possible, as well as support if you get stuck. Let's jump right in!


Introduction to Media Processing

The Media Processing APIs were created to make your audio better, at scale. To deliver great audio, you need tools and workflow automation that keep up with your growing content libraries and meet consumer expectations. Whether your content is noisy, isn’t the right volume, or just doesn’t “feel right”, we’re here to help. Our APIs analyze your audio, work out how best to enhance it, and apply just the right amount of processing to give you a professional, natural sound.

Media Enhance

Do you need to:

  • make your audio sound better without being an audio expert?
  • make listening easier by removing background noise and boosting speech?
  • improve speech quality by balancing tone and removing harshness?
  • conform your media to meet broadcasting platform recommendations and standards?

These sorts of requirements can be met with the Media Enhance API. If you are building an application where users expect high-quality, clear audio, this API is a valuable tool for web and mobile applications at cloud scale.

See the Quick Start to Enhancing Media to get started.

Media Diagnose

Do you need to know:

  • how to measure overall audio quality?
  • if there are specific problems such as silent channels?
  • general media information such as framerate and bitrate?
  • the percentage of speech or music?

Those sorts of questions and more can be answered with the Media Diagnose API. If you want a quick summary of your media, including insight into potential problems, then try the Diagnose API.

See the Quick Start to Diagnosing Media to get started.

Media Analyze

Do you need to know:

  • what type of media you have?
  • whether media platforms will accept your media?
  • whether there is clipping or other audio noise artifacts?
  • how much of the media is speech, music, or silence?

These sorts of questions and more can be answered with the Media Analyze API. If you are building an application that accepts user-generated content or just have a large collection of media to understand at cloud-scale, this API is a valuable tool to have.

See the Quick Start to Analyzing Media to get started.

Media Speech Analytics

Do you need to know:

  • the number of talkers in your media and when they are talking?
  • loudness of each talker so that you can loudness correct if needed?
  • quality score of each talker to identify if a talker's setup has a systemic problem?
  • useful talker metrics like talk-listen-ratio?

These sorts of questions can be answered by the Speech Analytics API. It is a specialized API for media in which speech is the dominant content, and a valuable tool for getting insights about that speech. You can use the Media Analyze API to check whether speech dominates your media before calling the Speech Analytics API.

See the Quick Start to Analyzing Speech to get started.

How It Works

The Media Processing APIs typically follow a similar pattern.

  1. Use your API keys to authenticate
  2. Ensure your media is shared to be readable and writable by our services
  3. Make an asynchronous API call

Let's look at these steps.


Authentication

To use the Media Processing APIs, you need to authenticate your application's requests. There are two approaches:

  1. API Key Authentication
  2. OAuth Bearer Token Authentication

See the Authentication guide for additional detail on these methods.
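As a rough illustration of the two methods, the sketch below builds request headers for either one. The `Authorization: Bearer` form is the standard OAuth 2.0 bearer header; the `x-api-key` header name and the `build_auth_headers` helper are assumptions made for this example, so check the Authentication guide for the names the service actually expects.

```python
from typing import Dict, Optional


def build_auth_headers(
    api_key: Optional[str] = None,
    bearer_token: Optional[str] = None,
) -> Dict[str, str]:
    """Return HTTP headers for exactly one of the two authentication methods."""
    # Reject the call if neither or both credentials were supplied.
    if (api_key is None) == (bearer_token is None):
        raise ValueError("provide exactly one of api_key or bearer_token")

    if bearer_token is not None:
        # Standard OAuth 2.0 bearer token header.
        return {"Authorization": f"Bearer {bearer_token}"}

    # ASSUMPTION: "x-api-key" is a placeholder header name, not confirmed
    # by this guide.
    return {"x-api-key": api_key}
```

The returned dictionary can be passed as the headers of whichever HTTP client your application already uses.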

Share Your Media

To process your media, we need to be able to read and write it. There are a number of ways to achieve this:

  • Use your existing cloud storage such as AWS or GCP
  • Use our temporary cloud storage

See the Media Input and Output guide for examples of how to do this.
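To make the idea concrete, here is a minimal sketch of a job request body that names where the service should read and write your media. The `input` and `output` field names, the `build_job_request` helper, and the example bucket URLs are all hypothetical; the Media Input and Output guide describes the real request format.

```python
from typing import Dict
from urllib.parse import urlparse


def build_job_request(input_url: str, output_url: str) -> Dict[str, str]:
    """Build a request body naming the readable input and writable output."""
    for name, url in (("input", input_url), ("output", output_url)):
        # A bare file name such as "talk.wav" has no scheme, so the service
        # would have no way to locate it.
        if not urlparse(url).scheme:
            raise ValueError(f"{name} location must be a full URL, got {url!r}")
    # ASSUMPTION: the "input" and "output" keys are illustrative field names.
    return {"input": input_url, "output": output_url}


# Hypothetical locations in your own cloud storage.
body = build_job_request(
    "s3://example-bucket/in/talk.wav",
    "s3://example-bucket/out/talk-enhanced.wav",
)
```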

Make Asynchronous Requests

To process media, you first start a processing job and then wait for that job to complete. There are two approaches to handling this within your applications.


The first approach, polling, has a few steps:

  1. POST to a media endpoint to start processing
  2. GET from the same endpoint to check progress
  3. Repeat step 2 until the job is complete

This is a common pattern called polling where GET requests are repeated while waiting on the returned status. The expected status values include:

  • Success - this status indicates the result is ready
  • Running - your media is being processed, check back again soon
  • Pending - your media is waiting for an available resource to run it
  • Failed - there was a problem and you'll see an error with some additional notes about what the cause might be

You can repeat the GET request as often as you like to check the status of the job and inspect its progress value.
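The polling pattern described above can be sketched as follows. This is a minimal illustration rather than the platform's own client: `get_status` stands in for your GET request, and the `status` key name, the default interval, and the timeout are assumptions. Only the four status values come from this guide.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATUSES = {"Success", "Failed"}
IN_PROGRESS_STATUSES = {"Running", "Pending"}


def wait_for_job(
    get_status: Callable[[], Dict[str, Any]],
    interval_s: float = 5.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Call get_status repeatedly until the job succeeds, fails, or times out."""
    deadline = time.monotonic() + timeout_s
    while True:
        result = get_status()
        # ASSUMPTION: the job status is reported under a "status" key.
        status = result.get("status")
        if status in TERMINAL_STATUSES:
            return result
        if status not in IN_PROGRESS_STATUSES:
            raise ValueError(f"unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job still {status} after {timeout_s} seconds")
        # Wait before the next GET so the service is not flooded with requests.
        sleep(interval_s)
```

The caller still needs to inspect the returned status, because a `Failed` job ends the loop in the same way a `Success` does.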


As an alternative to polling, you can receive a notification when a job is complete. This can be specified at the time of submission as a one-time callback or registered with the platform as a webhook to be fired for every job.

The Webhooks and Callbacks platform guide provides additional details about how to set up and receive these notifications.
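On the receiving side, the core of a notification handler might look like the sketch below. The payload shape is an assumption: this guide does not define it, so the `status` field and the action names returned by the hypothetical `handle_notification` function are illustrative only.

```python
import json


def handle_notification(raw_body: bytes) -> str:
    """Decide what to do with one job notification and return an action name."""
    payload = json.loads(raw_body)
    # ASSUMPTION: the notification carries the job status under "status".
    status = payload.get("status")
    if status == "Success":
        return "fetch_result"
    if status == "Failed":
        return "report_error"
    # Anything else is not a completion event, so there is nothing to do yet.
    return "ignore"
```

In a real application this function would be called from the HTTP endpoint you register as the callback or webhook URL.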

Processing Time

Some of the more advanced algorithms need time to work their magic.

For example, in the case of the Media Enhance API the processing time may be more or less than the duration of the media itself depending on its length:

  • a 60-second input file may take 80 seconds to complete
  • a 5-minute input file may take 3 minutes and 30 seconds to complete
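One way to read these two figures is as a ratio of processing time to media duration. The short calculation below uses only the numbers quoted above; the `realtime_factor` helper is introduced here for illustration, and the ratios should not be treated as guarantees for other files.

```python
def realtime_factor(processing_s: float, media_s: float) -> float:
    """Processing time divided by media duration; below 1.0 is faster than real time."""
    return processing_s / media_s


# (processing seconds, media seconds) taken from the two examples above.
examples = {
    "60-second file": (80, 60),
    "5-minute file": (3 * 60 + 30, 5 * 60),
}

for label, (processing_s, media_s) in examples.items():
    print(f"{label}: {realtime_factor(processing_s, media_s):.2f}x media duration")
```

The shorter file takes about 1.33 times its own duration, while the longer one takes 0.70 times, which is consistent with longer files processing faster relative to their length.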

Our researchers are always working to make processing algorithms more efficient and decrease processing time overall.


Results

If all goes well, you will get a status of Success.

Each Media Processing API may return results in a different way. Some examples:

  • Media Enhance - a media file is written to the location specified as the output in the initial request
  • Media Analyze - JSON data is written to the location specified as the output in the initial request
  • Media Diagnose - JSON data is returned directly in response to your GET request

You should check the API Reference for any services you intend to call to understand how results are returned.

