Introduction to Communications APIs

The Communications APIs provide an award-winning platform for unified real-time communication backed by decades of expertise in the science of sight and sound. Combine Voice Calls, Video Calls, and Call Recordings to build integrated solutions for your own applications.

With the Communications APIs you get:

✓ Multi-party voice and video calls
✓ Voice processing with dialog leveling and activity detection
✓ Static noise and echo suppression
✓ Spatial Chat
✓ Optimized device management with networking resilience
✓ Customizable streaming and recording layouts
✓ Metadata and connection statistics from conferences
✓ Client SDKs for web, mobile, desktop, and server applications

Deliver high-quality collaboration experiences end-to-end with features that increase engagement, improve attention, and reduce fatigue when communicating in a wide variety of use cases.

Get Started

To get started quickly, complete a hands-on tutorial with a Client SDK, a Virtual World plugin, or Real-time Media Extensions for your preferred development environment.


Want to build a customized video call solution?

The UIKit for React provides customizable UI components for rapid implementation of the Communications APIs. For an example of how the UIKit can be implemented, see the Video Call Kickstart App.

How It Works

The Communications APIs are offered as a Communications Platform as a Service (CPaaS).

Communications Platform

Real-Time Communications (RTC) is managed by the platform over a WebSocket connection between each instance of a client app and the platform media servers. The connection uses UDP or TCP, whichever is supported, to keep client and server synchronized, which provides a better end-user experience than peer-to-peer solutions.

The platform provides a Selective Forwarding Unit (SFU) for video that receives media streams from each individual participant. The media server then decides how best to mix and combine the streams, and routes video in HD (720p30) as H.264 or VP8 back to the other participants. This supports both 1:1 and multi-party conferences.

The platform provides a Multi-point Conferencing Unit (MCU) for audio that is powered by Dolby Voice, an award-winning communications technology. Audio is delivered as an Opus stream or with the Dolby Voice Codec (DVC), depending on the capabilities of the client platform, using whichever best improves voice quality and clarity.
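The difference between forwarding (SFU) and mixing (MCU) can be illustrated by counting streams. The sketch below is a conceptual model only: the function and type names are made up for illustration, and it assumes an idealized SFU that forwards every other participant's stream to each client, whereas a real media server may cap or select the streams it forwards.

```typescript
// Conceptual model only: it counts streams and does not touch any media.
interface StreamCounts {
  uplinksPerClient: number;   // streams each client sends to the server
  downlinksPerClient: number; // streams each client receives from the server
  totalDownlinks: number;     // streams the server sends across all clients
}

// SFU (assumed idealized): every other participant's stream is forwarded as-is.
function sfuStreamCounts(participants: number): StreamCounts {
  const others = Math.max(participants - 1, 0);
  return {
    uplinksPerClient: participants > 0 ? 1 : 0,
    downlinksPerClient: others,
    totalDownlinks: participants * others,
  };
}

// MCU: the server mixes the inputs and returns one combined stream per client.
function mcuStreamCounts(participants: number): StreamCounts {
  const mixed = participants > 1 ? 1 : 0;
  return {
    uplinksPerClient: participants > 0 ? 1 : 0,
    downlinksPerClient: mixed,
    totalDownlinks: participants * mixed,
  };
}
```

Under these assumptions, the forwarded video downlinks grow with the number of participants while the mixed audio stays at one stream per client, which is one reason a server may limit how many video streams it forwards.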

With the REST APIs, applications can use Real-Time Messaging Protocol (RTMP) to achieve scale through streaming protocols.

Dolby Voice and Spatial Scenes

Dolby Voice captures how things sound in natural environments and leverages signal processing techniques to enhance conversations across web, desktop, and mobile applications. This includes a wide range of sound improvements like suppressing unwanted background noises, leveling mic volumes, and improving dialog intelligibility.

✓ Levels microphones in multi-party calls, balancing speaking and listening volumes across all participants for natural-sounding conversation.
✓ Removes unwanted background noise from calls by suppressing distracting sounds like barking dogs, street noise, or keyboard clicks.
✓ Cleans up the audio stream by removing conference output picked up by the computer mic, ensuring participants don't hear themselves echoing back.
✓ Lays out audio to match the on-screen location of each participant independently, so the source of a sound correlates with the layout on screen.
✓ Manages the audio experience end-to-end, from microphone capture through Dolby Voice enhancement to low-latency delivery.
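As an illustration of matching audio to on-screen layout, the sketch below maps the center of a participant's video tile to a normalized position. The `Rect` and `SpatialPosition` types, the function name, and the [-1, 1] coordinate convention are all assumptions made for this example and are not taken from any SDK.

```typescript
// Hypothetical types for illustration; not part of any SDK.
interface Rect { x: number; y: number; width: number; height: number; }
interface SpatialPosition { x: number; y: number; }

// Map the center of a video tile to a position in [-1, 1] on each axis.
// (0, 0) is the center of the viewport, x grows to the right, y grows downward.
function tileToSpatialPosition(tile: Rect, viewport: Rect): SpatialPosition {
  if (viewport.width <= 0 || viewport.height <= 0) {
    throw new RangeError("viewport must have a positive size");
  }
  const centerX = tile.x + tile.width / 2;
  const centerY = tile.y + tile.height / 2;
  // Keep tiles that overflow the viewport within the normalized range.
  const clamp = (value: number): number => Math.min(1, Math.max(-1, value));
  return {
    x: clamp(((centerX - viewport.x) / viewport.width) * 2 - 1),
    y: clamp(((centerY - viewport.y) / viewport.height) * 2 - 1),
  };
}
```

With two tiles side by side, the left tile resolves to a negative `x` and the right tile to a positive `x`, so each voice could be placed on the side where its video appears.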

Learn more in the Dolby Voice article.

Device Database

Dolby maintains an extensive device database covering the hardware capabilities of microphones and cameras. The Communications APIs platform leverages this deep knowledge of user media devices to provide wide compatibility and performance. This enables functionality such as device selection, muting and unmuting the microphone, and enabling or disabling the video camera.

✓ Media device controls to select available cameras and microphones
✓ Optimized for mic and camera capabilities end-to-end
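Device selection usually starts from a list of available devices. The sketch below groups such a list by kind and picks a preferred device with a fallback. The `DeviceLike` shape mirrors the `deviceId`, `kind`, and `label` fields of the browser's `MediaDeviceInfo`; the helper names are hypothetical and do not represent how the Client SDKs expose device management.

```typescript
// Minimal shape shared with the browser's MediaDeviceInfo objects.
interface DeviceLike {
  deviceId: string;
  kind: "audioinput" | "audiooutput" | "videoinput";
  label: string;
}

interface DeviceChoices {
  microphones: DeviceLike[];
  speakers: DeviceLike[];
  cameras: DeviceLike[];
}

// Split a flat device list into the groups a settings screen typically shows.
function groupDevices(devices: DeviceLike[]): DeviceChoices {
  return {
    microphones: devices.filter((d) => d.kind === "audioinput"),
    speakers: devices.filter((d) => d.kind === "audiooutput"),
    cameras: devices.filter((d) => d.kind === "videoinput"),
  };
}

// Prefer a previously chosen device if it is still present, else the first one.
function pickDevice(candidates: DeviceLike[], preferredId?: string): DeviceLike | undefined {
  return candidates.find((d) => d.deviceId === preferredId) ?? candidates[0];
}
```

In a browser, the result of `navigator.mediaDevices.enumerateDevices()` could be passed to `groupDevices`, since those objects carry the same three fields.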

Content Sharing

Content sharing enables users to present what is on their screen to all participants in the conference. It is important for collaboration because it supports activities such as presentations and peer review.

✓ Screen Sharing
✓ File Sharing
✓ Video Sharing

Call Recordings

Recordings provide flexibility to create a permanent record of a conference or broadcast. These recordings can be captured live, generated on-demand, or captured from webhook events. You can customize the layout of mixer recordings to suit your own applications.

✓ Audio and/or Video
✓ Individual Participants or Combined Sessions


To keep multiple client applications in sync, the Communications APIs provide message broadcasting to send data to all conference participants. This could be JSON, XML, or a simple string that triggers behavior on other clients.
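Because a broadcast message arrives as plain data, each client needs to agree on a format and validate what it receives. The sketch below shows one way to do that with JSON and a `type` field. The message shapes and function names are hypothetical, and the SDK call that actually sends or receives the string is intentionally left out.

```typescript
// Hypothetical message shapes for illustration.
type AppMessage =
  | { type: "raise-hand"; participantId: string }
  | { type: "slide-change"; slide: number };

// Serialize a message to the string that would be broadcast.
function encodeMessage(message: AppMessage): string {
  return JSON.stringify(message);
}

// Parse a received string. Returns undefined for anything that is not a
// well-formed AppMessage, so unknown or malformed input is ignored safely.
function decodeMessage(raw: string): AppMessage | undefined {
  let parsed: unknown;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return undefined;
  }
  if (typeof parsed !== "object" || parsed === null) {
    return undefined;
  }
  const candidate = parsed as Record<string, unknown>;
  if (candidate.type === "raise-hand" && typeof candidate.participantId === "string") {
    return { type: "raise-hand", participantId: candidate.participantId };
  }
  if (candidate.type === "slide-change" && typeof candidate.slide === "number") {
    return { type: "slide-change", slide: candidate.slide };
  }
  return undefined;
}
```

A receiving client would call `decodeMessage` on each incoming string and switch on the `type` field to trigger the matching behavior.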

Streaming / Broadcasting

To reach larger audiences, the platform supports streaming conferences over RTMP, as well as Real-time Streaming (formerly Millicast) to reach massive audiences around the globe with less than 500 ms of delay. You can start a stream using a REST API call and monitor the status with Webhooks.
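Monitoring a stream through webhooks amounts to updating local state as events arrive. The sketch below tracks a status per conference from a sequence of events. The event names (`stream.started`, `stream.stopped`, `stream.failed`), the payload fields, and the status values are all hypothetical placeholders; the real event types and payloads are defined in the webhook reference.

```typescript
// Hypothetical event names and payload shape; check the webhook reference
// for the real ones before relying on any of these strings.
type StreamStatus = "idle" | "live" | "ended" | "failed";

interface StreamWebhookEvent {
  eventType: string;
  conferenceId: string;
}

// Compute the next status for one conference from a single event.
function nextStatus(current: StreamStatus, event: StreamWebhookEvent): StreamStatus {
  switch (event.eventType) {
    case "stream.started":
      return "live";
    case "stream.stopped":
      return "ended";
    case "stream.failed":
      return "failed";
    default:
      return current; // ignore events this sketch does not model
  }
}

// Fold a sequence of events into the latest status per conference.
function trackStatuses(events: StreamWebhookEvent[]): Map<string, StreamStatus> {
  const statuses = new Map<string, StreamStatus>();
  for (const event of events) {
    const current = statuses.get(event.conferenceId) ?? "idle";
    statuses.set(event.conferenceId, nextStatus(current, event));
  }
  return statuses;
}
```

In a real service, the webhook endpoint would feed each verified event into a function like `nextStatus` rather than processing a batch.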

Additional Resources