
Spatial Audio

🚀 Closed Beta

The new functionality mentioned in this document is part of the Beta program as a Closed Beta. The program offers pre-GA Client SDKs that allow customers to evaluate upcoming features before their General Availability. For more information about the program, see the Beta Programs document.

What is Spatial Audio?

Spatial Audio allows you to place conference participants spatially in a 3D-rendered audio scene and hear their audio rendered at the given locations. The feature replicates the audio conditions of a real conference room to make virtual meetings more natural and realistic. The application configures the audio scene and then provides the positions of the participants in the conference. The application can also change the position from which the scene is heard in 3D space.

Spatial Audio is supported on Dolby Voice Clients and Stereo Opus Clients in Dolby Voice conferences. Mono Opus is not supported; calling the setAudioPositions API on a Mono Opus client triggers an UnsupportedError.
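
As a minimal sketch, the snippet below shows one way to fall back gracefully on clients that cannot render spatial audio. The applyAudioPositions wrapper is hypothetical (a stand-in for however your application invokes the setAudioPositions API), and the check on error.name assumes the UnsupportedError surfaces under that name; consult the SDK reference for the exact behavior.

```typescript
// Hypothetical wrapper: applies the current spatial positions via the SDK's
// setAudioPositions API (call shape not shown here; see the SDK reference).
declare function applyAudioPositions(): Promise<void>;

// Try to apply spatial positions; return false if this client (for example,
// a Mono Opus client) does not support spatial audio.
async function tryApplyAudioPositions(): Promise<boolean> {
  try {
    await applyAudioPositions();
    return true;
  } catch (error) {
    if (error instanceof Error && error.name === "UnsupportedError") {
      // This client cannot render spatial audio; keep using standard audio.
      return false;
    }
    throw error; // Unrelated failures should still propagate.
  }
}
```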

While spatial audio can be used in many types of applications, this article discusses two use cases: standard video conferencing and a 2D top-down virtual space.

The spatial audio scene heard by each conference participant can be unique; positions are not shared between participants. For example, in a video conferencing application, the video tile layouts are usually different for each participant.

Video conferencing

A video conferencing application would use spatial audio to map each participant’s audio to the same location as their video on the screen. For example, if a participant’s video is shown on the left of the screen, the audio from that participant is heard from the left. This is known as “audio/video (A/V) congruence”.

The A/V congruent audio scene is set up by defining the rectangle that represents the scene in the application window. Normally this would be the rectangle that the video conference is being shown in. The participants’ positions are then expressed by the positions of their video tiles.
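
As a concrete illustration, the sketch below maps the center of each video tile, expressed in the scene rectangle’s coordinates, to that participant’s audio position. The Rect and Position shapes and the setParticipantPosition helper are assumptions for illustration only; the SDK’s actual positioning call (setAudioPositions in this beta) may take a different form.

```typescript
// A minimal sketch of audio/video congruence: a tile's on-screen center
// becomes that participant's audio position.
interface Rect { left: number; top: number; width: number; height: number; }
interface Position { x: number; y: number; z: number; }

// Hypothetical wrapper around the SDK call that positions one participant.
declare function setParticipantPosition(participantId: string, position: Position): Promise<void>;

// The rectangle the video conference is rendered in, in CSS pixels.
const scene: Rect = { left: 0, top: 0, width: 1280, height: 720 };

// Map the center of a tile into scene coordinates; x grows to the right and
// y grows downward, matching screen space. Depth (z) stays at zero.
function tileToPosition(tile: Rect): Position {
  return {
    x: tile.left + tile.width / 2 - scene.left,
    y: tile.top + tile.height / 2 - scene.top,
    z: 0,
  };
}

// Example: a tile on the left half of the window is heard from the left.
async function positionFromTile(participantId: string, tile: Rect): Promise<void> {
  await setParticipantPosition(participantId, tileToPosition(tile));
}
```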

Tiles can be arranged in any way desired by the application developer, though some common examples might include:

  • a grid view of equally sized tiles
  • a grid view of different sized tiles
  • a large main presenter tile with tiles for the rest of the participants in a line underneath
  • a panel, where the tiles are arranged as if the participants were sitting at a panel table facing the audience

2D virtual space

In this use case, the conference is usually audio-only and is displayed as a 2D map with a top-down or isometric presentation. Each participant in the conference is shown on the map as an avatar. The audio for the other participants is on a horizontal plane and appears to come from the position shown for that participant. Participants can move themselves around the space and see others move as well. The audio scene updates as the participants move. The 2D top-down mapping use case can be applied to applications such as 2D games, trade shows, water cooler scenarios, etc.
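
A rough sketch of how such a space could drive the audio scene follows: each avatar’s map coordinates are re-applied as that participant’s audio position whenever someone moves. The Avatar shape and the setParticipantPosition helper are assumptions for illustration, not the SDK’s actual API (setAudioPositions in this beta).

```typescript
// A minimal sketch of a top-down 2D space feeding the audio scene.
interface Position { x: number; y: number; z: number; }
interface Avatar { participantId: string; mapX: number; mapY: number; }

// Hypothetical wrapper around the SDK call that positions one participant.
declare function setParticipantPosition(participantId: string, position: Position): Promise<void>;

// Re-apply every remote participant's position after an avatar moves so the
// audio follows the map. The scene is a horizontal plane, so z stays at zero.
async function updateAudioScene(others: Avatar[]): Promise<void> {
  for (const avatar of others) {
    await setParticipantPosition(avatar.participantId, {
      x: avatar.mapX,
      y: avatar.mapY,
      z: 0,
    });
  }
}
```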

Limitations

Spatial Audio is not currently supported for listeners, but will be available in an upcoming release.

