Integrating Spatial Audio

This guide integrates Spatial Audio into an application one step at a time, using a 2D A/V congruent use case as the running example.

To use the spatial audio feature in your application you must:

  • Define what spatial audio means for your application
  • Enable spatial audio on joining the conference
  • Configure the spatial environment using the SDK
  • Provide relevant position updates for the spatial audio locations

See Summary for the complete code snippet for integrating Spatial Audio.

Define spatial audio for your application

For any spatial audio application, there are many ways to map the visual scene to the positions that audio is heard from. We recommend the following mapping for the majority of A/V congruent video conference use cases.

First you need to understand where the video tiles are drawn on the screen. In the following image, you can see the local participant in the center of the screen, highlighted with a purple outline. With A/V congruence, you can hear the audio for all the other participants from positions related to the video layout surrounding the local participant.

The screen must be mapped to an audio space so that one of the other two participants is heard directly in front of the local participant and the other is heard behind and to the right.

In the screen layout, shown in detail in the second image, the top left corner of the screen is at (0, 0) pixels, the bottom right corner is at (800, 600) pixels, and the local participant is in the middle of the screen at (400, 300) pixels. Based on this information, you can configure the spatial environment.
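
The layout described above can be written down in a few lines. This is a minimal sketch only: `screenLayout` and `centerOf` are illustrative names and are not part of the SDK.

```javascript
// Hypothetical description of the example layout: the origin is the top left
// corner, x grows to the right, y grows downward, and all values are in pixels.
const screenLayout = { left: 0, top: 0, width: 800, height: 600 };

// Returns the pixel at the middle of a rectangle.
function centerOf(rect) {
  return {
    x: rect.left + rect.width / 2,
    y: rect.top + rect.height / 2,
  };
}

// The local participant sits in the middle of the screen.
const localParticipantPixel = centerOf(screenLayout); // { x: 400, y: 300 }
```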

Enable Spatial Audio on the conference

Once you know how your application looks and how you want the audio to behave, the next step is to use the SDK to configure the spatial environment to match.

To enable spatial audio on the conference, create the conference as a Dolby Voice conference and join it with spatial audio and stereo on the downlink.

When joining the conference, enable spatial audio by setting the spatialAudio flag in the JoinOptions in the SDK for Web and the SDK for iOS, or by calling setSpatialAudio in the SDK for Android.

JavaScript:

// Create a Dolby Voice conference
const createOptions = {
 alias: "spatial",
 params: {
 dolbyVoice: true
 }
};
const conference = await VoxeetSDK.conference.create(createOptions);

// Join the Dolby Voice conference using spatial audio and requesting stereo on the downlink
const joinOptions = {
 preferRecvMono: false,
 spatialAudio: true
};
await VoxeetSDK.conference.join(conference, joinOptions);

Swift:

// Create a Dolby Voice conference
let options = VTConferenceOptions()
options.params.dolbyVoice = true
options.alias = "spatial"
VoxeetSDK.shared.conference.create(options: options, success: {
 conference in

 // Join the Dolby Voice conference using spatial audio
 let options = VTJoinOptions()
 options.spatialAudio = true
 VoxeetSDK.shared.conference.join(conference: conference, options: options, success: {
 response in

 // Success
 }, fail: { error in })
}, fail: { error in })

Java:

ParamsHolder paramsHolder = new ParamsHolder();
paramsHolder.setDolbyVoice(true);

ConferenceCreateOptions conferenceCreateOptions = new ConferenceCreateOptions.Builder()
 .setConferenceAlias("spatial")
 .setParamsHolder(paramsHolder)
 .build();

// Create a Dolby Voice conference
VoxeetSDK.conference().create(conferenceCreateOptions)
 .then((ThenPromise<Conference, Conference>) conference -> {
 ConferenceJoinOptions conferenceJoinOptions = new ConferenceJoinOptions.Builder(conference)
 .setSpatialAudio(true)
 .build();

 // Join the Dolby Voice conference using spatial audio
 return VoxeetSDK.conference().join(conferenceJoinOptions);
 })
 .then(conference -> {
 // Success
 })
 .error((error_in) -> {
 // Error
 });

Configure the spatial audio scene

You need to tell the audio renderer what units the application uses for its x and y screen coordinates and which directions are forward, right, and up. In this example, the top of the screen (smaller Y axis values) should be heard from in front, and the right of the screen (larger X axis values) should be heard from the right.

Note: This environment setting means the application does not need to transform its own view of the world. Instead, you tell the audio renderer which coordinate system you are using and it automatically adapts to your application's coordinates.

JavaScript:

// Negative Y axis is heard in the forward direction, so tell the SDK that forward has Y = -1
const forward = { x: 0, y: -1, z: 0 };

// The up axis is unimportant in this case because we never provide a Z position;
// it can be set to either Z = +1 or Z = -1
const up = { x: 0, y: 0, z: 1 };

// Positive X axis is heard in the right-hand direction, so tell the SDK that right has X = +1
const right = { x: 1, y: 0, z: 0 };

Swift:

// Negative Y axis is heard in the forward direction, so tell the SDK that forward has Y = -1
let forward = VTSpatialPosition(x: 0, y: -1, z: 0)!

// The up axis is unimportant in this case because we never provide a Z position;
// it can be set to either Z = +1 or Z = -1
let up = VTSpatialPosition(x: 0, y: 0, z: 1)!

// Positive X axis is heard in the right-hand direction, so tell the SDK that right has X = +1
let right = VTSpatialPosition(x: 1, y: 0, z: 0)!

Java:

// Negative Y axis is heard in the forward direction, so tell the SDK that forward has Y = -1
SpatialPosition forward = new SpatialPosition(0, -1, 0);

// The up axis is unimportant in this case because we never provide a Z position;
// it can be set to either Z = +1 or Z = -1
SpatialPosition up = new SpatialPosition(0, 0, 1);

// Positive X axis is heard in the right-hand direction, so tell the SDK that right has X = +1
SpatialPosition right = new SpatialPosition(1, 0, 0);
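
As a sanity check on these vectors, the following sketch projects a screen offset onto the forward and right directions. It is plain vector arithmetic for illustration only: `dot` and `relativeDirection` are hypothetical helpers, not SDK functions, and the sketch does not describe how the renderer works internally.

```javascript
// Direction vectors from the example above.
const forward = { x: 0, y: -1, z: 0 };
const right = { x: 1, y: 0, z: 0 };

// Dot product of two 3D vectors.
function dot(a, b) {
  return a.x * b.x + a.y * b.y + a.z * b.z;
}

// Hypothetical helper: how far a source is in front of and to the right of
// the listener, in application units (pixels here). Negative values mean
// behind or to the left.
function relativeDirection(listener, source) {
  const offset = {
    x: source.x - listener.x,
    y: source.y - listener.y,
    z: source.z - listener.z,
  };
  return { forward: dot(offset, forward), right: dot(offset, right) };
}

const listener = { x: 400, y: 300, z: 0 };

// A tile above the listener on screen is heard from the front.
const above = relativeDirection(listener, { x: 400, y: 100, z: 0 });
// above is { forward: 200, right: 0 }

// A tile below and to the right is heard from behind and to the right.
const belowRight = relativeDirection(listener, { x: 700, y: 500, z: 0 });
// belowRight is { forward: -200, right: 300 }
```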

Configure the spatial environment scale

With Spatial Audio, a conference participant’s volume is dependent on how close they are to the local participant. A participant who is one meter away will be at full volume. A participant who is 100 meters away will not be heard. The spatial environment scale allows you to tell the audio renderer how to convert into meters from the units that your application is using to set positions.

If your application has a hearing limit, use that to determine the scale. For example, if the hearing limit in the application is 5000 pixels, set the scale so that 5000 pixels equals 100 meters, and use the same scale for all axes:

JavaScript:

const axis_scale = 5000 / 100;
const scale = { x: axis_scale, y: axis_scale, z: axis_scale };

Swift:

let axis_scale = 5000.0 / 100.0
let scale = VTSpatialScale(x: axis_scale, y: axis_scale, z: axis_scale)!

Java:

double axis_scale = 5000.0 / 100.0;
SpatialScale scale = new SpatialScale(axis_scale, axis_scale, axis_scale);
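
To make the arithmetic concrete, the sketch below derives the scale from a hearing limit and converts a pixel distance into meters. It assumes, based on the example above, that the scale is expressed in application units per meter, so meters are obtained by dividing by the scale. `scaleFromHearingLimit` and `distanceInMeters` are hypothetical helpers, not SDK functions.

```javascript
// From the text: a participant who is 100 meters away is not heard.
const HEARING_LIMIT_METERS = 100;

// Builds a uniform scale (application units per meter) from the application's
// hearing limit, expressed in application units such as pixels.
function scaleFromHearingLimit(hearingLimitUnits) {
  const unitsPerMeter = hearingLimitUnits / HEARING_LIMIT_METERS;
  return { x: unitsPerMeter, y: unitsPerMeter, z: unitsPerMeter };
}

// Converts the distance between two application positions into meters,
// assuming each axis is divided by its scale value.
function distanceInMeters(a, b, scale) {
  const dx = (a.x - b.x) / scale.x;
  const dy = (a.y - b.y) / scale.y;
  const dz = (a.z - b.z) / scale.z;
  return Math.sqrt(dx * dx + dy * dy + dz * dz);
}

const scale = scaleFromHearingLimit(5000); // 50 pixels per meter on every axis
const listener = { x: 0, y: 0, z: 0 };
const speaker = { x: 3000, y: 4000, z: 0 }; // 5000 pixels away from the listener
const meters = distanceInMeters(listener, speaker, scale); // about 100, the hearing limit
```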

For an application that uses A/V congruence, it is normal to set the scale so that everyone sounds loud. The conference display is a 2D rectangle on screen and the application sets the scale so that this display represents an audio space rectangle of 4 meters by 3 meters, with the listener at the center.

The scale for each axis is calculated as:

JavaScript:

// The scale for the Z axis does not matter because we never provide a Z position; set it to 1
const scale = { x: display_width_pixels / 4, y: display_height_pixels / 3, z: 1 };

Swift:

// The scale for the Z axis does not matter because we never provide a Z position; set it to 1
let scale = VTSpatialScale(x: display_width_pixels / 4, y: display_height_pixels / 3, z: 1)!

Java:

// The scale for the Z axis does not matter because we never provide a Z position; set it to 1
// Dividing by 4.0 and 3.0 avoids integer division when the pixel sizes are ints
SpatialScale scale = new SpatialScale(displayWidthPixels / 4.0, displayHeightPixels / 3.0, 1);
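
For the 800 by 600 pixel example layout, the calculation works out as in the sketch below. It assumes the same units-per-meter interpretation of the scale as the hearing-limit example; `avCongruentScale` and `offsetInMeters` are hypothetical helpers, not SDK functions.

```javascript
// Size of the audio space that the display represents, in meters (from the text).
const AUDIO_WIDTH_METERS = 4;
const AUDIO_HEIGHT_METERS = 3;

// Builds the per-axis scale for a display of the given size in pixels.
function avCongruentScale(displayWidthPixels, displayHeightPixels) {
  return {
    x: displayWidthPixels / AUDIO_WIDTH_METERS,
    y: displayHeightPixels / AUDIO_HEIGHT_METERS,
    z: 1, // unused because no Z position is ever provided
  };
}

// Offset of a screen point from the listener, converted to meters on each axis.
function offsetInMeters(point, listener, scale) {
  return {
    x: (point.x - listener.x) / scale.x,
    y: (point.y - listener.y) / scale.y,
  };
}

const scale = avCongruentScale(800, 600); // { x: 200, y: 200, z: 1 }
const listener = { x: 400, y: 300 };

// The bottom right corner of the screen is half the audio rectangle away
// from the listener on each axis.
const corner = offsetInMeters({ x: 800, y: 600 }, listener, scale); // { x: 2, y: 1.5 }
```

With this scale, no on-screen position is more than a few meters from the listener, which is why everyone sounds loud.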

You can call setSpatialEnvironment to configure these parameters.

JavaScript:

VoxeetSDK.conference.setSpatialEnvironment(scale, forward, up, right);

Swift:

VoxeetSDK.shared.conference.setSpatialEnvironment(scale: scale, forward: forward, up: up, right: right)

Java:

VoxeetSDK.conference().setSpatialEnvironment(scale, forward, up, right);

Set the position audio is heard from

By default, the local participant hears audio from the position (0, 0, 0). In this example, position (0, 0) is the top left corner of the screen, so if the listener position is left unmodified, every participant sounds as though they are behind and to the right of the listener. To maximize spatial separation in the conference, and with it the audio impact, position the listener in the middle of the screen, surrounded by the other participants. In the example layout, this is pixel location (400, 300).

JavaScript:

const spatialPosition = {
  x: 400,
  y: 300,
  z: 0,
};

VoxeetSDK.conference.setSpatialPosition(VoxeetSDK.session.participant, spatialPosition);

Swift:

let spatialPosition = VTSpatialPosition(x: 400, y: 300, z: 0)!

VoxeetSDK.shared.conference.setSpatialPosition(participant: VoxeetSDK.shared.session.participant!, position: spatialPosition)

Java:

SpatialPosition spatialPosition = new SpatialPosition(400, 300, 0);

// Here, participant must refer to the local participant
VoxeetSDK.conference().setSpatialPosition(participant, spatialPosition);

To provide the screen position for the middle of the video tile of each participant, we recommend:

  • Applying positions every time the video layout is updated
  • Associating each participant object with its video node (for example, in a map or using identifiers) as participants join the conference

JavaScript:

VoxeetSDK.conference.participants.forEach((participant) => {
  // Get the video tile for the participant
  const videoTile = document.getElementById(`video-${participant.id}`);
  if (!videoTile) return; // this participant has no tile yet

  // Get the position of the video tile
  const elementPosition = videoTile.getBoundingClientRect();

  // Set the participant position to the middle of their video tile
  VoxeetSDK.conference.setSpatialPosition(participant, {
    x: elementPosition.x + (elementPosition.width / 2),
    y: elementPosition.y + (elementPosition.height / 2),
    z: 0
  });
});

Swift:

for participant in VoxeetSDK.shared.conference.current!.participants {
    // Placeholder values: replace x and y with the center of the participant's video tile
    let x = 0.0
    let y = 0.0
    let z = 0.0
    let partPosition = VTSpatialPosition(x: x, y: y, z: z)!

    VoxeetSDK.shared.conference.setSpatialPosition(participant: participant, position: partPosition)
}

Java:

for (Participant participant : VoxeetSDK.conference().getParticipants()) {
    // Placeholder values: replace x and y with the center of the participant's video tile
    double x = 0.0;
    double y = 0.0;
    SpatialPosition partPosition = new SpatialPosition(x, y, 0);

    VoxeetSDK.conference().setSpatialPosition(participant, partPosition);
}
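
One way to follow both recommendations is sketched below. It keeps a map from participant ID to video element and reapplies every position in one pass, which can then be triggered whenever the layout changes. `tilesByParticipantId`, `tileCenter`, and `applyPositions` are hypothetical names. The `conference` argument is assumed to expose `setSpatialPosition(participant, position)` in the same way as the calls above, and it is passed in so the logic can be exercised without the SDK.

```javascript
// Hypothetical registry that associates each participant ID with its video
// element. Entries would be added when a participant's tile is created and
// removed when the participant leaves.
const tilesByParticipantId = new Map();

// Middle of a bounding rectangle, as a position with Z fixed at 0.
function tileCenter(rect) {
  return { x: rect.x + rect.width / 2, y: rect.y + rect.height / 2, z: 0 };
}

// Reapplies a position for every participant that currently has a tile and
// returns how many positions were set.
function applyPositions(conference, participants) {
  let applied = 0;
  for (const participant of participants) {
    const tile = tilesByParticipantId.get(participant.id);
    if (!tile) continue; // no tile yet, so leave this participant untouched
    conference.setSpatialPosition(participant, tileCenter(tile.getBoundingClientRect()));
    applied += 1;
  }
  return applied;
}

// In a browser, this could be wired to layout changes, for example:
// window.addEventListener("resize", () =>
//   applyPositions(VoxeetSDK.conference, VoxeetSDK.conference.participants.values()));
```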

Set the position where remote participants speak from

Finally, you need to set positions for all the other participants in the conference. In a spatial conference, all participants are muted at the beginning of the conference. This prevents them from being heard from an incorrect direction at first and then jumping to the correct location once their position is set. If you do not set any positions, you will not hear those participants when they talk. To unmute a participant, you must provide a position for them.

For example, you can set a fixed position for a participant when handling participantAdded in the SDK for Web, participantAdded in the SDK for iOS, or ParticipantAddedEvent in the SDK for Android.

You can also set a fixed position for a participant when handling participantUpdated in the SDK for Web, participantUpdated in the SDK for iOS, or ParticipantUpdatedEvent in the SDK for Android.
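
For the SDK for Web, the event handling described above might look like the following sketch. It assumes that the conference object exposes an `on(eventName, handler)` method whose handler receives the participant, together with the `setSpatialPosition` call used earlier. `positionOnJoin` and `positionFor` are hypothetical names introduced for illustration.

```javascript
// Sets a position for each participant when they are added or updated.
// `conference` is expected to behave like VoxeetSDK.conference (assumption:
// it emits "participantAdded" and "participantUpdated" with the participant
// as the first argument). `positionFor` is a hypothetical callback that
// returns a position, or null when the participant has no tile yet.
function positionOnJoin(conference, positionFor) {
  const handler = (participant) => {
    const position = positionFor(participant);
    if (position) {
      conference.setSpatialPosition(participant, position);
    }
  };
  conference.on("participantAdded", handler);
  conference.on("participantUpdated", handler);
  return handler;
}

// Possible usage in an application where VoxeetSDK is loaded:
// positionOnJoin(VoxeetSDK.conference, (participant) => ({ x: 400, y: 100, z: 0 }));
```

Returning null from `positionFor` leaves the participant without a position, so they stay muted until a later update supplies one.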

You should now have a working A/V congruent spatial conference.

Summary

For A/V congruence, most scenarios can use a standard configuration based on the information above. Combined, the steps produce code similar to the following:

JavaScript:

// Code to use for A/V congruence without needing the tutorial

// Create a Dolby Voice conference
const createOptions = {
 alias: "spatial",
 params: {
 dolbyVoice: true
 }
};
const conference = await VoxeetSDK.conference.create(createOptions);

// Join the Dolby Voice conference using spatial audio and requesting stereo on the downlink
const joinOptions = {
 preferRecvMono: false,
 spatialAudio: true
};
await VoxeetSDK.conference.join(conference, joinOptions);


// Prepare the audio scene
// window.innerWidth and window.innerHeight give me the dimensions of the window
const scale = { x: window.innerWidth / 4, y: window.innerHeight / 3, z: 1 };
const forward = { x: 0, y: -1, z: 0 };
const up = { x: 0, y: 0, z: 1 };
const right = { x: 1, y: 0, z: 0 };

VoxeetSDK.conference.setSpatialEnvironment(scale, forward, up, right);

 
VoxeetSDK.conference.participants.forEach((participant) => {
  // Get the video tile for the participant
  const videoTile = document.getElementById(`video-${participant.id}`);
  if (!videoTile) return; // this participant has no tile yet

  // Get the position of the video tile
  const elementPosition = videoTile.getBoundingClientRect();

  // Set the participant position to the middle of their video tile
  VoxeetSDK.conference.setSpatialPosition(participant, {
    x: elementPosition.x + (elementPosition.width / 2),
    y: elementPosition.y + (elementPosition.height / 2),
    z: 0
  });
});

Swift:

// Code to use for A/V congruence without needing the tutorial

// Create a Dolby Voice conference
let options = VTConferenceOptions()
options.params.dolbyVoice = true
options.alias = "spatial"
VoxeetSDK.shared.conference.create(options: options, success: {
 conference in

 // Join the Dolby Voice conference using spatial audio
 let options = VTJoinOptions()
 options.spatialAudio = true
 VoxeetSDK.shared.conference.join(conference: conference, options: options, success: {
 response in

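 // Prepare the audio scene. Note: this snippet uses the hearing-limit scale (5000 pixels = 100 meters);
 // for A/V congruence, derive the scale from the display size as in the JavaScript and Java snippets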
 let axis_scale = 5000.0 / 100.0
 let scale = VTSpatialScale(x: axis_scale, y: axis_scale, z: axis_scale)!
 let forward = VTSpatialPosition(x: 0, y: -1, z: 0)!
 let up = VTSpatialPosition(x: 0, y: 0, z: 1)!
 let right = VTSpatialPosition(x: 1, y: 0, z: 0)!

 VoxeetSDK.shared.conference.setSpatialEnvironment(scale: scale, forward: forward, up: up, right: right)

 for participant in VoxeetSDK.shared.conference.current!.participants {
 // Get participant position x and y
 let x = 3.0
 let y = 6.0
 let z = 0.0
 let partPosition = VTSpatialPosition(x: x, y: y, z: z)!

 VoxeetSDK.shared.conference.setSpatialPosition(participant: participant, position: partPosition)
 }

 }, fail: { error in })
}, fail: { error in })

Java:

// Code to use for A/V congruence without needing the tutorial

ParamsHolder paramsHolder = new ParamsHolder();
paramsHolder.setDolbyVoice(true);

ConferenceCreateOptions conferenceCreateOptions = new ConferenceCreateOptions.Builder()
 .setConferenceAlias("spatial")
 .setParamsHolder(paramsHolder)
 .build();

// Create a Dolby Voice conference
VoxeetSDK.conference().create(conferenceCreateOptions)
 .then((ThenPromise<Conference, Conference>) conference -> {
 ConferenceJoinOptions conferenceJoinOptions = new ConferenceJoinOptions.Builder(conference)
 .setSpatialAudio(true)
 .build();

 // Join the Dolby Voice conference using spatial audio
 return VoxeetSDK.conference().join(conferenceJoinOptions);
 })
 .then(conference -> {
 // Success
 })
 .error((error_in) -> {
 // Error
 });


// Prepare the audio scene
// Get the resolution of the screen
// import android.util.DisplayMetrics;
DisplayMetrics metrics = new DisplayMetrics();
getWindowManager().getDefaultDisplay().getMetrics(metrics);

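// Dividing by 4.0 and 3.0 avoids integer division, because widthPixels and heightPixels are ints
SpatialScale scale = new SpatialScale(metrics.widthPixels / 4.0, metrics.heightPixels / 3.0, 1);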
SpatialPosition right = new SpatialPosition(1, 0, 0);
SpatialPosition forward = new SpatialPosition(0,-1,0);
SpatialPosition up = new SpatialPosition(0, 0, 1);

VoxeetSDK.conference().setSpatialEnvironment(scale, forward, up, right);


for (Participant participant : VoxeetSDK.conference().getParticipants()) {
    // Placeholder values: replace x and y with the center of the participant's video tile
    double x = 0.0;
    double y = 0.0;
    SpatialPosition partPosition = new SpatialPosition(x, y, 0);

    VoxeetSDK.conference().setSpatialPosition(participant, partPosition);
}
