Integrating Individual Spatial Audio

The following process integrates individual spatial audio into an application one step at a time, using a 2D A/V congruent use case to describe this process.

To use the individual spatial audio feature in your application you must:

  • Define what spatial audio means for your application
  • Enable individual spatial audio on joining the conference
  • Configure the spatial environment using the SDK
  • Provide relevant position updates for the spatial audio locations

See Summary for the complete code snippet for integrating individual spatial audio.

Define spatial audio for your application

For any spatial audio application, there are many ways you may want to map the visual scene to where audio is heard from. We recommend the following style of mapping for use in the majority of A/V congruent video conference use cases.

First you need to understand where the video tiles are drawn on the screen. In the following image, you can see the local participant in the center of the screen, highlighted with a purple outline. With A/V congruence, you can hear the audio for all the other participants from positions related to the video layout surrounding the local participant.

The screen must be mapped to an audio space so that, of the other two participants, one is heard directly in front of the local participant and the other is heard behind and to the right of the local participant.


In the screen layout shown in the second image, the top left-hand corner of the screen is at (0, 0) pixels and the bottom right-hand corner is at (800, 600) pixels, with the local participant in the middle of the screen at (400, 300) pixels. Based on this information, you can configure the spatial environment.
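As a minimal sketch of this layout arithmetic, the listener position can be derived from the screen size instead of being hard-coded. The helper name `screenCenter` is hypothetical and is not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): returns the pixel at the
// middle of a screen whose top-left corner is (0, 0).
function screenCenter(widthPixels, heightPixels) {
  return { x: widthPixels / 2, y: heightPixels / 2, z: 0 };
}

// For the 800 x 600 layout described above
const listenerPixels = screenCenter(800, 600);
console.log(listenerPixels); // { x: 400, y: 300, z: 0 }
```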


Enable individual spatial audio on the conference

Once you know how your application looks and what behavior you want for the audio, the next step is to use the SDK to configure the spatial environment to match what the application wants.

To enable spatial audio on the conference, you need to create the conference as a Dolby Voice conference and join with spatial audio and stereo on the downlink.

When joining the conference you need to enable spatial audio by setting the spatialAudio (Web, iOS, Android, C++) flag.

// Create a conference
const createOptions = {
 alias: "conference name",
 params: {
 spatialAudioStyle: "individual"
 }
};
const conference = await VoxeetSDK.conference.create(createOptions);

// Join the Dolby Voice conference using spatial audio and requesting stereo on the downlink
const joinOptions = {
 spatialAudio: true
};
await VoxeetSDK.conference.join(conference, joinOptions);
let options = VTConferenceOptions()
options.alias = "conference name"
options.spatialAudioStyle = .individual
VoxeetSDK.shared.conference.create(options: options, success: {
 conference in

 // Join the conference using spatial audio
 let options = VTJoinOptions()
 options.spatialAudio = true
 VoxeetSDK.shared.conference.join(conference: conference, options: options, success: {
 response in

 // Success
 }, fail: { error in })
}, fail: { error in })
ParamsHolder paramsHolder = new ParamsHolder();
paramsHolder.setSpatialAudioStyle(SpatialAudioStyle.INDIVIDUAL);

ConferenceCreateOptions conferenceCreateOptions = new ConferenceCreateOptions.Builder()
 .setConferenceAlias("conference name")
 .setParamsHolder(paramsHolder)
 .build();

// Create a Dolby Voice conference
VoxeetSDK.conference().create(conferenceCreateOptions)
 .then((ThenPromise<Conference, Conference>) conference -> {
 ConferenceJoinOptions conferenceJoinOptions = new ConferenceJoinOptions.Builder(conference)
 .setSpatialAudio(true)
 .build();

 // Join the Dolby Voice conference using spatial audio
 return VoxeetSDK.conference().join(conferenceJoinOptions);
 })
 .then(conference -> {
 // Success
 })
 .error((error_in) -> {
 // Error
 });
// Create a conference
dolbyio::comms::services::conference::conference_options conference_options{};
conference_options.alias = "conference name";
conference_options.params.spatial_audio_style =
   dolbyio::comms::spatial_audio_style::individual;

// This wait call will block until the asynchronous operation completes
auto conf_info = wait(sdk->conference().create(conference_options));

// Join the Dolby Voice conference using spatial audio
dolbyio::comms::services::conference::join_options join_options{};
join_options.constraints.audio = true;
join_options.constraints.video = false;
join_options.connection.spatial_audio = true;

// The wait call will block until the asynchronous join completes
wait(sdk->conference().join(conf_info, join_options));

Configure the spatial audio scene

You need to tell the audio renderer what units the application is using for its x and y screen coordinates and which directions are forward, right, and up. In this example, you tell the audio renderer that the top of the screen (smaller Y-axis values) is heard from in front and that the right of the screen (larger X-axis values) is heard from the right.

Note: Using this environment setting means the application does not need to calculate how to transform its view of the world. Instead, you tell the audio renderer what you are using and it automatically adjusts to your application's coordinates.

// Negative Y axis is heard in the forwards direction, so tell the SDK forward has -1 Y
const forward = { x: 0, y: -1, z: 0 };

// Upwards axis is unimportant for this case; you can set it to either Z = +1 or Z = -1
// because you never provide a Z position
const up = { x: 0, y: 0, z: 1 };

// Positive X axis is heard in the right-hand direction, so tell the SDK right has +1 X
const right = { x: 1, y: 0, z: 0 };
// Negative Y axis is heard in the forwards direction, so tell the SDK forward has -1 Y
let forward = VTSpatialPosition(x: 0, y: -1, z: 0)!

// Upwards axis is unimportant for this case; you can set it to either Z = +1 or Z = -1
// because you never provide a Z position
let up = VTSpatialPosition(x: 0, y: 0, z: 1)!

// Positive X axis is heard in the right-hand direction, so tell the SDK right has +1 X
let right = VTSpatialPosition(x: 1, y: 0, z: 0)!
// Negative Y axis is heard in the forwards direction, so tell the SDK forward has -1 Y
SpatialPosition forward = new SpatialPosition(0,-1,0);

// Upwards axis is unimportant for this case; you can set it to either Z = +1 or Z = -1
// because you never provide a Z position
SpatialPosition up = new SpatialPosition(0, 0, 1);

// Positive X axis is heard in the right-hand direction, so tell the SDK right has +1 X
SpatialPosition right = new SpatialPosition(1, 0, 0);
// Positive X axis is heard in the right-hand direction, so tell the SDK right has +1 X
dolbyio::comms::spatial_position right{1, 0, 0};

// Positive Y axis is the upwards direction, so tell the SDK up has +1 Y
dolbyio::comms::spatial_position up{0, 1, 0};

// Positive Z axis is heard in the forwards direction, so tell the SDK forward has +1 Z
dolbyio::comms::spatial_position forward{0, 0, 1};
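To make the convention concrete, the following sketch projects a pixel offset onto the forward and right vectors used in the Web sample. It is an illustration of the arithmetic only, not the SDK's renderer, and the helper name `relativeDirection` is hypothetical.

```javascript
// Illustration only: this is not the SDK's renderer, just the dot-product
// arithmetic implied by the forward and right vectors chosen above.
const forward = { x: 0, y: -1, z: 0 };
const right = { x: 1, y: 0, z: 0 };

// Hypothetical helper: how far a source lies in front of and to the right of
// the listener, measured in application units (pixels here).
function relativeDirection(listener, source, forwardAxis, rightAxis) {
  const dx = source.x - listener.x;
  const dy = source.y - listener.y;
  const dz = source.z - listener.z;
  return {
    front: dx * forwardAxis.x + dy * forwardAxis.y + dz * forwardAxis.z,
    right: dx * rightAxis.x + dy * rightAxis.y + dz * rightAxis.z,
  };
}

const listener = { x: 400, y: 300, z: 0 };

// A tile drawn above the listener (smaller Y) has a positive front value
console.log(relativeDirection(listener, { x: 400, y: 100, z: 0 }, forward, right));

// A tile drawn below and to the right has a negative front value (behind)
// and a positive right value
console.log(relativeDirection(listener, { x: 600, y: 500, z: 0 }, forward, right));
```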

Configure the spatial environment scale

With spatial audio, a conference participant’s volume is dependent on how close they are to the local participant. A participant who is one meter away will be at full volume. A participant who is 100 meters away will not be heard. The spatial environment scale allows you to tell the audio renderer how to convert into meters from the units that your application is using to set positions.

If your application has a hearing limit, use that to determine the scale. For example, if the hearing limit in the application is 5000 pixels, set the scale so that 5000 pixels equals 100 meters, and use the same scale for all axes:

const axis_scale = 5000 / 100;
const scale = { x: axis_scale, y: axis_scale, z: axis_scale };
let axis_scale = 5000.0 / 100.0
let scale = VTSpatialScale(x: axis_scale, y: axis_scale, z: axis_scale)!
double axis_scale = 5000.0 / 100.0;
SpatialScale scale = new SpatialScale(axis_scale, axis_scale, axis_scale);
double axis_scale = 5000 / 100;
dolbyio::comms::spatial_scale scale{axis_scale, axis_scale, axis_scale};
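As a rough check of the hearing-limit arithmetic, the sketch below converts a distance in pixels to meters by dividing each axis by its scale. It assumes the scale means application units per meter, as in the 5000 pixels to 100 meters example. The helper name `distanceInMeters` is hypothetical, and the sketch does not model how the renderer attenuates volume between 1 and 100 meters.

```javascript
// Illustration only: converts a distance in application units to meters by
// dividing each axis by its scale (application units per meter).
function distanceInMeters(a, b, scale) {
  const dx = (a.x - b.x) / scale.x;
  const dy = (a.y - b.y) / scale.y;
  const dz = (a.z - b.z) / scale.z;
  return Math.hypot(dx, dy, dz);
}

const axisScale = 5000 / 100; // 50 pixels per meter
const scale = { x: axisScale, y: axisScale, z: axisScale };
const origin = { x: 0, y: 0, z: 0 };

// 5000 pixels away is 100 meters, the hearing limit
console.log(distanceInMeters(origin, { x: 5000, y: 0, z: 0 }, scale));

// 50 pixels away is 1 meter, which is heard at full volume
console.log(distanceInMeters(origin, { x: 50, y: 0, z: 0 }, scale));
```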

For an application that uses A/V congruence, it is normal to set the scale so that everyone sounds loud. The conference display is a 2D rectangle on screen and the application sets the scale so that this display represents an audio space rectangle of 4 meters by 3 meters, with the listener at the center.

The scale for each axis is calculated as:

// Scale for Z axis does not matter as you never provide a Z position, set it to 1
 const scale = { x: display_width_pixels / 4, y: display_height_pixels / 3, z: 1 };
// Scale for Z axis does not matter as you never provide a Z position, set it to 1
let scale   = VTSpatialScale(x:  display_width_pixels / 4, y: display_height_pixels / 3, z: 1)!
// Scale for Z axis does not matter as you never provide a Z position, set it to 1
SpatialScale scale = new SpatialScale(displayWidthPixels / 4, displayHeightPixels / 3, 1);
Not applicable to the C++ SDK, because that SDK does not support talker positions in pixels.
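As a worked example for the 800 by 600 layout used in this guide, the following sketch computes the A/V congruent scale and converts one pixel offset into meters. The helper name `avCongruentScale` is hypothetical and not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): scale so that the display
// spans an audio space of 4 meters by 3 meters.
function avCongruentScale(displayWidthPixels, displayHeightPixels) {
  return { x: displayWidthPixels / 4, y: displayHeightPixels / 3, z: 1 };
}

const scale = avCongruentScale(800, 600);
console.log(scale); // { x: 200, y: 200, z: 1 }

// Offset of the top-right corner (800, 0) from the listener at (400, 300),
// converted to meters: 2 m along +X and 1.5 m along -Y
const offsetMeters = {
  x: (800 - 400) / scale.x,
  y: (0 - 300) / scale.y,
};
console.log(offsetMeters); // { x: 2, y: -1.5 }
```

With the forward vector set to -Y, that corner is heard 2 meters to the right and 1.5 meters in front of the listener, so every position on screen stays within a few meters.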

You can call setSpatialEnvironment to configure these parameters.

VoxeetSDK.conference.setSpatialEnvironment(scale, forward, up, right);
VoxeetSDK.shared.conference.setSpatialEnvironment(scale: scale, forward: forward, up: up, right: right)
VoxeetSDK.conference().setSpatialEnvironment(scale, forward, up, right);
// Apply the environment setting to the batch_update structure
dolbyio::comms::spatial_audio_batch_update batch_update;
batch_update.set_spatial_environment(scale, right, up, forward);

// Provide the batched update to the SDK to apply the spatial environment
sdk->conference()
    .update_spatial_audio_configuration(std::move(batch_update))
    .then([]() { std::cout << "Spatial Environment Set!\n"; })
    .on_error([](std::exception_ptr&&) {
       std::cerr << "Failed to Set Spatial Environment!";
    });

Set the position audio is heard from

By default, the local participant hears audio from the position (0, 0, 0). In this example, position (0, 0) is the very top left of the screen, so if the listener position is left unmodified, everyone sounds as though they are behind and to the right. To maximize spatial separation in the conference, and so increase the audio impact, position the listener in the very middle of the screen, surrounded by the other participants. The image illustrates this with pixel location (400, 300).

const spatialPosition = {
 x: 400,
 y: 300,
 z: 0,
};

VoxeetSDK.conference.setSpatialPosition(VoxeetSDK.session.participant, spatialPosition);
let spatialPosition = VTSpatialPosition(x: 400, y: 300, z: 0)!

VoxeetSDK.shared.conference.setSpatialPosition(participant: VoxeetSDK.shared.session.participant!, position: spatialPosition)
SpatialPosition spatialPosition = new SpatialPosition(400, 300, 0);

// participant refers to the local participant here
VoxeetSDK.conference().setSpatialPosition(participant, spatialPosition);
// Set the spatial location for the local participant
dolbyio::comms::spatial_audio_batch_update batch_update;
dolbyio::comms::spatial_position position{1, 1, 1};

batch_update.set_spatial_position(session_info.participant_id.value(),
                                       position);
sdk->conference()
    .update_spatial_audio_configuration(std::move(batch_update))
    .then([]() { std::cout << "Spatial Position Set!\n"; })
    .on_error([](std::exception_ptr&&) {
      std::cerr << "Failed to Set Spatial Position!";
    });

To provide the screen position for the middle of the video tile of all participants, we recommend:

  • Always applying positions whenever the video layout is updated
  • Associating each participant object with its video node (for example, in a map or by using identifiers) as participants join the conference
[...VoxeetSDK.conference.participants.values()].forEach(participant => {
 // Get the video tile for the participant
 const videoTile = document.getElementById(`video-${participant.id}`);

 // Skip participants whose video tile has not been drawn yet
 if (!videoTile) return;

 // Get the position of the video tile
 const elementPosition = videoTile.getBoundingClientRect();

 // Set the participant position to the middle of their video tile
 VoxeetSDK.conference.setSpatialPosition(participant, {
 x: elementPosition.x + (elementPosition.width / 2),
 y: elementPosition.y + (elementPosition.height / 2),
 z: 0
 });
});
for participant in VoxeetSDK.shared.conference.current!.participants {
 // Get participant position x and y
 let x = 0.0
 let y = 0.0
 let z = 0.0
 let partPosition = VTSpatialPosition(x: x, y: y, z: z)!

 VoxeetSDK.shared.conference.setSpatialPosition(participant: participant, position: partPosition)
}
for (Participant participant : VoxeetSDK.conference().getParticipants()) {
 // Get participant position x and y (placeholder values shown here)
 double x = 0.0;
 double y = 0.0;
 SpatialPosition partPosition = new SpatialPosition(x, y, 0);

 VoxeetSDK.conference().setSpatialPosition(participant, partPosition);
}
// Set the position for all remote participants in the list
dolbyio::comms::spatial_audio_batch_update batch_update;
auto participant_list =
    wait(sdk->conference().get_current_conference()).participants;
for (const auto& participant : participant_list) {
  if (participant.first.compare(session_info.participant_id.value()) != 0) {
    dolbyio::comms::spatial_position position{1, 1, 1};
    batch_update.set_spatial_position(participant.first,
                                      position);
  }
}
sdk->conference()
    .update_spatial_audio_configuration(std::move(batch_update))
    .then(
        []() { std::cout << "Remote Participants Spatial Positions Set!\n"; })
    .on_error([](std::exception_ptr&&) {
      std::cerr << "Failed to Set Remote Participants Spatial Positions!";
    });
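The two recommendations above can be sketched as a small registry that maps participant identifiers to tile rectangles and reapplies positions whenever the layout changes. The names `tiles`, `registerTile`, `tileCenter`, and `applyLayout` are hypothetical. The `setPosition` callback stands in for a call such as `VoxeetSDK.conference.setSpatialPosition`, and the rectangles are assumed to have the shape returned by `getBoundingClientRect()`.

```javascript
// Hypothetical registry (not part of the SDK) associating participant ids
// with the rectangle of their video tile.
const tiles = new Map();

function registerTile(participantId, rect) {
  tiles.set(participantId, rect);
}

// Center of a rectangle shaped like the result of getBoundingClientRect()
function tileCenter(rect) {
  return { x: rect.x + rect.width / 2, y: rect.y + rect.height / 2, z: 0 };
}

// Call whenever the layout changes. setPosition stands in for a call such as
// VoxeetSDK.conference.setSpatialPosition(participant, position).
function applyLayout(participants, setPosition) {
  for (const participant of participants) {
    const rect = tiles.get(participant.id);
    if (!rect) continue; // no tile drawn yet, so leave the position unset
    setPosition(participant, tileCenter(rect));
  }
}

// Usage with plain objects in place of SDK participants
registerTile("alice", { x: 300, y: 0, width: 200, height: 150 });
registerTile("bob", { x: 600, y: 450, width: 200, height: 150 });
applyLayout([{ id: "alice" }, { id: "bob" }, { id: "carol" }], (p, pos) =>
  console.log(p.id, pos)
);
```

Keeping the association in one place means a layout change only needs a single `applyLayout` call, and participants without a tile are skipped rather than placed at a guessed position.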

Set the position where remote participants speak from

Finally, you need to set positions for all the other participants in the conference. In an individual spatial conference, all participants are muted at the beginning of the conference. This prevents them from being heard from an incorrect direction at first and then jumping to the correct location once their position is set. If you do not set any positions, you will not be able to hear those participants when they talk. To un-mute the participants, you must provide a position for each of them.

For example on the participantAdded (Web, iOS, Android, C++) event, you can set a fixed position for the participant.

Additionally, you can set a fixed position for the participant using the participantUpdated (Web, iOS, Android, C++) event.
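A minimal sketch of such an event handler follows. The factory `makePositionHandler` and its callbacks are hypothetical stand-ins so that the example runs without the SDK. In a Web application, the handler would typically be registered with `VoxeetSDK.conference.on("participantAdded", handler)`, and `setPosition` would wrap `setSpatialPosition`.

```javascript
// Hypothetical factory (not part of the SDK): builds an event handler that
// gives each newly added remote participant a position, which un-mutes them.
function makePositionHandler(localParticipantId, positionFor, setPosition) {
  return (participant) => {
    // The listener is positioned separately
    if (participant.id === localParticipantId) return;
    setPosition(participant, positionFor(participant));
  };
}

// Example wiring with stand-ins for the SDK calls
const placed = [];
const handler = makePositionHandler(
  "local",
  () => ({ x: 400, y: 100, z: 0 }), // fixed position directly in front of the listener
  (participant, position) => placed.push({ id: participant.id, position })
);

handler({ id: "local" }); // ignored
handler({ id: "remote-1" }); // positioned
console.log(placed);
```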

You should now have a working A/V congruent spatial conference.

Summary

For A/V congruence, in most scenarios you can use a standard configuration based on the information discussed above. Combined, these steps might look like the following code snippet:

// Code to use for A/V congruence without needing the tutorial

// Create a conference
const createOptions = {
 alias: "conference name",
 params: {
 spatialAudioStyle: "individual"
 }
};
const conference = await VoxeetSDK.conference.create(createOptions);

// Join the conference using spatial audio and requesting stereo on the downlink
const joinOptions = {
 spatialAudio: true
};
await VoxeetSDK.conference.join(conference, joinOptions);


// Prepare the audio scene
// window.innerWidth and window.innerHeight give the dimensions of the window
const scale = { x: window.innerWidth / 4, y: window.innerHeight / 3, z: 1 };
const forward = { x: 0, y: -1, z: 0 };
const up = { x: 0, y: 0, z: 1 };
const right = { x: 1, y: 0, z: 0 };

VoxeetSDK.conference.setSpatialEnvironment(scale, forward, up, right);

 
[...VoxeetSDK.conference.participants.values()].forEach(participant => {
 // Get the video tile for the participant
 const videoTile = document.getElementById(`video-${participant.id}`);

 // Skip participants whose video tile has not been drawn yet
 if (!videoTile) return;

 // Get the position of the video tile
 const elementPosition = videoTile.getBoundingClientRect();

 // Set the participant position to the middle of their video tile
 VoxeetSDK.conference.setSpatialPosition(participant, {
 x: elementPosition.x + (elementPosition.width / 2),
 y: elementPosition.y + (elementPosition.height / 2),
 z: 0
 });
});
// Code to use for A/V congruence without needing the tutorial

// Create a conference
let options = VTConferenceOptions()
options.alias = "conference name"
options.spatialAudioStyle = .individual
VoxeetSDK.shared.conference.create(options: options, success: {
 conference in

 // Join the conference using spatial audio
 let options = VTJoinOptions()
 options.spatialAudio = true
 VoxeetSDK.shared.conference.join(conference: conference, options: options, success: {
 response in

 // Prepare the audio scene
 let axis_scale = 5000.0 / 100.0
 let scale = VTSpatialScale(x: axis_scale, y: axis_scale, z: axis_scale)!
 let forward = VTSpatialPosition(x: 0, y: -1, z: 0)!
 let up = VTSpatialPosition(x: 0, y: 0, z: 1)!
 let right = VTSpatialPosition(x: 1, y: 0, z: 0)!

 VoxeetSDK.shared.conference.setSpatialEnvironment(scale: scale, forward: forward, up: up, right: right)

 for participant in VoxeetSDK.shared.conference.current!.participants {
 // Get participant position x and y
 let x = 3.0
 let y = 6.0
 let z = 0.0
 let partPosition = VTSpatialPosition(x: x, y: y, z: z)!

 VoxeetSDK.shared.conference.setSpatialPosition(participant: participant, position: partPosition)
 }

 }, fail: { error in })
}, fail: { error in })
// Code to use for A/V congruence without needing the tutorial

ParamsHolder paramsHolder = new ParamsHolder();
paramsHolder.setSpatialAudioStyle(SpatialAudioStyle.INDIVIDUAL);

ConferenceCreateOptions conferenceCreateOptions = new ConferenceCreateOptions.Builder()
 .setConferenceAlias("conference name")
 .setParamsHolder(paramsHolder)
 .build();

// Create a conference
VoxeetSDK.conference().create(conferenceCreateOptions)
 .then((ThenPromise<Conference, Conference>) conference -> {
 ConferenceJoinOptions conferenceJoinOptions = new ConferenceJoinOptions.Builder(conference)
 .setSpatialAudio(true)
 .build();

 // Join the conference using spatial audio
 return VoxeetSDK.conference().join(conferenceJoinOptions);
 })
 .then(conference -> {
 // Success
 })
 .error((error_in) -> {
 // Error
 });


// Prepare the audio scene
// Get the resolution of the screen
// import android.util.DisplayMetrics;
DisplayMetrics metrics = new DisplayMetrics();
getWindowManager().getDefaultDisplay().getMetrics(metrics);

SpatialScale scale = new SpatialScale(metrics.widthPixels / 4, metrics.heightPixels / 3, 1);
SpatialPosition right = new SpatialPosition(1, 0, 0);
SpatialPosition forward = new SpatialPosition(0,-1,0);
SpatialPosition up = new SpatialPosition(0, 0, 1);

VoxeetSDK.conference().setSpatialEnvironment(scale, forward, up, right);


for (Participant participant : VoxeetSDK.conference().getParticipants()) {
 // Get participant position x and y
 SpatialPosition partPosition = new SpatialPosition(x, y, 0);
 
 VoxeetSDK.conference().setSpatialPosition(participant, partPosition);
}
// Create an individual spatial audio conference
dolbyio::comms::services::conference::conference_options conference_options{};
conference_options.alias = "conference name";
conference_options.params.spatial_audio_style =
      dolbyio::comms::spatial_audio_style::individual;

// The wait call will block until the asynchronous operation completes
auto conf_info = wait(sdk->conference().create(conference_options));

// Join the individual spatial audio conference
dolbyio::comms::services::conference::join_options join_options{};
join_options.constraints.audio = true;
join_options.constraints.video = false;
join_options.connection.spatial_audio = true;

// The wait call will block until the asynchronous operation completes
wait(sdk->conference().join(conf_info, join_options));

// Configure the spatial audio scene
dolbyio::comms::spatial_position right{1, 0, 0};
dolbyio::comms::spatial_position up{0, 1, 0};
dolbyio::comms::spatial_position forward{0, 0, 1};

// Configure the spatial environment scale
double axis_scale = 5000 / 100;
dolbyio::comms::spatial_scale scale{axis_scale, axis_scale, axis_scale};

// Set the Spatial Environment
dolbyio::comms::spatial_audio_batch_update batch_update;
batch_update.set_spatial_environment(scale, right, up, forward);

// Set the local participant’s position
dolbyio::comms::spatial_position position{1, 1, 1};
batch_update.set_spatial_position(session_info.participant_id.value(),
                                  position);

// As an example, set the same spatial position for every remote participant in the list
auto participant_list =
    wait(sdk->conference().get_current_conference()).participants;
for (const auto& participant : participant_list) {
  if (participant.first.compare(session_info.participant_id.value()) != 0) {
    dolbyio::comms::spatial_position position{1, 1, 1};
    batch_update.set_spatial_position(participant.first,
                                      position);
  }
}

// Update the positions and environment asynchronously
sdk->conference()
   .update_spatial_audio_configuration(std::move(batch_update))
   .then([]() { std::cout << "Spatial Environment and Local Position Set!\n"; })
   .on_error([](std::exception_ptr&&) {
     std::cerr << "Failed to Set Spatial Position and Environment!";
   });
