Frame Metadata

Insert Real-time Metadata Per Frame When Broadcasting

When building applications, there are use cases where, in addition to the audio and video streams, you need to send custom data that is frame-accurate. This may be data generated through video processing, such as the position of detected objects, or data used to dynamically render overlays within a frame.

Third-party messaging services are available for exchanging real-time data, but these solutions require careful synchronization due to differences in latency across networks. To avoid introducing that dependency on an independent solution, you can broadcast metadata to viewers by embedding it in the WebRTC stream itself.

Publishing Metadata

WebRTC includes a data channel capable of exchanging arbitrary data, but its delivery is not guaranteed to be synchronized with the media stream. As an alternative, we've enabled a feature that allows you to embed metadata within the frame itself.

  • Data is transported as raw bytes, so it can use any arbitrary format (e.g., XML, JSON, etc.)
  • Data size is not limited, but larger payloads increase bandwidth requirements and latency, so small payloads are recommended

This is an RTP (real-time transport protocol) level feature within WebRTC, so it is independent of any constraints on the codec, bitrate, and other audio and video encoding settings.
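Because the payload is raw bytes, you can serialize any structure you like before embedding it. A minimal sketch of one approach (the encodeMetadata and decodeMetadata helpers below are illustrative, not part of the SDK):

// Serialize an arbitrary object to UTF-8 bytes for embedding in a frame,
// and decode it back on the receiving side. These helper names are
// illustrative and not part of the SDK.
function encodeMetadata(payload) {
    return new TextEncoder().encode(JSON.stringify(payload)); // Uint8Array
}

function decodeMetadata(bytes) {
    return JSON.parse(new TextDecoder().decode(bytes));
}

const bytes = encodeMetadata({ x: 120, y: 64, label: "ball" });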

How to Publish Metadata for the Web

Using the Web SDK you can embed metadata with the frame.

1. Publishing Metadata for Broadcast

Writing metadata to the stream is accomplished using the underlying TransformStream Web API. Chrome is currently the only browser with this WebRTC support.
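Note that Chrome only exposes the non-standard createEncodedStreams() API on connections created with insertable streams enabled, so the peer connection (rtcpeer below) should be constructed accordingly:

// Chrome's non-standard createEncodedStreams() API requires insertable
// streams to be enabled when the peer connection is created
const rtcpeer = new RTCPeerConnection({ encodedInsertableStreams: true });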

// This solution is limited to Chrome browsers

// Get the RTP senders for the video track. In this example only one video
// track has been added, so the first sender carries the encoded stream.
const senders = rtcpeer.getSenders().filter((elt) => elt.track.kind === "video");
const sender = senders[0];
const encoded_stream = sender.createEncodedStreams();

// Create a TransformStream object for transforming the stream
const transformer = new TransformStream({
    async transform(frame, controller) {
        // Metadata to send can be any arbitrary data or a string
        const message = "10";

        // Start the payload with four magic bytes that signal metadata is
        // appended at the end of the frame
        const start = [ 0xCA, 0xFE, 0xBA, 0xBE ];

        // End the payload with the metadata length as a 4-byte big-endian
        // integer so the receiver can locate the metadata
        const length = [
            (message.length >> 24) & 0xff,
            (message.length >> 16) & 0xff,
            (message.length >> 8) & 0xff,
            message.length & 0xff,
        ];
        const data = [ ...start, ...[...message].map(c => c.charCodeAt(0)), ...length ];

        // Allocate a new buffer large enough for the frame plus the metadata
        const frame_length = frame.data.byteLength;
        const buffer = new ArrayBuffer(frame_length + data.length);
        const view = new Uint8Array(buffer);

        // Copy the old frame buffer into the new buffer and then append the
        // metadata at the end of the buffer
        view.set(new Uint8Array(frame.data), 0);
        view.set(data, frame_length);

        // Set the new frame buffer
        frame.data = buffer;

        // Send the frame
        controller.enqueue(frame);
    },
});

encoded_stream.readable.pipeThrough(transformer)
    .pipeTo(encoded_stream.writable);
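With the message "10" used above, each frame gains ten extra bytes: CA FE BA BE (the magic signal), 31 30 (the payload characters), and 00 00 00 02 (the payload length as a 4-byte big-endian integer).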

2. Extracting Metadata for Playback

Reading metadata from the stream during playback is done in a similar manner to publishing, using the TransformStream API when a video track event occurs.

// Respond to the video stream event
peerconnection.ontrack = (event) => {
    // Create encoded stream and transformer
    const encodedStream = event.receiver.createEncodedStreams();
    const transformer = new TransformStream({
        async transform(frame, controller) {
            // Convert data from ArrayBuffer to Uint8Array
            const frame_data = new Uint8Array(frame.data);
            const total_length = frame_data.length;

            // Retrieve the metadata size from the last 4 bytes of the buffer,
            // stored as a big-endian 32-bit integer
            const size = frame_data.slice(total_length - 4)
                .reduce((acc, v) => (acc << 8) + v, 0);

            // Look for the magic bytes identifying that the remaining data is
            // frame metadata and confirm that the signal is in the frame
            const magic_value = [ 0xCA, 0xFE, 0xBA, 0xBE ];
            const magic_bytes = frame_data.slice(total_length - size - 2*4, total_length - size - 4);
            const has_magic_value = magic_value.every((v, index) => v === magic_bytes[index]);

            // When there is metadata in the frame, get the metadata array and
            // handle it as needed by your application
            if (has_magic_value) {
                const data = frame_data.slice(total_length - size - 4, total_length - 4);
                console.log("received data: ", String.fromCharCode(...data));
            }

            // Forward the frame as-is; video players that do not parse the
            // metadata simply ignore the appended bytes
            controller.enqueue(frame);
        },
    });

    encodedStream.readable.pipeThrough(transformer)
        .pipeTo(encodedStream.writable);
};

How to Publish Metadata with the Native SDK

Using the Native SDK you can embed metadata with the frame. The way this metadata is embedded keeps playback backward compatible: video players that are unable to read and display the metadata will simply ignore it.

📘

Example Project

You can find a more complete C++ implementation example in the millicast/metadata-publisher-demo project.

If you have specific requirements for the version of libwebrtc in use for your platform contact us for additional implementation details.

1. Enable Frame Transformer

When the transformer is activated, frames can be inspected and additional metadata can be appended to them.

// Enable
enable_frame_transformer(true);

2. Listen for Transformable Frame Callback

When a frame is being transformed, a callback is fired that allows additional data to be stored. This example demonstrates storing an x,y position as metadata that might reflect the location of an object in the frame. Note that each value is encoded as a 4-byte big-endian sequence so that the data can be unpacked during playback.

// These functions are members of the class implementing the SDK's frame
// transformer callback; pos_x, pos_y, dir_x, dir_y, width, and height are
// member variables. Requires <algorithm>, <cstdint>, and <vector>.

// Encode a 32-bit value as four big-endian bytes so that it can be
// unpacked during playback
void encode(int32_t value, std::vector<uint8_t>& data)
{
    data.push_back((value >> 24) & 0xff);
    data.push_back((value >> 16) & 0xff);
    data.push_back((value >> 8) & 0xff);
    data.push_back(value & 0xff);
}

// Add metadata at the end of the frame data
void on_transformable_frame([[maybe_unused]] uint32_t ssrc, [[maybe_unused]] uint32_t timestamp, std::vector<uint8_t>& data) override
{
    constexpr uint8_t SPEED = 10;

    // Reverse direction when the object reaches an edge of the frame
    if (pos_x == width || pos_x == 0)
    {
        dir_x *= -1;
    }

    if (pos_y == height || pos_y == 0)
    {
        dir_y *= -1;
    }

    // Advance the position and keep it within the frame bounds
    pos_x += dir_x * SPEED;
    pos_y += dir_y * SPEED;

    pos_x = std::clamp(pos_x, 0, width);
    pos_y = std::clamp(pos_y, 0, height);

    // Append the x,y position as eight bytes of frame metadata
    encode(pos_x, data);
    encode(pos_y, data);
}
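On the playback side, these two integers can be recovered by reversing the encoding. A minimal JavaScript sketch, assuming data is the metadata payload extracted by the web playback transformer shown earlier and contains exactly the eight bytes appended by this callback:

// Decode two big-endian 32-bit integers (pos_x, pos_y) from the metadata
// bytes extracted during playback. Assumes `data` is a Uint8Array holding
// exactly the eight bytes appended by the callback above.
function decodePosition(data) {
    const view = new DataView(data.buffer, data.byteOffset, data.byteLength);
    return { x: view.getInt32(0), y: view.getInt32(4) }; // big-endian by default
}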

Learn More

You can find additional examples of exchanging data during a broadcast, along with other messaging examples, on the developer blog.