Interoperability with Video Conferencing



In addition to supporting audio-only participants, Cisco TelePresence also supports video conferencing participants. This is done by bridging together the TelePresence multipoint meeting (hosted on a Cisco TelePresence Multipoint Switch [CTMS]) and a regular multipoint video conference (hosted on a Cisco Unified Videoconferencing [CUVC] MCU). Bridging multipoint meetings together has been around for years in the video conferencing industry and is referred to as cascading. The Cisco implementation of cascading between TelePresence and video conferencing is similar to previous implementations in the market, except that Cisco had to create a way of mapping multiscreen TelePresence systems to standard single-screen video conferencing systems. Cisco calls this Active Segment Cascading.
Note 
A multipoint cascaded conference is the only method for interoperating between Cisco TelePresence and traditional video conferencing endpoints. Direct, point-to-point calls between a TelePresence system and a video conferencing endpoint are not allowed. This is highly likely to change in the future as Cisco continues to develop additional interoperability capabilities within the TelePresence solution.

Interoperability RTP Channels

When Cisco engineered its video conferencing interoperability solution, the vast majority of video conferencing equipment ran at CIF or 4CIF resolutions (720p was just beginning to become widely deployed), and no video conferencing endpoints or MCUs (including the CUVC platform) were capable of receiving and decoding the Cisco 1080p resolution video and AAC-LD audio. Therefore, Cisco had two choices:
  • Degrade the experience for the TelePresence participants by encoding the entire meeting at a much lower resolution and using inferior audio algorithms to accommodate the video conferencing participants
  • Maintain the 1080p/AAC-LD experience for the TelePresence participants and send an additional video and audio stream for the video conferencing MCU to digest
For obvious reasons, Cisco chose the latter method.
Note 
The methods described herein are highly likely to change in the future as video conferencing equipment becomes increasingly capable of higher-definition video resolutions (720p and 1080p) and as AAC-LD audio becomes more commonplace within the installed base.
When a Cisco TelePresence system (single-screen or multiscreen model) dials into a CTMS meeting that is configured for interoperability, the CTMS requests the TelePresence endpoint to send a copy of its 1080p video in CIF resolution and a copy of its AAC-LD audio in G.711 format. These CIF and G.711 streams are then switched to the CUVC MCU, which, in turn, relays them to the video conferencing participants. In the reverse direction, the CUVC sends the CTMS its CIF resolution video and G.711 audio from the video conferencing participants, and the CTMS relays that down to the TelePresence participants.

CIF Resolution Video Channel

Multiscreen TelePresence systems, such as the CTS-3000 and CTS-3200, provide three channels of 1080p / 30 resolution video. However, only one can be sent to the CUVC at any given time, so the Cisco TelePresence codec uses a voice-activated switching methodology to choose which of the three streams it should send at any moment in time. If a user on the left screen starts talking, the left codec encodes that camera's video using H.264 at 1080p / 30 (or whatever resolution/motion-handling setting the system is set to use) and also at CIF / 30. If a user in the center starts talking, the left codec stops encoding the CIF video channel, and the center codec begins encoding the center camera's video using H.264 at 1080p / 30 (or whatever resolution/motion-handling setting the system is set to use) and at CIF / 30. This switching occurs dynamically throughout the life of the meeting between the left, center, and right codecs, based on the microphone level at each position (that is, who is speaking the loudest).
Single-screen TelePresence systems such as the CTS-1000 and CTS-500 have only one screen (the center channel), so no switching is required.
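The selection rule described above can be sketched in a few lines of Python. This is only an illustration of "the loudest position wins", not the codec's actual algorithm: the function name `select_cif_source`, the normalized level values, and the `margin` parameter (a simple guard against rapid flip-flopping) are all assumptions introduced for the example.

```python
from typing import Dict, Optional


def select_cif_source(levels: Dict[str, float],
                      current: Optional[str] = None,
                      margin: float = 0.0) -> str:
    """Pick the position whose camera should feed the CIF encoder.

    levels  -- hypothetical microphone level per position, e.g.
               {"left": 0.8, "center": 0.2, "right": 0.1}
    current -- position currently encoding the CIF channel, if any
    margin  -- hypothetical hysteresis: a new position must exceed the
               current one by more than this amount to take over
    """
    # Single-screen systems have only the center position: no switching.
    if len(levels) == 1:
        return next(iter(levels))

    loudest = max(levels, key=levels.get)

    # Keep the current source unless another position is clearly louder.
    if current in levels and levels[loudest] - levels[current] <= margin:
        return current
    return loudest
```

For example, calling `select_cif_source({"left": 0.8, "center": 0.2, "right": 0.1})` selects the left position, and passing a single-entry dictionary models a single-screen system where the center channel is always used.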
When encoded, the CIF channel is multiplexed by the primary codec into the outgoing RTP video stream along with the other four video channels (left, center, right, and auxiliary). In the case of a single-screen system, the CIF channel is multiplexed in with the other two video channels (center and auxiliary).

G.711 Audio Channel

On multiscreen TelePresence systems, such as the CTS-3000 and CTS-3200, there are three channels of AAC-LD audio. Instead of sending one at a time, the primary (center) codec mixes all three channels together and encodes the mix in G.711 format. Therefore, all parties can be heard at any given time.
The G.711 channel is multiplexed into the outgoing RTP audio stream with the other four channels (left, center, and right AAC-LD audio channels, and the auxiliary audio channel).
Single-screen systems have only a single microphone channel (center), so there is no need to mix. The center channel is encoded in both AAC-LD and G.711 formats and multiplexed together into the outgoing RTP audio stream along with the auxiliary audio channel, for a total of three audio channels.
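As a rough illustration of the mixing step, the sketch below sums the left, center, and right channels sample by sample and clips the result to the 16-bit range. It assumes the channels have already been decoded to equal-length lists of signed 16-bit PCM samples; the function name `mix_channels` is hypothetical, and the subsequent G.711 encoding is deliberately left out. The real codec's mixing method is not described in the text and may differ.

```python
from typing import List, Sequence

PCM16_MIN = -32768
PCM16_MAX = 32767


def mix_channels(channels: Sequence[Sequence[int]]) -> List[int]:
    """Mix equal-length PCM channels into one by summing and clipping."""
    mixed = []
    # zip(*channels) walks the channels in lockstep, one sample at a time.
    for samples in zip(*channels):
        total = sum(samples)
        # Clip so the mix still fits in a signed 16-bit sample.
        mixed.append(max(PCM16_MIN, min(PCM16_MAX, total)))
    return mixed
```

With a single input channel, as on a single-screen system, the function returns the samples unchanged, which matches the statement that no mixing is needed there.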

Additional Bandwidth Required

As a result of having to send these additional CIF resolution video and G.711 audio channels, additional bandwidth is consumed by each participating TelePresence System. The CIF resolution video is encoded at 704 kbps, and the G.711 audio is encoded at 64 kbps, for a total of 768 kbps additional bandwidth.
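The arithmetic is simple enough to check by hand, but a small helper makes it easy to estimate the extra load for a meeting with several rooms. The per-stream rates come from the text above; the function name `interop_overhead_kbps` is an assumption for the example, and the text does not say whether these figures include packetization overhead.

```python
CIF_VIDEO_KBPS = 704   # CIF interop video rate, from the text
G711_AUDIO_KBPS = 64   # G.711 interop audio rate, from the text


def interop_overhead_kbps(endpoints: int = 1) -> int:
    """Additional bandwidth sent by the given number of TelePresence endpoints."""
    return endpoints * (CIF_VIDEO_KBPS + G711_AUDIO_KBPS)


print(interop_overhead_kbps())   # one endpoint: 768
print(interop_overhead_kbps(4))  # four endpoints: 3072
```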

CTMS Switching of the Interop Channels

As previously discussed, the multiple channels of video and the multiple channels of audio are multiplexed using the SSRC field in the RTP header. Ordinarily, there are four video positions and four audio positions within the SSRC field (left, center, right, and auxiliary). A fifth SSRC position was defined to carry the CIF and G.711 interop channels within the video and audio RTP streams.
When the CTMS receives the video and audio streams from any TelePresence system, it reads the SSRC position of the RTP header and decides where to switch it. Left, center, right, and auxiliary positions are switched to the other TelePresence participants, and the interop position is switched to the CUVC MCU.
In the opposite direction, the CIF video and G.711 audio coming from the CUVC MCU to the CTMS are tagged with the SSRC position of the interop channel and sent down to all the participating TelePresence rooms.
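The switching decision described in this section can be modeled as a lookup on the channel position. The sketch below is a simplified model, not the CTMS implementation: the `Position` enum, the `CUVC` label, and both function names are hypothetical, and the actual SSRC values are not shown because the text does not give them.

```python
from enum import Enum
from typing import List, Tuple


class Position(Enum):
    LEFT = "left"
    CENTER = "center"
    RIGHT = "right"
    AUX = "auxiliary"
    INTEROP = "interop"   # the fifth position carrying CIF / G.711


CUVC = "cuvc-mcu"  # hypothetical label for the video conferencing MCU


def route_from_telepresence(position: Position, sender: str,
                            rooms: List[str]) -> List[str]:
    """Decide where a packet received from a TelePresence room is switched."""
    if position is Position.INTEROP:
        # Interop channels go only to the video conferencing MCU.
        return [CUVC]
    # All other positions go to the other TelePresence rooms.
    return [room for room in rooms if room != sender]


def route_from_cuvc(rooms: List[str]) -> Tuple[Position, List[str]]:
    """Media from the CUVC is tagged as interop and sent to every room."""
    return Position.INTEROP, list(rooms)
```

For instance, with rooms `["A", "B", "C"]`, a center-position packet from room A is switched to B and C, while an interop-position packet from room A is switched only to the CUVC.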

Decoding of the Interop Channels

When a Cisco TelePresence system receives RTP packets containing the SSRC value of the interop position, the primary (center) codec forwards the CIF video RTP packets to the left secondary codec to be decoded. The center codec decodes the G.711 audio, mixes it with the left channel of decoded AAC-LD audio, and plays the mix out the left speaker. This way, the video conferencing participants always appear on the left display and are heard coming out the left speaker, along with any TelePresence participants seated on that side of the system. On single-screen systems, they appear on the single (center) display and are heard from its speaker.
Because CIF video is 4:3 aspect ratio (352x288 resolution), and the TelePresence displays are 16:9 aspect ratio and run at 1080p / 60 resolution, the CIF video must be displayed in the best possible way. Stretching it to fit a 65-inch 1080p display would look terrible, so the left codec pixel-doubles the decoded video to 4CIF resolution (704x576) and displays it on the 1080p display surrounded by black borders.
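To make the pixel-doubling step concrete, the sketch below repeats every pixel horizontally and every row vertically, turning a 352x288 frame into 704x576, and then computes how much black border remains on a 1920x1080 display. It assumes the image is centered on the display, which the text does not state; the function names and the list-of-rows frame representation are also assumptions for illustration.

```python
from typing import List, Tuple


def pixel_double(frame: List[List[int]]) -> List[List[int]]:
    """Double a frame in both dimensions by repeating each pixel and row."""
    doubled = []
    for row in frame:
        wide = [pixel for pixel in row for _ in range(2)]
        doubled.append(wide)
        doubled.append(list(wide))  # second copy of the widened row
    return doubled


def border_sizes(display_w: int, display_h: int,
                 image_w: int, image_h: int) -> Tuple[int, int]:
    """Return (side, top_bottom) border widths for a centered image."""
    return (display_w - image_w) // 2, (display_h - image_h) // 2


print(pixel_double([[1, 2], [3, 4]]))
# [[1, 1, 2, 2], [1, 1, 2, 2], [3, 3, 4, 4], [3, 3, 4, 4]]
print(border_sizes(1920, 1080, 704, 576))
# (608, 252)
```

Under the centering assumption, the 4CIF image would sit inside borders of 608 pixels on each side and 252 pixels at the top and bottom.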
