BLOG: Video Interoperability in Skype for Business
View the original blog and comments on Jeff Schertz's Blog, posted January 26, 2015.
With the recently launched Office 365 Summit events, Microsoft has started sharing technical details on the various new capabilities on the horizon for the upcoming release of Skype for Business Server. A previous article analyzed and explained the rebranding of Lync to Skype for Business (SfB) in an effort to clear up some of the confusion seen immediately after that announcement. This article attempts to do the same for one of the advertised capabilities coming to Skype for Business, the replacement for Lync: Video Interoperability. As the unexpected popularity of an earlier article on this same topic for Lync 2013 made evident, there is a growing need to understand this space. It has actually become more complicated over time because of the increasing number of solutions and methods that have come to market since then from multiple Microsoft partners, and even from Microsoft itself.
Background
Before getting into the new information, it would be prudent to establish a baseline understanding of what the generalized term ‘video interoperability’ actually means. Depending on the source, this could refer to traditional standards-based conference room solutions communicating with foreign systems, or it could be more about tying together enterprise and consumer-grade applications. Or both. Whatever the discussion, it always boils down to getting something old to work with something new, or making systems that are foreign to each other interact successfully. One additional approach is to simply forgo interoperability by replacing any incompatible system with new, supported solutions. This alternative is what the Lync Room System product line is intended to address, either by reducing the need for interoperability by shifting new purchases toward these native systems, or by figuratively ‘biting the bullet’ and replacing everything with an LRS solution. Clearly cost, scale, and time are controlling factors in the ability to even attempt this approach, which can look much simpler on paper than in practice. It also only addresses a company’s own systems and limits its ability to host conferences with partners and customers who may be using different solutions. Hence the very real and common need to protect and leverage any investment in existing systems, while possibly even shifting future expenditures toward a completely Skype-centric view.
Apples to Spaceships
There has always been a fair share of challenges in bridging the Microsoft UC platform world and the massive in-place deployment of standards-based conferencing solutions. Much of that complexity comes from the wide array of communication paths (e.g. signaling, audio, video, content) and the large gap in design methodology between the two worlds for each of them. The popular fruit-based idiom just does not ring true in this scenario, even though both sides are trying to share the same sources of data: a person’s face, their voice, a spreadsheet, or a presentation deck. It is the delivery mechanisms that can be so different in design and application that the two cannot be considered part of the same food group, much less both be considered food. The sheer growth in adoption of Microsoft’s Communications Server platform over time has driven multiple partners to provide a varied array of solutions, from value-add devices to complete endpoints to core infrastructure. Also understand that any reference to H.264 Scalable Video Coding (SVC) in this article refers to Microsoft’s specific implementation of the codec, advertised as X-H264UC, which is not directly compatible with the H.264 SVC that some standards-based video systems support today.
Lync to Skype
Simply rebranding the enterprise solution is not enough to make it play nicely with the existing consumer platform. Microsoft has already been working on bringing existing Skype consumer clients together with the Lync enterprise deployment base. Giving the enterprise platform the same name as the consumer platform may look like the first step down that path, but in reality much of the work has already gone on in the background, starting over a year ago, as covered in this article. In place today is the version 2 Skype Gateway architecture, which provides direct media traversal between Skype consumer Windows desktop clients and Lync 2013 clients. The same solution will apply to Skype for Business clients when that product is released. Basically, the Lync 2013 clients received SILK audio support in a past Cumulative Update, and the latest Skype consumer client for Windows includes support for both the H.264 SVC video codec and media relay through the Lync Edge Server. This Business to Consumer (B2C) concept has been discussed in various past articles in the community, so the focus of this article will be on the various enterprise-grade options for Business to Business (B2B) needs.
Enterprise Video Interoperability
As captured in this category of articles, a variety of third-party solutions that address this have been available since the early days of the Communications Server platform. These options range from basic signaling gateways and more powerful transcoding gateways with limited scalability, all the way through full suites of conferencing bridges and signaling servers which can either host the entire conference themselves or join an existing Lync conference. The four available methodologies for addressing these needs can be summarized as:
- Native Endpoint Registration
- Gateways
- Multipoint Control Unit
- Bridge Cascading
Native Endpoint Registration
As the name suggests, this means that no back-end interoperability solution is used. The Lync or Skype for Business environment is the sole conferencing engine, and all endpoints (software clients and hardware devices) connect directly and natively to it wherever it may reside. Note that in this diagram the ‘Desktop Client’ could be any of the past or future Microsoft UC clients: Office Communicator 2007, Lync 2010, Lync 2013, or the upcoming Skype for Business client. These clients all support a range of compatible audio and video codecs, with varying support for Real-Time Video (RTV) and H.264 SVC. For example, while the Lync 2013 and SfB clients support both RTV and SVC, Lync 2010 and older clients only support RTV. A Lync 2013 or Skype for Business deployment may still contain older desktop clients or, more realistically, may be inviting federated or foreign attendees who are still running older Lync or Communicator versions. When hosting conferences in such an environment, it is important that the room systems also support the older RTV codec so that every participant in the meeting can be seen and heard by all attendees regardless of client version. A few different options are available today, either to provide a plug-and-play experience to users or to deploy a dedicated conference room system that talks directly to the Skype for Business or Lync platform. The Microsoft Lync Catalog currently lists all of the qualified Meeting Room Devices and Solutions. Some of these systems support native registration to on-premises services only, while others may also be able to connect directly to Microsoft’s Office 365 offering, or even to other hosting providers’ clouds.
- Desktop Clients: Provide one of the various qualified Lync video conferencing devices that connect to a Windows desktop, giving an enhanced in-room audio and video experience without the need for a dedicated endpoint. Users bring their own workstation, connect it via USB to one of these systems, and then drive the meeting from their own Lync client using their own identity. This is the least expensive option and only requires deploying something like the Logitech CC3000e or Polycom CX5500.
- Lync Room Systems: To eliminate the need to bring any workstations into the conference room, and to improve the audio and video experience, a completely native and permanent solution such as a Lync Room System (LRS) package from Crestron, Polycom, or Smart is deployed into the conference room. These systems are backed by a hardened, embedded Windows PC which communicates directly with an on-premises or hosted Lync or Skype for Business environment. Also new in this space is the recently announced Microsoft Surface Hub platform, which can serve as a low-end, LRS-like package that easily brings a conferencing experience to any wall, with basic audio and video capabilities served by an integrated microphone and camera. (Note that the Surface Hub does not run the LRS client; it is a completely new design based solely on Windows 10 and the Surface touch experience.)
- Qualified Room Systems: To move even further beyond the current Lync or Skype for Business specific solutions, a modern standards-based room system can be deployed which supports native Lync and Skype for Business communications protocols and codecs. Partners in this space have added varying levels of support to their standards-based systems for protocols and codecs such as Microsoft’s implementations of SIP and H.264 SVC, RTV, and the Centralized Conference Control Protocol (CCCP), to name just a few. Examples of these room solutions are the LifeSize 220 and the Polycom HDX and Group Series.
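The codec point made above for native registration boils down to a simple rule: a room system must share a video codec with every attendee's client, and to give each attendee its best option it needs that client's preferred codec. The Python sketch below models only the client support described in this article. The `CLIENT_VIDEO_CODECS` table and the function names are illustrative assumptions, not part of any Microsoft API, and real negotiation also involves resolutions, bandwidth, and SDP offers.

```python
from enum import Enum

class VideoCodec(Enum):
    RTV = "RTVideo"    # Real-Time Video
    SVC = "X-H264UC"   # Microsoft's H.264 SVC implementation

# Hypothetical lookup table built from the client support described in the text.
CLIENT_VIDEO_CODECS = {
    "Office Communicator 2007": {VideoCodec.RTV},
    "Lync 2010": {VideoCodec.RTV},
    "Lync 2013": {VideoCodec.RTV, VideoCodec.SVC},
    "Skype for Business": {VideoCodec.RTV, VideoCodec.SVC},
}

def preferred_codec(client: str) -> VideoCodec:
    """Pick SVC when the client supports it, otherwise fall back to RTV."""
    supported = CLIENT_VIDEO_CODECS[client]
    return VideoCodec.SVC if VideoCodec.SVC in supported else VideoCodec.RTV

def codecs_room_system_needs(attendee_clients: list[str]) -> set[VideoCodec]:
    """Codecs a room system needs so each attendee can use its preferred codec."""
    return {preferred_codec(client) for client in attendee_clients}

if __name__ == "__main__":
    meeting = ["Skype for Business", "Lync 2013", "Lync 2010"]
    needed = codecs_room_system_needs(meeting)
    print(sorted(codec.name for codec in needed))  # ['RTV', 'SVC']
```

Because the Lync 2010 attendee only offers RTV, a room system that supported SVC alone could not exchange video with that participant.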
Gateways
The first and most basic interoperability method is to use gateways to provide an access route for various unsupported room systems to reach the Lync/SfB world. Conferences and peer call control are still owned by the Lync/SfB environment, but a transcoding and/or signaling gateway can offer a path for a limited number of systems to communicate with the Lync clients and servers, often with only a subset of the available modalities and features. In short, these solutions may support only audio and video with no content sharing across all platforms, or may be limited to internally connected systems with no Edge media traversal. In this diagram the foreign VTC is registered via H.323 or SIP either directly to a video gateway, or to its own native environment, which includes a gateway configured to route traffic between it and the Lync environment. The gateway translates between the different signaling protocols, for example H.323 and Microsoft SIP. Some gateways can even transcode the audio and video codecs, for example between Microsoft’s X-H264UC implementation of H.264 SVC and H.264 AVC. The diagram shows a simple environment of one VTC behind a single gateway, but the environment within the dotted grey box could be as vast as many endpoints connected to a complete video infrastructure behind pools of gateways, which are then connected to the Microsoft environment. Examples of endpoints that fit into the VTC category include Cisco’s older Tandberg H.323 or SIP endpoints or their TelePresence solutions, some LifeSize systems, older Polycom VSX endpoints, and even ISDN video systems, just to name a few. Examples of video gateways are the Cisco VCS and Radvision Scopia. Note that this category has been the least active over the past few years, as solutions have matured into one of the next methodologies.
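To make the difference between a signaling gateway and a transcoding gateway concrete, the following Python sketch models what a gateway has to do for a single call leg. It is a deliberately simplified, hypothetical model: the `Endpoint`, `GatewayPlan`, and `plan_gateway_leg` names are invented for illustration, each endpoint is reduced to one signaling protocol and one video codec, and the labels are plain strings rather than anything a real gateway exposes.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    signaling: str    # e.g. "H.323", "SIP", or "MS-SIP" for Microsoft SIP
    video_codec: str  # e.g. "H.264 AVC" or "X-H264UC"

@dataclass(frozen=True)
class GatewayPlan:
    translate_signaling: bool  # gateway converts between signaling protocols
    transcode_video: bool      # gateway re-encodes video between codecs
    video_possible: bool       # False when codecs differ and the gateway cannot transcode

def plan_gateway_leg(vtc: Endpoint, lync: Endpoint, gateway_transcodes: bool) -> GatewayPlan:
    """Decide what a (hypothetical) gateway must do to connect a VTC to a Lync endpoint."""
    needs_translation = vtc.signaling != lync.signaling
    codecs_differ = vtc.video_codec != lync.video_codec
    return GatewayPlan(
        translate_signaling=needs_translation,
        transcode_video=codecs_differ and gateway_transcodes,
        video_possible=not codecs_differ or gateway_transcodes,
    )

if __name__ == "__main__":
    vtc = Endpoint(signaling="H.323", video_codec="H.264 AVC")
    lync = Endpoint(signaling="MS-SIP", video_codec="X-H264UC")
    print(plan_gateway_leg(vtc, lync, gateway_transcodes=True))
    # GatewayPlan(translate_signaling=True, transcode_video=True, video_possible=True)
    print(plan_gateway_leg(vtc, lync, gateway_transcodes=False))
    # GatewayPlan(translate_signaling=True, transcode_video=False, video_possible=False)
```

The second call reflects the limitation described above: in this model a signaling-only gateway can set up the call, but without transcoding the two sides have no common video codec.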
Cisco’s VCS solution has received some updates for Lync 2013 video interoperability in the past year, but it has never been included among the Lync qualified solutions. While vendor support is available from Cisco, this solution is not often seen deployed in the field. The Radvision Scopia gateway was last qualified for Lync 2010 and has not seen any updates to support H.264 SVC as implemented in Lync 2013.
The topic of gateways will be revisited in the second half of this article, as Microsoft will be utilizing this methodology with Skype for Business Server.
Multipoint Control Unit (MCU)
The simplicity of the first scenario is also its most limiting factor. As mentioned before, what about the cost of replacing the large number of functional systems in use today? Or of deploying and managing a large number of gateways, further complicating the environment and communication paths? One alternative is to use a standards-based conferencing solution that can handle the plethora of non-Lync standards in existence today, and then provide a path for Lync users to reach the same conference. Lync and SfB users simply call into these meetings, which are hosted on the standards-based MCU (also referred to as a bridge), providing a single meeting place that brings everyone together. These separate bridges are the virtual location that everyone calls into to hear and see each other. Conferences in this scenario are hosted on the standards-based side of the fence, so all clients must negotiate their media sessions with the bridge directly, or indirectly through an Edge Server if the third-party solution supports it. The call signaling path is still native for endpoints on both sides, but SIP messages are routed out of the Front End Server to the integrated standards-based system. This means that conferences held in this manner, although technically able to handle audio, video, and possibly some content sharing, do not use any of the Front End Server’s conferencing capabilities. Depending on which third-party vendor’s solution is deployed, varying degrees of native Lync and Skype for Business capabilities may not be available to those users. Because every Lync client must connect directly to the third-party bridge, vendors must test and support every type of Microsoft client available in the Lync and Skype for Business platforms. Most vendors only support a subset of these clients across different versions, and even then only some of the codecs and modalities among those.
This means that conferences may not deliver the same results to all client types, with the mobile and Mac clients traditionally lagging behind in support. Examples of third-party vendors which support this model today are Acano, BlueJeans, Fuze, Pexip, and Polycom. Note that currently the only Lync Qualified solution among these is the Polycom RealPresence Platform, comprising the RMX and DMA components.
Bridge Cascading
Each of the scenarios above is really just a combination of compromises in the end: while each may have some measurable advantages over the others, none of the overall architectures is ideal. The best single solution is to not have a single solution, but to use both environments as originally intended and then connect them to each other. This approach leverages the strengths of both platforms and retains the native user experience on both sides. In this topology the standards-based MCU is connected directly to the Lync AVMCU during a meeting, allowing endpoints on either side of the table to join the same cascaded conference, with all participants able to see and hear any active speakers, and in some cases even receive multiple video streams in one direction or the other. Examples of third-party solutions which support this model today are Acano CoSpace, Pexip Infinity, and Polycom RealConnect. While each of these solutions leverages both MCUs in a single meeting, they vary in the mechanics behind them, the manner in which participants join meetings, the number of video streams, and the list of supported codecs. One of the single biggest advantages of this model is that it leaves all of the Lync clients completely on their side of the map, unlike the previous approach, which forces them to connect directly to the third-party MCU. While the initial gateway approach uses the Lync MCU for all conference attendees, that environment is limited to what those gateways can bring in, which is often not very much in terms of the types and number of VTCs. Another major advantage of this architecture is that the entire conference is native to both sides. For example, capabilities unique to RealConnect include scheduling meetings within Outlook using the standard Lync Meeting invitations.
Joining meetings is the same for everyone: clicking an embedded link (for desktop users) or dialing a Conference ID (for audio attendees and room video systems). Secondly, bidirectional, transcoded content sharing is available to all parties on either side, whether a Lync or SfB participant is sharing their desktop or a VTC is sending an H.239 or BFCP content stream.
Video Interoperability Server
The various options covered above are great for supplying a full conferencing environment that addresses a multitude of real-world requirements and issues. But what about smaller environments, where perhaps only a handful of legacy room systems are deployed that cannot simply be replaced with new systems, and deploying additional infrastructure (physical or virtual) is not in the cards? If additional costs or management worries have meant that third-party back-end solutions were simply not viable options, then, in traditional Microsoft fashion, a basic solution is now about to be embedded natively into the product. Just as Microsoft has incorporated capabilities such as an XMPP Gateway into the Communications Server platform along the way, with the upcoming release Microsoft has positioned Skype for Business Server to address both B2C scenarios with Skype consumer clients and standards-based interoperability for B2B video communications. B2C video support for Skype consumer clients has already been delivered through changes to the Lync 2013 client and server platform late last year, allowing peer-to-peer video calls between Lync 2013 users and Skype consumer Windows desktop users. The B2B scenario is also being addressed natively, for the first time within the product itself, through a new server role available with on-premises deployments of the upcoming Skype for Business Server platform. This release will contain a new server role, called the Video Interoperability Server (VIS), that can be defined in the topology and deployed. Fellow Lync MVP Adam Jacobs posted an article introducing VIS nearly a year ago, just after the 2014 Lync Conference in Las Vegas. That article discusses the gateway concept of a Back-to-Back User Agent (B2BUA) along with what was publicly known about VIS at the time.
He has also just posted a follow-up article touching on both the Skype consumer capability as well as VIS. With the recent release of the latest content from the Summit events there are now more public details on VIS in terms of the supported topology and endpoints. The first takeaway from reviewing the information is that the capabilities are a smaller subset of what was originally advertised.
Topology
VIS is available only as a separate server role and, unlike the Mediation Server role, cannot be collocated on a Front End Server. This means that additional physical or virtual Skype for Business servers will need to be deployed into one or more scaled VIS pools. Also note that Microsoft has stated that the role is only available to on-premises and hybrid deployments: an on-premises pool must be deployed, and VIS is not available as a feature to Office 365-only customers. The initial release of VIS will support a single operating mode called SIP Trunk Mode, which can be equated to what the Mediation Server role does for audio calls between Lync and IP-PBX platforms by establishing SIP trunks between them, but now for both audio and video. Basically, this new server role acts as a gateway between the Skype for Business servers/clients and a foreign video signaling server. VIS supports a 1:N topology, in that a single VIS pool can be configured to communicate with multiple video signaling servers, while any one video signaling server can only be connected to a single VIS pool. The only supported environment at product launch requires that VIS be connected to a Cisco Unified Communications Manager (CUCM, or CallManager) deployment, which in turn includes one or more of a specific list of tested and supported Cisco VTC models. Note that there is no support for the Cisco Video Communications Server (VCS), which is more commonly found in currently deployed video environments. Cisco appears to be moving away from the legacy VCS platform by supporting video signaling in CUCM, and Microsoft has chosen to go the same route with VIS. The supported VTC endpoints listed at the time this article was written are as follows:
- Cisco TelePresence Codecs (C40, C60, C90)
- Cisco TelePresence MX Series (MX200, MX300)
- Cisco TelePresence EX Series (EX60, EX90)
- Cisco TelePresence SX Series (SX20)
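The 1:N relationship between VIS pools and video signaling servers, together with the launch-time endpoint list above, can be expressed as a simple validation check. The Python sketch below is illustrative only: the pool and CUCM cluster names are hypothetical, the list of (pool, server) trunk pairs stands in for whatever the real topology definition contains, and it does not reflect how Skype for Business Server tooling actually validates a topology.

```python
# Launch-time supported Cisco VTC models, as listed in the article.
SUPPORTED_VTC_MODELS = {
    "C40", "C60", "C90",   # TelePresence Codecs
    "MX200", "MX300",      # MX Series
    "EX60", "EX90",        # EX Series
    "SX20",                # SX Series
}

def topology_errors(trunks: list[tuple[str, str]]) -> list[str]:
    """Check (vis_pool, signaling_server) trunk pairs against the 1:N rule.

    A VIS pool may trunk to many signaling servers, but each signaling
    server may be connected to only one VIS pool.
    """
    errors = []
    pool_for_server: dict[str, str] = {}
    for pool, server in trunks:
        first_pool = pool_for_server.setdefault(server, pool)
        if first_pool != pool:
            errors.append(f"{server} is already trunked to {first_pool}; cannot also use {pool}")
    return errors

def unsupported_endpoints(models: list[str]) -> list[str]:
    """Return the VTC models that are not on the launch-time supported list."""
    return sorted(set(models) - SUPPORTED_VTC_MODELS)

if __name__ == "__main__":
    trunks = [
        ("vis-pool-1", "cucm-hq"),
        ("vis-pool-1", "cucm-branch"),  # fine: one pool, many signaling servers
        ("vis-pool-2", "cucm-hq"),      # violates 1:N, cucm-hq already has a pool
    ]
    print(topology_errors(trunks))
    print(unsupported_endpoints(["SX20", "EX90", "SX80"]))  # ['SX80']
```

Keeping the server-to-pool map as a dictionary makes the "one pool per signaling server" rule a single lookup, while nothing limits how many servers share the same pool.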
Multiparty Architecture
VIS will provide connectivity for supported VTCs to both clients and servers. The previous diagram shows the signaling and media flow for a conference hosted on the SfB Front End Server by the collocated AVMCU service. VIS proxies the connection and media for VTCs so they can participate directly in the meeting. In SIP Trunk Mode each VTC remains registered to the CUCM infrastructure and can place calls through CUCM to the VIS pool, and then on to the Skype for Business Front End pool’s Conference Auto Attendant. There is no drag-and-drop support, so SfB users cannot locate a specific VTC and simply drag it into a peer or conference call to invite it to the meeting. Instead, the conference room attendees must dial into the meeting manually from the VTC. Once in the meeting, only a single active video participant can be sent to and from the VTC via VIS, and so far there is no support for content sharing. This means that the experience inside the conference room will look a lot like the following image. The Skype for Business and Lync users will receive multiple video participants via the Gallery View, in addition to content shared by another desktop client, the same as in any normal meeting. Yet when the VTC joins the meeting, the room attendees will only see the active speaker and will not receive any of the shared content. Compare the room system and desktop user experience above, as provided by VIS, with what a third-party solution like bridge cascading can provide thanks to its support for multiple streams and content. For example, the capabilities of Polycom RealConnect are depicted below, including bidirectional content sharing and multiple active-speaker video participants from Lync appearing on the VTC.
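The asymmetry between what a VIS-connected VTC receives and what native clients receive can be modeled in a few lines. This Python sketch is hypothetical: `MeetingState`, `media_for`, and the default gallery size of four remote video participants are illustrative assumptions rather than documented values. It only encodes the rules stated above: a single active speaker and no content for the VTC, gallery video plus content for native clients.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class MeetingState:
    # Participants sending video, ordered with the current active speaker first.
    video_senders: list[str] = field(default_factory=list)
    content_sharer: Optional[str] = None  # who is sharing content, if anyone

@dataclass
class ReceivedMedia:
    video_from: list[str]
    receives_content: bool

def media_for(viewer: str, via_vis: bool, state: MeetingState,
              gallery_size: int = 4) -> ReceivedMedia:
    """What a participant receives; gallery_size is an assumed limit, not a documented one."""
    others = [p for p in state.video_senders if p != viewer]
    if via_vis:
        # VIS passes a single active-speaker video to the VTC and no content.
        return ReceivedMedia(video_from=others[:1], receives_content=False)
    # Native SfB/Lync clients get Gallery View video plus any shared content.
    return ReceivedMedia(video_from=others[:gallery_size],
                         receives_content=state.content_sharer is not None)

if __name__ == "__main__":
    state = MeetingState(video_senders=["Alice", "Bob", "Carol", "Room-VTC"],
                         content_sharer="Bob")
    print(media_for("Room-VTC", via_vis=True, state=state))
    # ReceivedMedia(video_from=['Alice'], receives_content=False)
    print(media_for("Carol", via_vis=False, state=state))
    # ReceivedMedia(video_from=['Alice', 'Bob', 'Room-VTC'], receives_content=True)
```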
Simulcast Transcoding
Microsoft’s implementation of H.264 SVC provides multiple simulcast video streams in multiparty conference calls. While Lync 2013 and SfB clients are programmed to send these additional streams (when requested) directly to the Front End Server, legacy VTCs lack this capability. (Note that native endpoints like LRS and the Polycom Group Series do support these simulcast streams.) To retain the flexibility to fulfill different video resolution and frame rate requests across various clients, the Front End Server AVMCU needs VIS to fill this gap. VIS does so by acting as a media transcoding gateway, not just a basic signaling gateway. The VTC negotiates an outbound video stream directly to VIS at a specific resolution and frame rate. If the Front End Server AVMCU has any client requests for different, lower resolutions or frame rates, it will then request one or two additional streams. Because the VTC cannot provide these additional streams, VIS must create them. VIS itself will transcode and send to the AVMCU up to a maximum of three different video streams, all derived from the single original stream sent by the VTC. The example above shows a VTC joining a meeting with three other Skype for Business endpoints of varying hardware capabilities and conference views. In this case the VTC negotiates and encodes a 720p video stream at 30 frames per second to VIS.
- VIS repackages the original H.264 AVC stream into an SVC session understood by the AVMCU, which in turn relays it to the laptop participant, who happens to have ‘Speaker View’ enabled and is thus requesting full-screen high-definition video at the full 30 fps.
- VIS will transcode a second stream, downscaling the resolution to 360p as requested by the desktop client, which has the default ‘Gallery View’ enabled.
- VIS will also transcode a third stream, downscaling the video even further to supply a 180p stream at only 15 fps for the mobile device in the conference.
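The stream selection in this example can be sketched in Python as follows. The three-stream cap comes from the article; everything else is an assumption labeled in the code. Requests are clamped to what the VTC actually sends, duplicate requests are merged, and the VTC's own stream is always forwarded (repackaged rather than re-encoded). When more distinct lower layers are requested than the cap allows, this sketch simply keeps the highest ones. The article does not say how VIS and the AVMCU resolve that case, and the 30 fps used for the 360p request is likewise assumed. The `Layer` and `plan_vis_streams` names are hypothetical.

```python
from dataclasses import dataclass

MAX_STREAMS_TO_AVMCU = 3  # VIS sends at most three streams per VTC (from the article)

@dataclass(frozen=True, order=True)
class Layer:
    height: int  # vertical resolution, e.g. 720 for 720p
    fps: int     # frames per second

def plan_vis_streams(source: Layer, requests: list[Layer]) -> list[tuple[Layer, str]]:
    """Return (layer, action) pairs for the streams VIS sends to the AVMCU."""
    # Assumption: the VTC's own stream is always forwarded, repackaged from AVC into SVC.
    plan = [(source, "repackage")]
    # A request can never exceed what the VTC sends, so clamp each one to the source.
    wanted = {Layer(min(r.height, source.height), min(r.fps, source.fps)) for r in requests}
    wanted.discard(source)  # already covered by the repackaged stream
    # Assumption: if too many distinct layers are requested, keep the highest ones.
    for layer in sorted(wanted, reverse=True):
        if len(plan) == MAX_STREAMS_TO_AVMCU:
            break
        plan.append((layer, "transcode"))
    return plan

if __name__ == "__main__":
    vtc_stream = Layer(720, 30)
    requests = [
        Layer(720, 30),  # laptop in Speaker View
        Layer(360, 30),  # desktop in Gallery View (frame rate assumed)
        Layer(180, 15),  # mobile device
    ]
    for layer, action in plan_vis_streams(vtc_stream, requests):
        print(f"{layer.height}p @ {layer.fps} fps: {action}")
```

Running it prints one line per stream: the repackaged 720p stream at 30 fps, followed by the 360p and 180p transcodes, matching the three bullets above.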