Video Interoperability in Skype for Business
With the recently launched Office 365 Summit events Microsoft has started sharing technical details on the various new capabilities which are on the horizon with the future release of Skype for Business Server. In a previous article this rebranding of Lync to Skype for Business (SfB) was analyzed and explained in an effort to clarify some of the confusion immediately seen after that announcement.
This article will attempt to do the same regarding one of the advertised capabilities coming to the Lync replacement in Skype for Business: Video Interoperability. As made evident by the unexpected popularity of an earlier article on this same topic for Lync 2013, there is a growing need to understand this space which has actually become more complicated over time, due to the increasing number of applicable solutions and methods coming into the market since then, provided by multiple Microsoft partners and even Microsoft themselves.
Before getting into the new information it would be prudent to start with some baseline understanding of what the generalized term of ‘video interoperability’ actually means. Depending on the source this could be referring to traditional standards-based conference room solutions communicating with foreign systems, or this could be more of a story about tying together with enterprise and consumer grade applications. Or both. Whatever the discussion, it always boils down to figuring out a way to get something old to work with something new or making things foreign to each other find a way to interact successfully.
One additional approach is to simply forgo interoperability by replacing any incompatible system with new, supported solutions. This alternative approach is what the Lync Room System product line is intended to address, either by reducing the need for interoperability by shifting new purchases toward these native systems, or by figuratively ‘biting the bullet’ and just replacing everything with an LRS solution. Clearly cost, scale, and time are controlling factors in the ability to even attempt this approach, which can look much simpler on paper. Also, this only addresses a company’s own systems and limits their ability to host conferences with partners and customers who may be using different solutions.
Hence the very real and common need for finding a way to protect and leverage any investment in existing systems, while possibly even shifting future expenditures toward a completely Skype-centric view.
Apples to Spaceships
There has always been a fair share of challenges in providing a bridge between the Microsoft UC platform world and the massive in-place deployment of the world’s standards-based conferencing solutions. Much of that complexity has to do with the wide array of communication paths (e.g. signaling, audio, video, content) and the large gap in design methodology between each. The popular fruit-based idiom just does not ring true in this scenario even though both sides are trying to share the same sources of data: a person’s face, their voice, a spreadsheet, or presentation deck. The delivery mechanisms can be so different in design and application that the two cannot be equated with being from the same food group, much less both be considered foods.
The sheer growth in adoption of Microsoft’s Communications Server platform over time has driven multiple partners to provide a varying array of solutions from value-add devices to complete endpoints to core infrastructure.
Also understand that any reference to H.264 Scalable Video Coding (SVC) in this article refers to Microsoft’s specific implementation of the codec, advertised as X-H264UC, which is not directly compatible with the H.264 SVC that some standards-based video systems support today.
Lync to Skype
Simply rebranding the enterprise solution is not enough to make it and the existing consumer platform play nice together. Microsoft has already been working on addressing how to bring together existing Skype consumer clients with the Lync enterprise deployment base. Renaming the enterprise platform the same as the consumer platform may look like the first step down that path, but in reality much work has already gone on in the background starting over one year ago, as covered in this article.
In place today is the version 2 Skype Gateway architecture which provides for direct media traversal between Skype consumer Windows desktop clients and Lync 2013 clients. This same solution will be applicable to Skype for Business clients when that product is released. Basically the Lync 2013 clients have received SILK audio support via a past Cumulative Update, and the latest Skype consumer client for Windows includes support for both the H.264 SVC video codec and media relay utilizing the Lync Edge Server.
This Business to Consumer (B2C) concept has been discussed in various past articles amongst the community, so for now the focus of this article will be on the various enterprise-grade options for Business to Business (B2B) needs.
Enterprise Video Interoperability
As captured in this category of articles there are already a variety of third-party solutions available to address this which have been around since the early days of the Communications Server platform. These options range from basic signaling gateways and more powerful transcoding gateways with limited scalability all the way through full suites of conferencing bridges and signaling servers which can either host the entire conference themselves or join an existing Lync conference.
The four available methodologies for addressing these needs can be summarized as:
- Native Endpoint Registration
- Gateways
- Multipoint Control Unit
- Bridge Cascading
Native Endpoint Registration
As the name suggests, this means that no back-end interoperability solution is used. The Lync or Skype for Business environment is used as the sole conferencing engine and all endpoints (software clients and hardware devices) will connect directly and natively to these environments wherever they may reside.
Note that in this diagram the ‘Desktop Client’ could be any variety of past or future Microsoft UC clients: Office Communicator 2007, Lync 2010, Lync 2013, or the upcoming Skype for Business client. These clients all support a range of compatible audio and video codecs, with varying support for both RealTime Video (RTV) and H.264 SVC. For example, while the Lync 2013 and SfB clients will support both RTV and SVC, the Lync 2010 and older clients only support RTV. When hosting conferencing on a Lync 2013 or Skype for Business platform which either may contain older desktop clients or (more realistically) be inviting federated or foreign attendees who may still be running older Lync or Communicator client versions then it is important for the room system solutions in these environments to also support the older RTV codec so that all participants in the meeting can be seen and heard by all attendees regardless of their versions.
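The codec overlap described above can be sketched as a simple lookup. This is only an illustration: the codec sets restate what this paragraph says about each client, while the labels (`RTV`, `H264_SVC`) and function names are hypothetical and not part of any Microsoft API.

```python
# Video codec support per client, as described in the paragraph above.
CLIENT_CODECS = {
    "Office Communicator 2007": {"RTV"},
    "Lync 2010": {"RTV"},
    "Lync 2013": {"RTV", "H264_SVC"},
    "Skype for Business": {"RTV", "H264_SVC"},
}

def shared_codecs(a, b):
    """Return the video codecs two participants have in common."""
    return a & b

def visible_to_everyone(room_system_codecs, attendees):
    """True if the room system shares at least one video codec with every attendee."""
    return all(
        shared_codecs(room_system_codecs, CLIENT_CODECS[name]) for name in attendees
    )

attendees = ["Lync 2010", "Lync 2013", "Skype for Business"]
print(visible_to_everyone({"H264_SVC"}, attendees))         # SVC-only room system
print(visible_to_everyone({"RTV", "H264_SVC"}, attendees))  # room system that also supports RTV
```

A room system that only supports SVC fails the check as soon as one Lync 2010 attendee joins, which is why RTV support still matters for meetings with federated or legacy participants.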
A few different options are available today to either provide a plug-and-play experience to users or deploy a dedicated conferencing room system that can talk directly to the Skype for Business or Lync platform. The Microsoft Lync Catalog currently lists all of the qualified Meeting Room Devices and Solutions. Some of these systems support native registration to on-premises services only, while others may also be able to connect directly to Microsoft’s Office 365 offering, or even other hosting providers’ clouds.
- Desktop Clients: Provide one of the various qualified Lync video conferencing devices which connect to a Windows desktop to provide an enhanced in-room audio and video experience without the need for a dedicated endpoint. Users will bring their own workstation, connect via USB to one of these systems, and then drive the meeting from their own Lync client using their own identity. This is the least expensive option and only requires deployment of something like the Logitech CC3000e or Polycom CX5500.
- Lync Room Systems: To eliminate the need to bring any workstations into the conference room, as well as improve the audio and video experience, a completely native and permanent solution is deployed into the conference room, like a Lync Room System (LRS) package available from Crestron, Polycom, or Smart. These systems are back-ended by a hardened Windows-embedded PC which communicates directly with an on-premises or hosted Lync or Skype for Business environment. Also new in this space is the recently announced Microsoft Surface Hub platform which can serve as a low-end LRS-like package to easily bring a conferencing experience to any wall with basic audio and video capabilities served by an integrated microphone and camera. (Note that the Surface Hub does not run the LRS client and is a completely new design based solely on Windows 10 and the Surface touch experience.)
- Qualified Room Systems: To move even further beyond the current Lync or Skype for Business specific solutions, a modern standards-based room system can be deployed which supports native Lync and Skype for Business communications protocols and codecs. Partners in this space have included in their standards-based systems additional support for varying levels of the multiple protocols and codecs like Microsoft’s implementation of SIP and H.264 SVC, RTV, or the Centralized Conference Control Protocol (CCCP), to name just a few. Examples of these room solutions are the LifeSize 220 or Polycom HDX & Group Series.
Any other conferencing solutions outside of these categories are simply not invited to the party and would require some assistance from an additional solution to bridge the communications barrier.
Microsoft generically refers to these legacy standards-based conference room systems which do not contain any embedded Lync interoperability as a Video TeleConferencing system, or VTC. Throughout this article the term VTC will continue to refer to these types of systems, which must utilize one of the following solutions in order to have any chance of participating in meetings with any Lync or Skype for Business users.
Gateways
The first and most basic method to address the issue would be to use gateways to provide an access route for various unsupported room systems to reach the Lync/SfB world. Conferences and peer call control are still owned by the Lync/SfB environment, but a transcoding and/or signaling gateway can offer a path for a limited number of systems to communicate with the Lync clients and servers, often with only a subset of the available modalities and features. In short, these solutions may either only support audio and video with no content sharing capabilities across all platforms, or may be limited to internally connected systems with no Edge media traversal compatibility.
In this diagram the foreign VTC is registered via H.323 or SIP either directly to a video gateway or to its own native environment, which includes a gateway configured to route traffic to and from the Lync environment. The gateway will translate the different signaling protocols, for example between H.323 and Microsoft SIP. Some gateways are even capable of further transcoding the audio and video codecs, like Microsoft’s X-H264UC implementation of H.264 SVC against H.264 AVC.
The diagram shows a simple environment of one VTC behind a single gateway, but imagine that the environment within the dotted grey box could be as vast as multiple endpoints connected to a complete video infrastructure behind pools of multiple gateways which are then connected to the Microsoft environment.
Examples of endpoints which fit into the VTC category would be any array of Cisco’s older Tandberg H.323 or SIP endpoints or their TelePresence solutions, some LifeSize systems, older Polycom VSX endpoints, and even ISDN video systems just to name a few.
Examples of the video gateways would be the Cisco VCS or Radvision Scopia. Note that this category has been the least active over the past few years as solutions have matured into one of the next methodologies. Cisco’s VCS solution has received some updates for Lync 2013 video interoperability in the past year but this solution has never been included among the Lync qualified solutions. While vendor support is available from Cisco, this is not a solution seen deployed that often in the field. Also the Radvision Scopia gateway was last qualified for Lync 2010 and has not seen any updates to support H.264 SVC as implemented in Lync 2013.
The topic of gateways will be revisited in the second half of this article as Microsoft will be utilizing this methodology with Skype for Business Server.
Multipoint Control Unit (MCU)
The simplicity of the first scenario is also its most limiting factor. As mentioned before, what about the cost of simply replacing the large number of functional systems out there in use today? Or deploying and managing a large number of gateways, thus further complicating the environment and communication paths? One alternative here is to utilize a standards-based conferencing solution which can deal with the plethora of non-Lync standards in existence today, and then provide a path for the Lync users to also reach this same conference. Lync and SfB users simply call into these meetings which are hosted on standards-based MCUs, also referred to as bridges, providing a single meeting place that can bring everyone together. These separate bridges are the virtual location that everyone calls into to hear and see each other.
Conferences in this scenario are hosted on the standards-based side of the fence so all clients must negotiate their media sessions directly, or indirectly with the assistance of an Edge Server if supported by the third party solution. The call signaling path is still native for endpoints on both sides, but SIP messages are routed out of the Front End Server to the integrated standards-based system. This means that conferences held in this manner, although technically able to handle audio, video and possibly some content sharing, are not utilizing any of the Front End server’s conferencing capabilities. A varying degree of native Lync and Skype for Business capabilities may not be available to those users, depending on which third party vendor’s solution is deployed.
Because each and every Lync client must directly connect to the third party bridge, vendors must test and support every type of Microsoft client available in the Lync and Skype for Business platforms. Most vendors only support a subset of these clients across different versions, and even then only some codecs and modalities among those. This means that conferences may not be able to provide the same level of results to all client types, with the mobile and Mac clients traditionally lagging behind in support.
Examples of some third party vendors which support this model today are Acano, BlueJeans, Fuze, Pexip, and Polycom. Note that currently the only Lync Qualified solution among these is the Polycom RealPresence Platform, comprised of the RMX and DMA components.
Bridge Cascading
Every one of the scenarios above is really just a combination of compromises in the end: while each may offer some measurable advantages over the others, the overall architecture is not ideal. The best single solution is to not have a single solution, but to use both environments as originally intended and then just connect them to each other. This approach leverages the strengths of both platforms and retains the native user experiences on both sides.
In this topology the standards-based MCU is connected directly to the Lync AVMCU during any meetings, allowing endpoints on either side of the table to join the same, cascaded conference with all participants able to see and hear any active speakers, and in some cases even multiple video streams in one direction or the other.
Examples of third party solutions which support this model today are Acano CoSpace (now called Dual Home), Pexip Infinity, and Polycom RealConnect. While each of these solutions leverages both MCUs in a single meeting, they vary in the mechanics behind them, the manner in which participants join meetings, the number of video streams, and the list of supported codecs.
One of the single biggest advantages of this model is that it leaves all of the Lync clients completely on their side of the map, unlike the previous approach which forces them to connect directly to the third party MCU. While the initial gateway approach utilizes the Lync MCU for all conferencing attendees that environment is limited to what those gateways can bring in, which often is not very much in terms of types and amounts of VTCs.
Another major advantage of this architecture is that the entire conference is native to both sides. For example, capabilities unique to RealConnect include scheduling meetings within Outlook using the standard Lync Meeting invitations. Joining meetings is the same for all: clicking an embedded link (for desktop users) or dialing a Conference ID (for audio attendees and room video systems). Secondly, bidirectional, transcoded content sharing is made available to all parties on either side when either a Lync or SfB participant is sharing their desktop or a VTC is sending some sort of H.239 or BFCP content stream.
Video Interoperability Server
The various options covered above are great for supplying a full conferencing environment which addresses a multitude of real-world requirements and issues. But what about the smaller environments where maybe only a handful of legacy room systems are deployed but cannot simply be replaced with new systems, and where deploying additional infrastructure (physical or virtual) is not in the cards? If additional costs or management worries have traditionally meant that the third-party back-end solutions have just not been viable options, then in traditional Microsoft fashion a basic solution is now about to be embedded natively into the product.
Just as Microsoft has incorporated capabilities into the Communications Server platform along the way, like an XMPP Gateway for example, with the upcoming release of the platform Microsoft has positioned Skype for Business Server to address both consumer client B2C scenarios and standards-based interoperability for B2B video-based communications.
B2C video support for Skype consumer clients has already been delivered by incorporating changes into the Lync 2013 client and server platform late last year to allow for peer-to-peer video calls between Lync 2013 users and Skype consumer Windows desktop users.
The B2B scenario is also being addressed natively, for the first time within the product itself, by leveraging a new server role available with on-premises deployments of the upcoming Skype for Business Server platform. This software release will contain a new server role, available to define in the topology and deploy, called the Video Interoperability Server (VIS).
Fellow Lync MVP Adam Jacobs posted an article introducing VIS nearly a year ago, just after the 2014 Lync Conference was held in Las Vegas. That article discusses this gateway concept of a Back-to-Back User Agent (B2BUA) with what was publicly known about VIS at the time. He has also just posted a follow-up article touching on both the Skype consumer capability as well as VIS. With the recent release of the latest content from the Summit events there are now more public details on VIS in terms of the supported topology and endpoints. The first takeaway from reviewing the information is that the capabilities are a smaller subset of what was originally advertised.
VIS is available only as a separate server role, and will not be offered as a collocated Front End server role, unlike the Mediation Server role. This means that additional physical or virtual Skype for Business servers will need to be deployed into one or more scaled VIS pools. Also note that Microsoft has stated that the role is only available to on-premises and Hybrid deployments, meaning an on-premises pool will need to be deployed; VIS is not available as a feature for Office 365-only customers.
The initial offering of VIS will support a single Operating Mode entitled SIP Trunk Mode, which could be equated to what the Mediation Server role does for audio calling between Lync and IP-PBX platforms by virtue of establishing SIP trunks between them, but now for both audio and video. Basically this new server role acts as a gateway between the Skype for Business servers/clients and some sort of foreign video signaling server.
VIS supports a 1:N topology in that a single VIS pool can be configured to communicate with multiple different video signaling gateways. Meanwhile any one video signaling server can only be connected to a single VIS pool.
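As a rough illustration of the 1:N rule above, the following sketch checks a list of planned connections for any video signaling server that has been attached to more than one VIS pool. The host names and the function are hypothetical; this is not how Topology Builder performs its own validation.

```python
from collections import defaultdict

def servers_with_multiple_pools(links):
    """Check (vis_pool, signaling_server) pairs against the 1:N rule.

    A VIS pool may appear with many signaling servers, but each signaling
    server may be connected to only one VIS pool. Returns the sorted list
    of signaling servers that violate the rule.
    """
    pools_by_server = defaultdict(set)
    for pool, server in links:
        pools_by_server[server].add(pool)
    return sorted(s for s, pools in pools_by_server.items() if len(pools) > 1)

# Hypothetical names used purely for illustration.
planned = [
    ("vispool1.example.com", "cucm-a.example.com"),
    ("vispool1.example.com", "cucm-b.example.com"),  # one pool, two servers: allowed
    ("vispool2.example.com", "cucm-b.example.com"),  # same server on a second pool: not allowed
]
print(servers_with_multiple_pools(planned))
```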
The only supported environment at product launch will require that VIS be connected to a Cisco Unified Communications Manager (CUCM or CallManager) deployment which in turn includes one or more of a specific list of tested and supported Cisco VTC models. Note that there is no support here for the Cisco Video Communications Server (VCS) which is more commonly found in currently deployed video environments. Cisco appears to be moving away from the legacy VCS platform by supporting video signaling in CUCM and Microsoft has chosen to go the same route with VIS support.
The supported VTC endpoints listed at the time this article was written are as follows:
- Cisco TelePresence Codecs (C40, C60, C90)
- Cisco TelePresence MX Series (MX200, MX300)
- Cisco TelePresence EX Series (EX60, EX90)
- Cisco TelePresence SX Series (SX20)
Microsoft has stated that additional models which can support Cisco TelePresence System codec software version 7 or newer (TC7.0.0) may be tested after the initial Skype for Business Server release and then added to the UC Open Interoperability Program for VTCs.
The most obvious thing about VIS at this point should be that it appears to be a Microsoft-provided Cisco gateway. There is no mention of other third party VTC manufacturer involvement in this program to date. There are a variety of reasons for that, one being that some partners which were focused on video compatibility with Office Communications Server and Lync Server in the past have fallen off the radar. For example Radvision’s gateway solution for Lync has not shown any activity in the qualification space since their purchase by Avaya. LifeSize has also not stayed up to date in the qualification program, as well as bowing out of the Lync Room System program last year. Most of the newer names in this space, like Acano or Pexip, are providing gateway and bridge solutions and do not provide any compatible endpoints.
Also clearly missing from the list above are any Polycom room systems. As covered in this variety of articles or in this blog post from another Lync MVP Brennon Kwok it should be clear that the last two generations of Polycom room systems support native Lync registration including a wide variety of features, much beyond what VIS can offer. So it would be a step backwards to attempt to utilize VIS as a gateway for the HDX and Group Series room systems instead of just using their native registration capabilities to fit into that first scenario near the beginning of this article.
VIS will provide connectivity for supported VTCs to both clients and servers. The previous diagram shows the signaling and media flow for a conference hosted on the SfB Front End server by the collocated AVMCU service. VIS is used to proxy the connection and media for VTCs so they can participate directly in the meeting. In the SIP Trunk mode each VTC remains registered to the CUCM infrastructure and then can place calls through CUCM, to the VIS pool, and then on to the Skype for Business Front End pool’s Conference Auto Attendant. There is no drag-and-drop support so SfB users cannot locate a specific VTC and simply drag it into a peer or conference call in an attempt to invite the VTC to the meeting. Instead, the conference room attendees must manually call into the meeting from the VTC.
Once in the meeting only a single active video participant can be sent to/from the VTC via VIS, and there is no support for content sharing thus far. This means that the experience from inside the conference room will look a lot like the following image. The Skype for Business and Lync users will receive multiple video participants via the Gallery View in addition to content shared by another desktop client, the same as they would in any normal meeting. Yet when the VTC joins the meeting the attendee will only see the active speaker and will also not receive any of the shared content.
Compare the room system and desktop user experience above, as provided by VIS, with what a third-party solution like bridge cascading can provide, since such solutions can support multiple streams and content. For example, the capabilities of Polycom RealConnect are depicted below, which include bi-directional content sharing and multiple active speaker video participants from Lync appearing on the VTC.
Microsoft’s implementation of H.264 SVC provides multiple simulcast video streams in multiparty conference calls. While Lync 2013 and SfB clients are programmed to send (when requested) these additional streams directly to the Front End server, the legacy VTCs do not have this capability. (Note that native endpoints like LRS and the Polycom Group Series do support these simulcast streams).
In order to retain the flexibility to fulfill different video resolution and frame rate requests across various clients the Front End Server AVMCU needs this to be addressed by VIS. The way this works is that VIS acts as a media transcoding gateway, not just a basic signaling gateway.
The VTC will negotiate an outbound video stream directly to VIS at a specific resolution and frame rate. If the Front End Server AVMCU has any client requests for differing, lower resolutions or frame rates it will then request one or two additional streams. Because the VTC cannot provide these additional streams, VIS must create them. VIS itself will transcode and send to the AVMCU up to a maximum of three different video streams, all derived from the single, original stream sent by the VTC.
The example above shows a VTC joining a meeting with 3 other Skype for Business endpoints of varying hardware capabilities and conference views. The VTC in this case happens to negotiate and encode a 720p video stream at 30 frames per second to VIS.
- VIS repackages the original H.264 AVC stream into an SVC session understood by the AVMCU which in turn relays it to the laptop participant who happens to have ‘Speaker View’ enabled and thus is requesting full screen high definition video at the full 30 fps.
- VIS will transcode a second stream, downscaling the resolution to 360p as requested by the desktop client which has the default ‘Gallery View’ enabled.
- VIS will also transcode a third stream, downscaling the provided video even further to supply a 180p stream at only 15 fps for the mobile device in the conference.
All three of the above streams are simulcast from VIS directly to the AVMCU. If any attendees change the way that video is viewed during the call, leave the meeting, or new participants join then the AVMCU will adjust the requests to VIS in the event that one or more of the additional simulcast streams is no longer required.
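The stream derivation described above can be modeled in a few lines. This is a simplified sketch and not VIS's actual logic: the three-stream limit and the 720p/360p/180p example come from this article, while the `Stream` type, the clamping of requests to the source quality, the choice to keep the three highest requests, and the 30 fps rate assumed for the 360p stream are all illustrative assumptions.

```python
from typing import NamedTuple

class Stream(NamedTuple):
    height: int  # vertical resolution, e.g. 720 for 720p
    fps: int     # frames per second

MAX_SIMULCAST_STREAMS = 3  # maximum number of streams VIS sends to the AVMCU

def plan_vis_streams(source, requests):
    """Return (stream, action) pairs derived from the single stream sent by the VTC.

    Assumption: a request above the source quality is served at source quality,
    since VIS cannot add detail the VTC never sent.
    """
    clamped = {
        Stream(min(r.height, source.height), min(r.fps, source.fps))
        for r in requests
    }
    # Assumption: if more than three distinct streams are requested, keep the highest three.
    ordered = sorted(clamped, reverse=True)[:MAX_SIMULCAST_STREAMS]
    return [(s, "repackage" if s == source else "transcode") for s in ordered]

vtc_source = Stream(720, 30)
avmcu_requests = [
    Stream(720, 30),  # laptop in Speaker View
    Stream(360, 30),  # desktop in Gallery View (frame rate assumed)
    Stream(180, 15),  # mobile device
]
for stream, action in plan_vis_streams(vtc_source, avmcu_requests):
    print(f"{stream.height}p @ {stream.fps} fps: {action}")
```

When a participant leaves or changes views, calling the function again with the updated request list models the AVMCU dropping simulcast streams that are no longer needed.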
VIS must perform this work as the legacy VTCs do not support this capability. For this reason a single VIS server can only support up to a few VTCs simultaneously in a single environment, thus multiple server nodes and even multiple pools may be required to support the transcoding demand which may exist in a specific environment.
One important limitation of this design is that VIS can only transcode H.264 SVC video streams to and from the Skype for Business side of the equation. It does not support transcoding RTV, so only Lync 2013 and SfB clients, or any other native device which supports the H.264 SVC implementation in Lync/Skype, will be applicable. If a Lync 2010 or Office Communicator client were to join this conference call they would still see the other native participants, who are capable of sending a second RTV stream during the meeting for legacy clients. That additional RTV stream is not sent on to VIS though, as VIS does not support transcoding it, which means the VTC will not be able to see the 2010 client’s video and the 2010 client will not see the VTC’s video.
An additional limitation is that VIS cannot leverage Edge for media traversal between itself and the VTC environment because the legacy VTCs contain no built-in support for ICE/STUN/TURN as implemented in Skype for Business Server. This means that both CUCM and all VTCs must have the ability to communicate directly with the VIS pool without traversing any network address translation (NAT). The VIS pool does communicate with the Edge server on its SfB-facing side though to establish media sessions with any external or federated Lync and Skype for Business clients, so the limitation is only placed on the VTC side of the network.
Peer to Peer Calling
Peer-to-peer (P2P) calls between the VTCs and other registered Skype for Business or Lync 2013 clients are also supported, although in the initial release only calls placed by a VTC to an SfB user are supported. SfB users cannot call a VTC.
And just as described in the meeting scenario VIS does not support transcoding of RTV so only Lync 2013 clients, Skype for Business clients, or any qualified system which supports Microsoft H.264 SVC can participate in peer calls with supported VTCs.
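The two peer-calling rules above (call direction and codec support) can be combined into a single check. This is a minimal sketch of the rules as stated in this article; the function name and the `"VTC"`, `"client"`, and `"H264_SVC"` labels are hypothetical.

```python
SVC = "H264_SVC"

def vis_p2p_call_allowed(caller_kind, callee_kind, client_codecs):
    """Return True if a two-party call through VIS is supported in the initial release.

    caller_kind / callee_kind: "VTC" or "client" (hypothetical labels).
    client_codecs: video codecs supported by the Lync/SfB side of the call.
    """
    # Only calls placed by a VTC to a Lync/SfB user are supported.
    vtc_places_call = caller_kind == "VTC" and callee_kind == "client"
    # VIS cannot transcode RTV, so the client must support Microsoft H.264 SVC.
    return vtc_places_call and SVC in client_codecs

print(vis_p2p_call_allowed("VTC", "client", {"RTV", "H264_SVC"}))  # VTC calls a Lync 2013 user
print(vis_p2p_call_allowed("client", "VTC", {"RTV", "H264_SVC"}))  # SfB user calls a VTC
print(vis_p2p_call_allowed("VTC", "client", {"RTV"}))              # VTC calls a Lync 2010 user
```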
While VIS does not need to supply additional simulcast streams in a simple two-party call it must still perform some basic transcoding to translate the standards-based H.264 AVC stream sent by the VTC into an X-H264UC compatible SVC stream that the Lync 2013 or newer clients require for interoperability.
One of the points of this article is to explain not only what VIS is, but also what VIS is not (or at least is not yet).
It does not offer any native capabilities in Skype for Business Server to bridge its clients with the vast array of H.323 based conferencing systems, as SIP-only endpoints registered to CUCM are the single supported topology for now. For environments moving in that direction VIS can provide some capabilities for cross-platform conferences, but it is clear that the Microsoft partners who have been providing video interoperability solutions for many years offer solution sets far beyond the capabilities of VIS today. But as Skype for Business matures one might expect to see additional features and capabilities coming to the platform which will help close some of the gaps. In the meantime, or for the foreseeable future, going with a complete end-to-end solution like bridge cascading is one of the ways to finally bring together desktop users and conference rooms in a user experience that is easy to schedule, simple to join, and familiar to all.