HD Video in Lync 2013
In Office Communications Server 2007 and Lync Server 2010 the usage of high definition video was straightforward. The Real-Time Video (RTV) codec used primarily throughout those client and server versions provides for only a single HD resolution at 720p and was utilized in very specific scenarios, as documented in this previous article.
But with the inclusion of H.264 SVC as the default video codec for Lync 2013 this story changes quite drastically. Foremost, there are additional resolutions available in this new codec which provide more options than RTV did, which can actually help soften the importance of high definition video in terms of the user experience, especially in multi-party conference calls. The way that the Lync 2013 user interface was designed provides for a high quality, face-to-face video conferencing experience which does not always need to scale up to true ‘high’ resolutions to get the job done.
Throughout this article the term ‘SVC’ will be used to describe behavior related specifically to the implementation of H.264 SVC in Lync 2013 and these concepts and rules do not necessarily apply to the Scalable Video Coding standard.
There is some gray area in the industry in terms of clarifying exactly what high definition actually means in relation to video. Although there is no actual definition, generally throughout North America it is understood as “any video image with more than 480 horizontal lines”. Throughout the television industry this means resolutions with a height of 720 pixels or higher and this was reflected in RTV when moving between the only standard resolution choice (640×480) and high definition choice (1280×720). But SVC introduces a number of new resolutions including one that falls right between these others: 540p (960×540). By definition this resolution could be considered to be high definition, except in the television industry. So what is important to understand is that the resolution itself is much less important in SVC due to the variety of supported resolutions (15) versus what RTV provided in the past (4). If a caller looks clear on video and the experience is good then what does it really matter if the resolution in use is 540p or 720p? The point is that the larger list of resolutions available in SVC provides a more granular experience when different resolutions are requested, which is much improved over the older experience of bad/good/great seen in RTV.
The Lync 2013 client contains some hard-coded behavior when using H.264 SVC which determines the maximum video resolutions and frame rates that a given workstation can send and receive. Unlike with RTV these criteria are more complex due to additional features provided by the new SVC codec’s implementation.
Microsoft has also provided much more detail on how this actually works in Lync 2013. With RTV there was a generic guideline that ‘4 processor cores’ were required for OCS or Lync clients to send and receive RTV at 720p resolution. But because SVC in Lync 2013 leverages hardware acceleration, a workstation incapable of utilizing HD with RTV may be capable of sending and/or receiving higher resolutions with SVC.
The Lync Client Video Requirements page in TechNet outlines how SVC leverages different hardware capabilities for video encoding and decoding. There are four primary categories which are used to decide the maximum capabilities of a given workstation:
- Hardware accelerated decoding (DXVA)
- Hardware accelerated encoding (GPU, Camera)
- Physical CPU cores
- Windows Experience Index (WEI) score
Hardware acceleration is a new concept to Lync 2013 as previous releases did not support any type of hardware offloading or assistance for either encoding or decoding of video streams. Be aware that this new capability is specific to the H.264 AVC standard, meaning that it only applies to SVC video sessions in Lync 2013. Video encoding and decoding in RTV is performed completely by the CPU and never leverages a GPU or camera for help. This is true regardless of the client version, even for a Lync 2013 client sending or receiving RTV in an interoperability scenario.
There are two potential options to assist in encoding SVC video and one possibility to assist with decoding SVC video. The most common capability today would be the ability to offload SVC decoding to a workstation’s graphics chipset which supports DirectX Video Acceleration (DXVA).
Less common would be the ability to offload encoding of the video to either the graphics chipset or the camera itself. Currently there are a limited number of graphics chipsets which can be leveraged by the Lync 2013 client as well as a single USB camera on the market today which can provide hardware offloading for video encoding in SVC.
Most modern workstations should be able to utilize their graphics chipset to assist in decoding inbound video streams in SVC. These could be either integrated chipsets commonly found on laptops or lower-end workstations, or dedicated video cards typically used in higher-end workstations. The requirements are that at least DirectX 9.0 is available and that the chipset exposes the DXVA2_ModeH264_VLD_NoFGT decoding mode. The complex GUID for this decoding mode simply means that the chipset supports H.264 Variable Length Decoding (VLD) without Film Grain Technology (FGT). The specifications for DXVA can be found in this article published by Microsoft.
To determine if a workstation supports both of these requirements the DirectX Diagnostic Tool provided in Windows operating systems can be used. Two example workstations are used throughout this article: ‘A’ is an older Lenovo T410 laptop equipped with a dual-core Intel Core i5 mobile processor and ‘B’ is a newer desktop workstation equipped with a quad-core Intel Core i7 desktop processor.
- Run dxdiag.exe and wait for the tool to complete the automatic data gathering process. If this is the first time the tool has been run then it may prompt to check for new WHQL certificates from the Internet, which is recommended.
When finished the System tab will display some basic information about the current workstation including the DirectX version number. In example workstation ‘A’ shown below DirectX 11 is installed which meets the first requirement of at least version 9.0.
- Click the Save All Information button and then save the dxdiag.txt output file.
- Open the saved file in Notepad and search for the string DXVA which should advance the file directly to the DXVA2 Modes header.
The various DXVA2 modes supported by the workstation will be listed. If the ‘DXVA2_ModeH264_VLD_NoFGT’ mode is included in the list then the second requirement has also been met. If this string is not included then the workstation cannot utilize the graphics processor for decoding SVC video in Lync 2013.
As shown above the example workstation does support both DXVA requirements thus when the Lync 2013 client needs to decode incoming SVC video it can leverage the graphics chipset.
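The two checks described above can also be scripted against a saved dxdiag.txt file. The following is a minimal Python sketch, assuming the report contains a line of the form 'DirectX Version: DirectX 11' and lists the mode names as plain text. The function name and the sample excerpt are hypothetical and only illustrate the idea.

```python
import re

REQUIRED_MODE = "DXVA2_ModeH264_VLD_NoFGT"


def supports_hw_decoding(dxdiag_text: str) -> bool:
    """Return True if the dxdiag report meets both DXVA requirements.

    Assumes the report has a 'DirectX Version: DirectX <n>' line.
    """
    match = re.search(r"DirectX Version:\s*DirectX\s+(\d+)", dxdiag_text)
    if match is None:
        return False  # version line not found, so nothing can be confirmed
    has_directx9 = int(match.group(1)) >= 9
    has_mode = REQUIRED_MODE in dxdiag_text
    return has_directx9 and has_mode


# Hypothetical excerpt, not copied from a real report.
sample = """
DirectX Version: DirectX 11
DXVA2 Modes: DXVA2_ModeMPEG2_VLD DXVA2_ModeH264_VLD_NoFGT
"""
print(supports_hw_decoding(sample))
```

A plain substring search is used for the mode name because that is all the manual Notepad search does as well.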
Graphics Chipset Encoding
A graphics processing unit (GPU) can also be used for performing video encoding tasks but Lync 2013 currently supports only a limited set of chipsets which can provide this. The TechNet documentation lists individual chipset models as well as specific driver versions for Intel and AMD graphics chipsets, but identifying which products contain these chipsets can be a bit difficult.
Certain models of Intel’s second generation Sandy Bridge and third generation Ivy Bridge processors include the integrated Intel HD Graphics chipsets. These can support a feature called Quick Sync Video which Lync 2013 utilizes for hardware encoding. The second generation CPUs include either the Intel HD Graphics 2000 or 3000 chipsets while the third generation CPUs leverage newer 2500 or 4000 chipsets. This means that some, but not all, of the Intel Core i3, i5, i7, and i7 Extreme processors may be able to handle SVC video encoding tasks. Also understand that while some workstations may contain a supported processor core, if an unsupported secondary graphics card is installed and active then the integrated graphics chipset is not available and cannot be used for hardware acceleration. This is common in laptop platforms with dedicated GPUs.
Lync 2013 also supports encoding with some AMD products utilizing their Video Codec Engine (VCE) feature. This feature, like Intel’s Quick Sync Video, provides for hardware encoding of H.264 video in Lync 2013 and is part of their Unified Video Decoder (UVD) solution. AMD integrates VCE features in their Radeon 7000 Series graphics cards as well as in some of their Trinity A Series CPUs.
To find out if a workstation contains one of the supported chipsets the DirectX Diagnostic Tool can be used to start the process of identifying the processor core and active graphics chipset.
- Return to the System tab of dxdiag.exe or open the dxdiag.txt file and search for ‘Processor’ to identify the CPU type in the workstation. (Note that Windows often does not report the physical core count, as the value shown reflects the total thread count.)
Example Workstation ‘A’
Processor: Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz (4 CPUs), ~2.5GHz
Example Workstation ‘B’
Processor: Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz (8 CPUs), ~2.1GHz
- Simply search the Internet using specific keywords from the processor description to locate the product documentation to help determine the processor generation and specifications.
The first result on Bing for example workstation ‘A’ is the product specification page for the Intel Core i5-540M processor which lists the integrated Processor Graphics as Intel HD Graphics. That name is how Intel refers to its first generation integrated chipset, so this is not one of the compatible chipsets available for hardware acceleration as the CPU is an older first-generation Arrandale chipset.
Example Workstation ‘A’
Next is an excerpt from the product specification page for the second-generation Intel Core i7-2715QE processor used in example workstation ‘B’. This second-generation Sandy Bridge processor contains the 3000 series integrated graphics chipset which includes the Quick Sync Video feature.
Example Workstation ‘B’
Example workstation ‘A’ does not include a hardware acceleration supported graphics chipset, while workstation ‘B’ does. But even though the processor in workstation ‘B’ contains the supported chipset it is still possible that a secondary graphics chipset is installed in the workstation, so to verify that the DirectX Diagnostic Tool can be used yet again.
- Switch to the Display tab of dxdiag.exe and check the name and manufacturer of the enabled video device.
Example Workstation ‘A’
Example Workstation ‘B’
In example workstation ‘A’ the Intel HD Graphics chipset is not even being used as a separate Nvidia chipset is included in the laptop, so even if it did include a supported integrated graphics chipset it would not matter. Yet example workstation ‘B’ not only includes a supported chipset, but it is also actively utilized in the system.
Thus workstation ‘A’ does not meet the requirements to support video encoding via the graphics processor while workstation ‘B’ does.
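The chipset check in this section reduces to a lookup on the name of the active display adapter. A rough Python sketch follows, assuming the adapter name is copied from the Display tab of dxdiag.exe. The model numbers come from the Intel chipsets named above, while the function name and the matching logic are illustrative only. AMD products and the driver versions listed on TechNet are not checked here.

```python
import re

# Intel HD Graphics models named in this article as Quick Sync capable.
QUICK_SYNC_MODELS = {"2000", "3000", "2500", "4000"}


def is_supported_intel_encoder(adapter_name: str) -> bool:
    """Return True if the active adapter name carries a listed model number.

    A generic name without a model number is treated as unsupported,
    because the name alone is then not conclusive.
    """
    match = re.search(r"HD Graphics\s+(\d{4})", adapter_name, re.IGNORECASE)
    return match is not None and match.group(1) in QUICK_SYNC_MODELS


# Hypothetical adapter names for illustration.
print(is_supported_intel_encoder("Intel(R) HD Graphics 3000"))
print(is_supported_intel_encoder("Intel(R) HD Graphics"))
```

Passing the name of the enabled device matters more than the processor model, since an active secondary graphics card makes the integrated chipset unavailable.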
USB Camera Encoding
The final option is that Lync 2013 can leverage compatible USB cameras to perform hardware acceleration for encoding of SVC video as well. This is an interesting option because although one cannot simply upgrade the type of processor or graphics chipset in a laptop, it is certainly easy to purchase a supported USB device and connect it to the workstation for an instant upgrade in video encoding capabilities.
Although some of the USB webcams currently Optimized for Lync do support encoding up to 1080p resolution there is currently only a single camera on the market today which actually handles hardware offloading of the encoding: the Logitech C930e. This is the first device to support USB Video Class (UVC) 1.5 leveraged by Windows 8 and Lync 2013. This means that as long as the workstation is running Windows 8 and Lync 2013 this camera can perform the video encoding itself to minimize the load on the workstation’s CPU during SVC video calls.
The number of physical processor cores still plays a part in video decoding and encoding capabilities in Lync 2013 for SVC, as it did with RTV, although the rules are no longer as simple. With RTV four physical processor cores were required to encode high definition video (at up to 30fps), while two cores were sufficient to decode high definition video (at a maximum of 15fps). In reality though Lync-to-Lync video calls required quad-core systems on both ends as the Lync 2010 client application is hardcoded to send and receive the same resolutions. (For example a quad-core to dual-core Lync RTV video session would be VGA in both directions, not HD in one and VGA in the other, as the Lync client prefers to send VGA at 30fps instead of 720p at 15fps. Microsoft hardcoded a client preference for motion over sharpness in this scenario.)
But in Lync 2013 with SVC the capabilities are much more granular, and with the help of hardware acceleration it is possible for systems with fewer than 4 cores to send and/or receive HD video.
- The same chipset specification documents found in the previous step also show the number of physical cores for each example workstation.
Example Workstation ‘A’
Example Workstation ‘B’
Example workstation ‘A’ is a dual-core processor, yet capable of up to four threads. This is why the information shown earlier in the DirectX Diagnostic Tool reported a total of ‘4 CPUs’. A true quad-core system with hyper-threading would typically be reported as having 8 CPUs, as evident in example workstation ‘B’.
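The relationship between the ‘CPUs’ value reported by the DirectX Diagnostic Tool and the physical core count can be expressed in a few lines. This Python sketch parses the Processor lines shown earlier. The function names are hypothetical, and the threads-per-core value is an input that still has to come from the processor’s specification page.

```python
import re


def logical_cpu_count(processor_line: str) -> int:
    """Extract N from the '(N CPUs)' field of a dxdiag Processor line."""
    match = re.search(r"\((\d+) CPUs?\)", processor_line)
    if match is None:
        raise ValueError("no '(N CPUs)' field found")
    return int(match.group(1))


def physical_cores(processor_line: str, threads_per_core: int) -> int:
    """Divide the logical count by the threads each core can run."""
    return logical_cpu_count(processor_line) // threads_per_core


line_a = "Intel(R) Core(TM) i5 CPU M 540 @ 2.53GHz (4 CPUs), ~2.5GHz"
line_b = "Intel(R) Core(TM) i7-2715QE CPU @ 2.10GHz (8 CPUs), ~2.1GHz"

# Both example processors run two threads per core.
print(physical_cores(line_a, threads_per_core=2))
print(physical_cores(line_b, threads_per_core=2))
```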
Windows Experience Index
The final criterion used in calculating a workstation’s video encoding and decoding capabilities is the Windows Experience Index (WEI) score, specifically the Video Encoding Score. The maximum value for this score in Windows 7 is limited to 7.9 while a Windows 8 workstation can be as high as 9.9. These values will be important in the next section of this article when looking at the different maximum capabilities charts.
- To identify the applicable WEI score launch the Windows System Control Panel (tip: use the handy Windows Key + Pause/Break keyboard shortcut) and then verify that the assessment test has previously been run and a score is shown.
- If the assessment has never been run on the workstation or if the hardware configuration has changed since the last test then it is recommended to re-run the test by advancing to the Performance Information and Tools control panel and selecting Re-run the assessment.
- Once a test run has been completed then using Windows Explorer open the following system directory where the assessment results files are stored and locate the most recent Formal.Assessment file.
- Open the file (e.g. 2013-05-04 22.214.171.1249 Formal.Assessment (Recent).WinSAT.xml) and search for the string VideoEncodeScore.
Example Workstation ‘A’
Example Workstation ‘B’
Note that the VideoEncodeScore values are reported as 7.1 for workstation ‘A’ and 7.6 for workstation ‘B’.
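Instead of searching the file by hand, the score can be read with a short script. The following Python sketch assumes only that the assessment file is well-formed XML containing a VideoEncodeScore element somewhere in the tree. The sample document and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET


def video_encode_score(winsat_xml: str) -> float:
    """Return the first VideoEncodeScore value found in the XML text."""
    root = ET.fromstring(winsat_xml)
    node = next(root.iter("VideoEncodeScore"), None)
    if node is None or node.text is None:
        raise ValueError("VideoEncodeScore element not found")
    return float(node.text)


# Hypothetical, heavily trimmed stand-in for a real assessment file.
sample = (
    "<WinSAT><WinSPR>"
    "<VideoEncodeScore>7.1</VideoEncodeScore>"
    "</WinSPR></WinSAT>"
)
print(video_encode_score(sample))
```

For a file on disk, `ET.parse(path).getroot()` could take the place of `ET.fromstring` so that the parser handles the file encoding itself.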
The previous article HD Video in Lync included a peek into identifying the receive capabilities of a Lync 2010 client using RTV for video. RTV exposes these options in the Session Description Protocol (SDP) portion of SIP messages so it is easy to find these by capturing a SIP trace of the call negotiation. RTV is still negotiated in the same manner with Lync 2013 exposing some new capabilities as well. But SVC calls in Lync 2013 are handled in a completely different manner so it is not possible to identify either endpoint’s receive capabilities as these are not advertised in SDP.
Starting with RTV there is an interesting discovery when looking at the same SDP information which was described in the previous article. In all previous OCS and Lync clients the RTV video capabilities were limited to the following possible resolutions and frame rate options. (Based on hardware capabilities clients would advertise support to receive a subset of these, not the entire list.)
- QCIF (176×144) [4:3] 15fps
- CIF (352×288) [4:3] 15fps
- VGA (640×480) [4:3] 15fps (single core) or 30fps (dual/quad core)
- HD (1280×720) [16:9] 13fps (dual core) or 30fps (quad core)
But when looking at the SIP trace of a video call between Lync 2013 clients there is a change in capabilities advertised by RTV. Even though a Lync 2013 to 2013 client video call will utilize H.264/SVC for video the two clients will still share information about each codec they support.
As explained in the previous article the a=x-caps:121 declaration in SDP is used to advertise the client’s receiving capabilities for RTV. This has nothing to do with SVC as that codec does not utilize the a=x-caps parameter. The MSDN documentation for x-caps has been updated for version 2.0 of SDP in Lync 2013 and the last paragraph in the article states that “a=x-caps attribute is not supported for the H.264UC…media formats…if present in a received SDP message, MUST be ignored”.
The following RTV declaration was captured from a Lync 2010 client:
Yet this RTV declaration was captured from a Lync 2013 client:
Notice that the 2013 client advertises three additional resolutions for RTV, including 1080p.
That MSDN documentation includes a (partially accurate) list of supported resolutions for RTV, including the addition of 1080p. What is missing from that list currently are the additional 640×360 and 424×240 options identified above. Yet the a=x-caps example at the end of that article does show the exact same list shown here, and both come directly from a SIP trace.
It appears that RTV has been improved to support three additional widescreen aspect ratios (16:9) across low, standard, and high definition. By default the only time RTV would be used for video streams in Lync 2013 for either peer-to-peer or conference calls is when an older RTV-only client is involved. A Lync 2013 to 2013 video session would always utilize H.264 SVC for video, yet a Lync 2010 to Lync 2013 video session would have to use RTV. But the Lync 2010 client does not currently support these additional resolutions, so when negotiating a video session with a 2013 client it would never advertise these new options as receiving capabilities anyway. In turn the 2013 client would only be able to send video in one of the earlier four formats, as well as receive video in one of these same four formats as those are all the 2010 client is aware of.
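The fallback to the four older formats is simply the intersection of what each client advertises. The Python sketch below illustrates this under stated assumptions: each semicolon-separated x-caps entry is taken to be colon-separated with the width and height in the second and third fields, and the identifier, frame rate and bit-rate values in the sample lines are placeholders rather than captured data. Only the resolutions themselves come from this article.

```python
from typing import Set, Tuple


def parse_x_caps(line: str) -> Set[Tuple[int, int]]:
    """Return the (width, height) pairs from an a=x-caps line.

    Assumed layout: 'a=x-caps:<payload> <id>:<w>:<h>:<fps>:<rate>:<n>;...'
    """
    _, _, caps = line.partition(" ")
    resolutions = set()
    for entry in caps.split(";"):
        fields = entry.strip().split(":")
        if len(fields) >= 3:
            resolutions.add((int(fields[1]), int(fields[2])))
    return resolutions


# Placeholder declarations: ids, frame rates and bit rates are made up.
lync2010 = ("a=x-caps:121 1:1280:720:30.0:0:1;2:640:480:30.0:0:1;"
            "3:352:288:15.0:0:1;4:176:144:15.0:0:1")
lync2013 = ("a=x-caps:121 1:1920:1080:30.0:0:1;2:1280:720:30.0:0:1;"
            "3:640:480:30.0:0:1;4:640:360:30.0:0:1;5:352:288:15.0:0:1;"
            "6:424:240:15.0:0:1;7:176:144:15.0:0:1")

# Resolutions usable between a 2010 and a 2013 client.
usable = parse_x_caps(lync2010) & parse_x_caps(lync2013)
print(sorted(usable))
```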
Where these new resolutions could be seen in action is if the H.264/SVC codec was disabled on the Lync 2013 AVMCU, which would revert the AVMCU to using only RTV for all video streams.
As previously mentioned there is no simple way to identify the supported receive capabilities when SVC is used for video as they are not advertised in SDP the way they are for RTV. A deeper dive into this topic is planned for a later article as the capabilities provided by SVC in Lync are much more complex than those of RTV.
In the article Video Interoperability in Lync 2013 the specifications of the complete H.264 SVC codec adopted by Microsoft were covered in great detail, yet that article was written prior to the actual release of Lync Server 2013. The implementation of SVC in the 2013 product release does not actually utilize every possible feature and capability defined in the codec specification though. That article identified up to 18 different resolutions defined in the specification, but according to the table included on this TechNet page only 15 are in use in the product. The following table captures the different resolutions, aspect ratios, and maximum supported frame rates for each.
| Common Name | Resolution | Aspect Ratio | Max Frame Rate |
| --- | --- | --- | --- |
| | 212×160 | 4:3 | 15 fps |
| 180p | 320×180 | 16:9 | 15 fps |
| CIF | 320×240 | 4:3 | 15 fps |
| 240p | 424×240 | 16:9 | 15 fps |
| | 424×320 | 4:3 | 15 fps |
| 270p | 480×270 | 16:9 | 15 fps |
| 360p | 640×360 | 16:9 | 30 fps |
| VGA (SD) | 640×480 | 4:3 | 30 fps |
| 480p | 848×480 | 16:9 | 30 fps |
| 540p | 960×540 | 16:9 | 30 fps |
| 720p (HD) | 1280×720 | 16:9 | 30 fps |
| 1080p (HD) | 1920×1080 | 16:9 | 30 fps |
| 144p (Panorama) | 960×144 | 20:3 | 30 fps |
| 192p (Panorama) | 1280×192 | 20:3 | 30 fps |
| 288p (Panorama) | 1920×288 | 20:3 | 30 fps |
Note that there are two options for the same horizontal size in a few cases (e.g. 640×480 & 640×360) which still provide native 4:3 resolution for older non-HD cameras. In most cases the optical resolution of the camera will drive the choice so as to best match the field of view captured by the camera. Older standard definition cameras are often only capable of capturing a 4:3 field of view as 16:9 widescreen images typically require a high-definition camera.
Also be aware that after extensive testing not all of these resolutions appear to be usable in Lync video calls, and there are even some undocumented resolutions (e.g. 352×288) which can appear when dealing with some mobile devices (e.g. iPhone) using different native screen or camera resolutions. A later article will cover this topic in more depth.
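To show how the aspect ratio narrows the list of candidates, the documented resolutions can be treated as a simple lookup table. The Python sketch below picks the largest documented resolution of a given aspect ratio that fits within a size limit. It is not a description of how the Lync client selects a resolution internally, the function name is hypothetical, and the 1080p entry is entered as 1920×1080.

```python
from typing import Optional, Tuple

# (width, height, aspect ratio, max fps) for the documented resolutions.
SVC_RESOLUTIONS = [
    (212, 160, "4:3", 15), (320, 180, "16:9", 15), (320, 240, "4:3", 15),
    (424, 240, "16:9", 15), (424, 320, "4:3", 15), (480, 270, "16:9", 15),
    (640, 360, "16:9", 30), (640, 480, "4:3", 30), (848, 480, "16:9", 30),
    (960, 540, "16:9", 30), (1280, 720, "16:9", 30), (1920, 1080, "16:9", 30),
    (960, 144, "20:3", 30), (1280, 192, "20:3", 30), (1920, 288, "20:3", 30),
]


def best_match(aspect: str, max_width: int,
               max_height: int) -> Optional[Tuple[int, int, str, int]]:
    """Return the largest entry of the given aspect ratio within the limits."""
    candidates = [
        row for row in SVC_RESOLUTIONS
        if row[2] == aspect and row[0] <= max_width and row[1] <= max_height
    ]
    # Compare by pixel count; None is returned when nothing fits.
    return max(candidates, key=lambda row: row[0] * row[1], default=None)


print(best_match("4:3", 640, 480))
print(best_match("16:9", 1000, 600))
```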
Verifying HD Resolution Support
In previous versions it was required to manually enable high definition video as an option as OCS and Lync 2010 were set to limit the maximum RTV resolution to VGA by default.
In Lync Server 2013 manual configuration is still required to enable high definition resolution for RTV sessions by setting the MaxVideoRateAllowed parameter to Hd720p15M.
Set-CsMediaConfiguration -Identity Global -MaxVideoRateAllowed Hd720p15M
On the other hand SVC is already allowed to scale all the way up to 1080p resolution without any modification as the VideoBitRateKb and TotalReceiveVideoBitRateKb parameters are both set to 50000 by default.
Get-CsConferencingPolicy | Select-Object *Video*
Determining Client Capabilities
In the previous article this section walked through validating the actual resolutions and frame rates that the Lync client would advertise during RTV negotiation, but as mentioned numerous times in this article that process is not applicable to SVC. But using the information gathered from the two example workstations the SVC encoding and decoding capabilities can be determined using the tables at the end of the following TechNet documentation.
Before looking at the data in the table it would be helpful to summarize the basic categories and then compare them to past behavior with RTV. Basically there are three different general categories a workstation may fall into: (1) no hardware acceleration, (2) hardware accelerated decoding, (3) both hardware accelerated decoding and encoding.
For systems which do not support any type of hardware decoding or encoding it is not possible to encode HD video unless the processor contains at least 4 cores. It is possible to decode HD video with only 2 processor cores available, but only if the WEI VideoEncodeScore is 4.5 or higher. This is similar to the behavior of RTV, which requires a quad-core processor to encode HD video while a dual-core system is capable of decoding it. Workstations lacking any hardware acceleration capabilities will never be able to encode video at 1080p regardless of the number of processor cores or the WEI score.
The second category includes most modern workstations which will have the ability to leverage hardware acceleration for decoding video. This provides even the least powerful workstations with a single processor core the ability to decode up to 1080p, given that the WEI score is at least a paltry value of 3.0. Yet the ability to encode HD video still requires at least a quad-core processor since there is still no hardware assistance on the encoding side, placing the entire encoding burden on the CPU.
The third category includes systems which are able to leverage hardware acceleration for both encoding and decoding video. These workstations can decode and encode every possible resolution provided in H.264 SVC all the way up to 1080p regardless of the number of processor cores. The encoding score is still a factor here, so older systems with a score less than 5.0 can still achieve HD send and receive (by adding a UVC 1.5 compatible camera, for example) but would be limited to encoding 720p video while receiving 1080p. Systems with a score higher than 5.0 would be able to send and receive 1080p.
The information previously gathered from the example workstations is summarized below and then cross-referenced with the TechNet documentation.
| Feature | Workstation ‘A’ | Workstation ‘B’ |
| --- | --- | --- |
| Hardware accelerated decoding | Yes | Yes |
| Hardware accelerated encoding | No | Yes |
| Physical CPU cores | 2 | 4 |
| WEI Video Encode Score | 7.1 | 7.6 |
Example workstation ‘A’ falls into the second category of hardware accelerated decoding only, and with a dual core processor and an encoding score of at least 6.0 that system is capable of sending resolutions up to 960×540 and receiving and decoding all resolutions all the way up to 1920×1080.
Meanwhile the more powerful workstation ‘B’ which includes 4 processor cores and can utilize hardware acceleration for both decoding and encoding video is capable of sending and receiving a maximum resolution of 1920×1080.
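The three categories and the thresholds summarized in this section can be condensed into a small decision function. The Python sketch below is a simplification of the prose above, not a reproduction of the TechNet tables. ‘HD’ here means 720p or higher, the function name is hypothetical, and a system with hardware encoding but no hardware decoding is not covered in this section, so it is treated as the first category purely as an assumption.

```python
def classify(hw_decode: bool, hw_encode: bool, cores: int, wei: float) -> dict:
    """Summarize HD capability from the rules described in this section."""
    if hw_decode and hw_encode:
        # Category 3: hardware assists both directions.
        category, can_decode_hd, can_encode_hd = 3, True, True
    elif hw_decode:
        # Category 2: decoding is offloaded, encoding stays on the CPU.
        category = 2
        can_decode_hd = wei >= 3.0
        can_encode_hd = cores >= 4
    else:
        # Category 1: no usable hardware acceleration (assumption: this
        # also covers the encode-only case, which is not described above).
        category = 1
        can_decode_hd = cores >= 2 and wei >= 4.5
        can_encode_hd = cores >= 4
    return {
        "category": category,
        "can_decode_hd": can_decode_hd,
        "can_encode_hd": can_encode_hd,
    }


workstation_a = classify(hw_decode=True, hw_encode=False, cores=2, wei=7.1)
workstation_b = classify(hw_decode=True, hw_encode=True, cores=4, wei=7.6)
print(workstation_a)
print(workstation_b)
```

The result for workstation ‘A’ matches the conclusion above: it can receive HD but its maximum send resolution stays below 720p.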
After all these calculations and reference cross-checking the resulting information is simply ‘what is possible’. Keep in mind that for any resolution to be sent to another endpoint in a call that endpoint must first request that specific resolution or something close to it. So if the receiver’s display resolution is not even large enough then it does not matter if the other system can support sending a much higher resolution than the requestor is capable of asking for. For these reasons utilizing something like 1080p video is going to require a fair amount of hardware in both workstations with the receiving end connected to a decent-sized monitor set to at least 1920×1080 screen resolution and viewing the video stream in full screen. In retrospect the real value of Lync 2013 in terms of an improved video conferencing experience is not necessarily due to the increased high definition capabilities, but is a result of the additional standard definition resolutions and improved video quality provided by H.264/SVC.
An upcoming article will put all this theory to the test by reviewing the results of various calls among different endpoints, leveraging the Lync Monitoring and Reporting components.