The original intent of this article was to review the current list of supported audio and video codecs in Lync 2013 and attempt to explain what each one is used for given that the list has grown quite a bit over time. But due to the latest announcements around Lync and Skype integration it seems appropriate to first take a closer look at one of these codecs in particular before diving into the rest. This article has been in the works for a while and in the meantime fellow MVP Johan Delimon has also posted a brief article covering just the audio codecs in Lync 2013.
At the latest Lync Conference Microsoft released more details regarding further integration plans between Lync 2013 and Skype. While most of this new information was focused on direct video compatibility with Lync 2013 clients, there will be some advancements in audio calling as well. Last year Microsoft added support for peer-to-peer audio calls between Lync 2013 clients and newer Skype clients. This “Version 1” capability was actually provided by media transcoding gateways in the Skype cloud, which allow both clients to use their own unique, pre-existing audio codecs. The signaling gateways in the Skype cloud then facilitate the connections between the different clients, allowing each to negotiate a media connection with the media gateways. So even though the calls are basic peer-to-peer scenarios, the media must still traverse the Skype back-end infrastructure. In the event that a Skype user is calling a Lync user on the same network, the media would take the long way around and effectively be hairpinned back into the same network.
For native Lync connectivity the topic of media traversal is well documented and the value of negotiating a direct media connection can be quite obvious. The Lync to Skype scenario is quite different though as this use-case is more about bridging enterprise and consumer solutions in which it can be argued that the two clients would rarely be on the same network. Either way, as depicted in the following simplified diagram, the signaling and media paths are basically the same in that they both must traverse the entire backend infrastructure.
There is tremendous value in terms of performance and scale in providing a direct media path between these different clients, and Microsoft is moving in that direction. What will be coming sometime this year is the addition of a separate deployment of services in the Skype cloud referred to as ‘Version 2’ which will be deployed side-by-side with the current v1 capabilities. The biggest difference in the v2 design is that there will no longer be a need for media transcoding gateways, only the signaling gateways which translate the Session Initiation Protocol (SIP) and Session Description Protocol (SDP) messages between the different clients. This will allow the SDP of both clients to be used to set up a direct media path by using the same ICE, STUN, and TURN protocol implementation that Lync uses to facilitate a peer media connection.
Because media transcoding is no longer used, the clients must have at least one audio codec and one video codec in common. For video Microsoft has opted to integrate into Skype the same H.264 SVC codec which was first introduced in Lync 2013. This codec has been added to the latest release of Skype clients and functions at the same levels of compatibility as the Lync 2013 clients. Both ends will support the same list of resolutions, frame rates, and temporal scaling layers.
For the audio portion of the call Microsoft has gone the opposite route and selected the SILK codec already in use in Skype. Unbeknownst to most users, Microsoft already added native support for this audio codec to the Lync 2013 desktop client with a Cumulative Update in November 2013 (CU4). When performing some troubleshooting shortly after that release the new codec declaration was seen hiding in the SDP of the captured SIP messages. At that time there was no information on exactly why SILK was included, but the intent was a safe guess. Although this support has been added in the Lync clients, the codec is not actually being used yet as the back-end signaling has not been updated, so currently audio calls between Lync and Skype are still using the v1 media transcoding gateways.
When Microsoft launches the ‘v2’ infrastructure, not only will H.264 SVC video calling be available between Lync and Skype clients, but the media paths will be optimized and SILK will be the audio codec of choice.
Viewing Session Description Protocol
To see the actual list of supported codecs in the Lync client the SIP INVITE messages of a call attempt can be captured and reviewed in one of a few ways. A SIP trace could be captured at the server or client level, but the easiest approach is to simply review the tracing log file created by the Lync client when logging is enabled. This requires no access to any Lync servers and the file can be pulled directly from the client workstation.
- Open the Options menu on the Lync 2013 client and on the General tab set the Logging In Lync parameter to either Light or Full.
- Sign-out and completely exit the Lync client. Open the user’s Tracing folder in explorer using the path shown below. If a Lync-UccApi-0.UccApilog file already exists then delete it.
Deleting this file provides a clean file to view in the next steps, making it much easier to find the desired SIP message. If tracing was just enabled and was not previously used then this file may not yet exist.
- Restart the Lync client and note that a new Lync-UccApi-0.UccApilog file will be created.
- Place a video call to another Lync client and hang up a few seconds after it is confirmed that the call was successfully established. This will populate the new trace file with some SIP messages that include both audio and video codec declarations.
- Open the Lync-UccApi-0.UccApilog file in Notepad and search for the string “application/sdp” to locate the first SIP message containing Session Description Protocol information.
The first result should also show the text “ms-proxy-2007fallback” under the Content-Disposition line just below. This message is for backward compatibility with any clients or endpoints still utilizing the older implementation of ICE (v6). This section will be skipped over as the next section includes the same declarations and the formatting of some parameters is easier to read.
- Select search again to find the second instance in the same SIP INVITE message which will not include the fallback string in the Content-Disposition line. This message contains support for the more recent implementation of ICE (v19).
The c=IN line lists the IP address of the sender of this message. The SIP messages captured during call setup can include the list of supported receiving capabilities from both parties, depending on the process used to capture the trace. Checking this line helps identify which of the two clients in the call this SDP information was sent by.
The m=audio line defines the media profile, or RTP Audio Video Profile (RTP/AVP), which lists each supported codec by its unique Payload Type identifier. The order of the identifiers defines the order of preference from left (e.g. 117=most preferred) to right (e.g. 101=least preferred). Scenarios where more than one codec may be applicable, such as poor network conditions or lower bit rate policy limitations, can result in the use of a less-preferred codec.
- Skip past the a=candidate lines which are used to declare all the potential host IP addresses to attempt to negotiate media paths. Look for the first occurrence of the a=rtpmap lines as this is where the individual audio codecs are further defined.
Each unique codec is initially defined by an individual a=rtpmap line, while some may also include a secondary a=fmtp line to set additional parameters specific to that codec. The order in which the codecs are declared in the text (top to bottom) also matches the order in which they are listed in the media profile (left to right).
The ‘rtpmap’ attribute defines the type of Real-time Transport Protocol (RTP) media payload for a specific codec. The following text uses the wideband RTAudio codec as an example.
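The left-to-right preference order in the m=audio line can be illustrated with a short Python sketch. The sample line, its port number, and the helper names are made up for illustration and are not taken from a real trace. Picking the first mutually supported payload type is only a simplification, since (as noted above) network conditions and policy can lead to a less-preferred codec being used.

```python
def parse_media_line(line):
    """Split an SDP m= line into media type, port, profile and payload types."""
    if not line.startswith("m="):
        raise ValueError("not an SDP media line: %r" % line)
    media, port, profile, *payload_types = line[2:].split()
    return {
        "media": media,
        "port": int(port),
        "profile": profile,
        # Left-to-right order is the sender's order of preference.
        "payload_types": [int(pt) for pt in payload_types],
    }


def first_common_payload(offered, supported):
    """Return the most-preferred offered payload type the other side supports."""
    for pt in offered:
        if pt in supported:
            return pt
    return None


# Illustrative line only: 117 is listed first (most preferred), 101 last.
sample = "m=audio 50000 RTP/AVP 117 104 114 9 0 8 101"
parsed = parse_media_line(sample)
print(parsed["payload_types"])
print(first_common_payload(parsed["payload_types"], {0, 8, 101}))
```

In this sketch an endpoint that only understands payload types 0, 8 and 101 would end up on payload type 0, because it is the earliest entry in the offer that both sides share.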
a=rtpmap:<payload type> <encoding name>/<clock rate>
- The Payload Type which is unique to each different codec variant is defined as a numeric value placed immediately after the colon.
- The Encoding Name is the common name of the codec.
- The Clock Rate is the numeric value after the slash which defines the sampling frequency used for each codec. Values of ‘8000’ typically indicate a narrowband codec while ‘16000’ defines a wideband codec.
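The three rtpmap fields described above can be pulled apart with a small parser. This is a minimal Python sketch rather than a complete SDP implementation. The function name is hypothetical, and the sample lines are illustrative: payload types 0 (PCMU) and 9 (G722) are static assignments in the standard RTP audio/video profile, while the stereo line is an assumed example of the optional channel count that can follow the clock rate.

```python
import re

# a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)(?:/(\d+))?$")


def parse_rtpmap(line):
    """Return the fields of an a=rtpmap line as a dictionary."""
    match = RTPMAP.match(line.strip())
    if match is None:
        raise ValueError("not an rtpmap line: %r" % line)
    payload_type, name, clock_rate, channels = match.groups()
    return {
        "payload_type": int(payload_type),
        "encoding_name": name,
        "clock_rate": int(clock_rate),
        # The channel count is optional and defaults to a single channel.
        "channels": int(channels) if channels else 1,
    }


for text in ("a=rtpmap:0 PCMU/8000",
             "a=rtpmap:9 G722/8000",
             "a=rtpmap:117 G722/8000/2"):  # assumed stereo example
    print(parse_rtpmap(text))
```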
The list of audio codecs in Lync 2013 is quite extensive and has grown over the many different releases of the Communications Server product. When looking directly at SIP messages between two Lync 2013 clients the initial SIP INVITE from the calling party will include the following lines below the m=audio section of the SDP messages.
The following sections do not outline the audio codecs in any specific order of preference. They are simply organized in similar groups, starting with the more commonly used codecs.
RTAudio is Microsoft’s own proprietary audio codec, which can be licensed for use in other third-party clients and devices.
RTAudio, as with a few other supported audio codecs, is provided in both narrowband (8 kHz) and wideband (16 kHz) options. The wideband codec is most commonly used in peer-to-peer Lync calls while the narrowband option can be used for either peer Lync calls or in some outbound calling scenarios. In the event that available network bandwidth is limited, instead of sending G.711 directly to a Mediation Server for outbound media sessions destined for the Public Switched Telephone Network (PSTN), the Lync client can utilize RTA. Although this will provide better quality at a lower bit rate over a poor network, it requires the Mediation Server to decode the media session and re-encode it into G.711 for the PSTN side. In scenarios with plenty of local bandwidth the Lync client will typically send G.711 to the Mediation Server (freeing the server from transcoding duties), or if Media Bypass is applicable then G.711 is sent directly to a media gateway.
These entries advertise compatibility with the industry standard G.711 audio codec used throughout the PSTN. Support for two different common Pulse Code Modulation (PCM) algorithms is denoted as PCMU for G.711 µ-Law (used primarily in North America and Japan) and PCMA for G.711 A-Law (commonly used throughout the rest of the world).
These codecs can be used in numerous calling scenarios but are most commonly seen in calls with PSTN callers. The most common scenario is when placing a Lync audio call to the PSTN where there is plenty of available bandwidth on the network between the client and the Mediation Server. As of an earlier Office Communicator R2 release the as-designed behavior here is for the client to simply encode the audio in G.711 so that the Mediation Server is not taxed with having to perform any transcoding; it will simply send the media on to its next hop. In the event that local bandwidth is limited and the Lync client is aware of this, it may instead opt to encode the audio in Real-Time Audio (RTA) so that the transmission over the network is more efficient (e.g. lower bit rate), and then the Mediation Server will need to decode the RTA session and re-encode it into G.711 for delivery on to the PSTN. Another common scenario for G.711 usage is when Media Bypass is enabled and the Lync client must encode the audio in a format that a media gateway, or whatever is on the other side of a SIP Trunk, can understand, which would generally not be a Lync-only codec like RTA.
The Siren family of patented codecs was originally developed by Polycom. The specific variant supported by Lync is also known as ‘Siren 7’ and has been used only for conferencing scenarios for some time in the Communications Server product family. An immediate benefit of Siren is that it provides wideband audio at a low bit rate (16 kbps) making it ideal for large multiparty calls where many audio streams are sent to the same Front End server.
SILK was developed specifically for Skype to replace the older SVOPC audio codec and was introduced in the 4.x release of Skype clients. It has also been extended into the Internet standard Opus audio codec.
a=fmtp:103 useinbandfec=1; usedtx=0
a=fmtp:104 useinbandfec=1; usedtx=0
This pair of narrowband and wideband codecs will be used for Lync 2013 and Skype audio calling in the near future when media transcoding is removed from the topology. As mentioned earlier SILK supports in-band FEC, denoted by the ‘useinbandfec=1’ parameter, meaning that any additional error correction media packets are sent inside the same media payload stream.
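To show how the extra parameters on an a=fmtp line can be read, here is a minimal Python sketch that splits the payload type from its semicolon-separated key=value pairs. The function name is hypothetical and the parser only covers the simple form seen in the two SILK lines above.

```python
def parse_fmtp(line):
    """Return (payload_type, parameters) for an a=fmtp line."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp line: %r" % line)
    payload_type, _, params = line[len(prefix):].partition(" ")
    parsed = {}
    for item in params.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing semicolon
        key, _, value = item.partition("=")
        parsed[key.strip()] = value.strip()
    return int(payload_type), parsed


payload_type, params = parse_fmtp("a=fmtp:103 useinbandfec=1; usedtx=0")
print(payload_type, params)
# In-band FEC is advertised when useinbandfec is set to 1.
print(params.get("useinbandfec") == "1")
```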
G.722 is a freely available and widely popular wideband audio codec which Lync will use in a few scenarios.
Unlike other wideband codecs the clock rate here is incorrectly identified as only 8,000 Hz even though the actual sampling rate is 16,000 Hz. Johan’s aforementioned article explains this is due to an error in RFC 1890, and Lync must declare the rate this way to maintain compatibility with other systems.
This codec is primarily used in Lync conference calls when no other Lync clients are participating in the same call. In the event that a single Lync client joins an audio conference call populated with only PSTN attendees then the mixed audio sent by the Lync AVMCU to the sole Lync client will be in G.722.
In scenarios where other Lync clients have joined the same conference call, the Lync AVMCU will fall back to using Siren for the mixed audio streams sent to each Lync client. Constrained bandwidth or high-latency scenarios can also trigger a fallback to Siren regardless of the client types in attendance. A previous article from Lync MVP Curtis Johnstone covers this specific behavior in more depth.
RTA is the primary wideband codec negotiated between Lync clients in peer-to-peer calls. When other endpoints, such as Lync Qualified phones, negotiate calls with Lync clients then G.722 may need to be used if RTA is not available on the phone. (RTA compatibility is not a Lync Qualification requirement for IP phones, but it is included in Optimized phones running Lync Phone Edition.)
This codec declaration is a newer capability added with the RTM version of Lync 2013 and is designed to support Lync Room System devices which are equipped with two microphones for stereo dual-channel audio pickup.
Just as with G.722 the clock rate here is also defined as ‘8000’ even though the actual rate is 16,000 Hz. The ‘/2’ after the clock rate indicates that this codec has two separate channels for stereo applications. Lync Room Systems utilize this codec to provide improved audio pickup in conference rooms.
Another supported option from the same family of royalty-free wideband audio codecs. It is not based on G.722, though; it is actually a variant of the Siren 7 codec.
G.726 is an Adaptive Differential Pulse Code Modulation (ADPCM) codec designed to more effectively compress speech than older PCM-based codecs. The specific variant supported by Lync 2013 is a single narrowband (32 kbps) option which results in a lower bit rate stream of comparable quality to G.711 audio. Some of the Lync-compatible IP desk phones natively support this codec and in theory could negotiate G.726 instead of G.711 in constrained bandwidth scenarios.
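The bit rate comparison above comes down to simple arithmetic on the sampling rate and the number of bits used per sample. The Python sketch below assumes the usual figures of 8 bits per sample for G.711 and 4 bits per sample for the 32 kbps G.726 variant, both sampled at 8,000 Hz. It counts audio payload only and ignores RTP, UDP and IP header overhead.

```python
def payload_bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Raw audio payload bit rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample / 1000


g711_kbps = payload_bit_rate_kbps(8000, 8)  # PCM, 8 bits per sample
g726_kbps = payload_bit_rate_kbps(8000, 4)  # ADPCM, 4 bits per sample (assumed)

print(g711_kbps, g726_kbps)
```

Under these assumptions G.726 carries the same narrowband audio in half the payload bit rate of G.711.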
Comfort Noise (CN)
Comfort Noise gives Lync both narrowband and wideband options for adding white noise during periods of silence, so that users do not mistakenly think that the call connection has been lost.
Redundant Audio Payload (RED)
RED is utilized for any out-of-band Forward Error Correction (FEC) audio payload. While Lync clients will leverage this codec for error correction needs in native calls the Skype clients do not support this and will use in-band FEC.
Dual-Tone Multi-Frequency (DTMF)
DTMF signaling is used to support the common telephone events of pushing buttons on the dial pad while in a call.
The unique tone created by each key is represented by a value between 0 and 16 as defined by the additional fmtp attribute. The name describes exactly how these tones work in that each button on a traditional telephone produces two simultaneous tones of different frequencies. Each row and column on a standard phone number pad uses a different frequency tone so that 8 unique tones can be used to support 16 different dual-tone patterns.
The 16 supported standard tones are 0-9, *, #, and four additional tones used by the AUTOVON military application, defined as A, B, C, and D. RFC 4733 defines only 16 unique DTMF values (0-15), so it is unclear why Lync defines a 17th value (0-16). One possible explanation is that the older RFC 2833 assigned event 16 to the hook flash signal.
| |1209 Hz|1336 Hz|1477 Hz|1633 Hz|
|697 Hz|1|2|3|A|
|770 Hz|4|5|6|B|
|852 Hz|7|8|9|C|
|941 Hz|*|0|#|D|
For example, pressing ‘4’ will produce two tones of 1209 Hz and 770 Hz. This is why many of the tones seem to sound the same, as all keys in a row or column share one of the two tones played. The combined tone is difficult for the average person to tell apart, but a computer can accurately identify each of the two frequencies in play.
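The row and column scheme described above can be sketched in a few lines of Python. The frequencies are the standard DTMF values; the function names, the 0.1 second duration, and the 8,000 Hz sampling rate are choices made only for this illustration.

```python
import math

ROW_HZ = (697, 770, 852, 941)        # one low-group frequency per keypad row
COL_HZ = (1209, 1336, 1477, 1633)    # one high-group frequency per column
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (row, column) frequency pair in Hz for a single key."""
    if len(key) == 1:
        for row, keys in enumerate(KEYPAD):
            col = keys.find(key)
            if col != -1:
                return ROW_HZ[row], COL_HZ[col]
    raise ValueError("not a DTMF key: %r" % key)


def dtmf_samples(key, duration_s=0.1, sample_rate=8000):
    """Generate the two summed sine waves for a key as floats in [-1, 1]."""
    low, high = dtmf_frequencies(key)
    count = round(duration_s * sample_rate)
    return [
        0.5 * (math.sin(2 * math.pi * low * i / sample_rate)
               + math.sin(2 * math.pi * high * i / sample_rate))
        for i in range(count)
    ]


print(dtmf_frequencies("4"))
print(len(dtmf_samples("4")))
```

Eight frequencies are enough for all 16 keys because every key is identified by one row tone plus one column tone.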
The list of video codecs in Lync 2013 is much shorter than the list of supported audio codecs.
- Return to the same spot in the tracing file as shown at the beginning of the last section and then search for the m=video line found immediately below the audio section.
Just as with audio codecs, the m=video line uses the same format to list the order of preference, defaulting to H.264 SVC (122) ahead of RTVideo (121).
This is the Real-Time Video codec which has been supported from the introduction of video calling in the Communications Server platform.
The ‘/90000’ value is used as the clock rate for all video codecs advertised by Lync. As denoted by the preference order above this codec is listed second and will be used in the event that both sides do not support H.264 SVC.
As covered in various other articles, the H.264 SVC implementation in Lync is used by default for all native 2013 calling scenarios as well as in certain third-party interoperability scenarios (e.g. Polycom Group Series with at least 4.1.3 firmware).
When communicating with legacy clients (e.g. Lync 2010, Office Communicator) and some third-party video conferencing systems (e.g. Polycom HDX) then RTVideo would still be used.
As explained in previous articles and various presentations, the implementation of H.264 SVC in Lync is not the basic H.264 codec but is specialized for Lync. This unique implementation is advertised to other clients as X-H264UC and thus must also be understood by the other client in the call. In the additional fmtp declaration statement the ‘packetization-mode=1’ parameter indicates that UCConfig Mode 1 is the maximum scaling mode supported, which is the ability to encode two separate temporal layers: a base layer and an enhancement layer. As previously stated, the upcoming implementation of Skype will support the same mode.
Uneven Level Protection FEC
This codec is used to allow Lync 2013 clients to set up a second RTP session specifically for out-of-band forward error correction data, separate from the main video stream.
As the ‘uneven’ part of the name suggests, this codec sends portions of redundant data when needed, as it is not a complete duplicate of the main media stream.
Just as mentioned earlier in the audio codecs section, Skype clients do not support this and will instead simply use in-band FEC with the single SVC media session. This entry did not exist in previous versions of Lync, so in legacy video call scenarios the Lync 2010 clients utilizing RTV will also embed FEC data in-band.