The original intent of this article was to review the current list of supported audio and video codecs in Lync 2013 and attempt to explain what each one is used for given that the list has grown quite a bit over time. But due to the latest announcements around Lync and Skype integration it seems appropriate to first take a closer look at one of these codecs in particular before diving into the rest.  This article has been in the works for a while and in the meantime fellow MVP Johan Delimon has also posted a brief article covering just the audio codecs in Lync 2013.

Background

At the latest Lync Conference Microsoft released more details regarding further integration plans between Lync 2013 and Skype. While most of this new information was focused around direct video compatibility with Lync 2013 clients, there will be some advancements in audio calling as well. Last year Microsoft added support for peer-to-peer audio calls between Lync 2013 clients and newer Skype clients. This “Version 1” capability was actually provided by use of media transcoding gateways in the Skype cloud which would allow both clients to utilize their own unique, pre-existing audio codecs. The signaling gateways in the Skype cloud would then facilitate the connections between the different clients, allowing each to negotiate a media connection with the media gateways. So even though the calls are basic peer-to-peer scenarios the media must still traverse the Skype back-end infrastructure. In the event that a Skype user is calling a Lync user on the same network the media would take the long way around and effectively be hairpinned back into the same network.

For native Lync connectivity the topic of media traversal is well documented and the value of negotiating a direct media connection can be quite obvious. The Lync to Skype scenario is quite different though as this use-case is more about bridging enterprise and consumer solutions in which it can be argued that the two clients would rarely be on the same network.  Either way, as depicted in the following simplified diagram, the signaling and media paths are basically the same in that they both must traverse the entire backend infrastructure.

image

There is tremendous value in terms of performance and scale in providing a direct media path between these different clients, and Microsoft is moving in that direction.  What will be coming sometime this year is the addition of a separate deployment of services in the Skype cloud referred to as ‘Version 2’ which will be deployed side-by-side with the current v1 capabilities. The biggest difference in the v2 design is that there will no longer be a need for media transcoding gateways, only the signaling gateways which translate the Session Initiation Protocol (SIP) and Session Description Protocol (SDP) messages between the different clients. This will allow the SDP of both clients to be used to set up a direct media path by using the same ICE, STUN, and TURN protocol implementation that Lync uses to facilitate a peer media connection.

image

Since media transcoding is no longer utilized then the clients obviously must have at least one audio and one video codec in common. For video Microsoft has opted to integrate into Skype the same H.264 SVC codec which was first introduced in Lync 2013. This codec has been added to the latest release of Skype clients and functions at the same levels of compatibility as the Lync 2013 clients. Both ends will support the same list of resolutions, frame rates, and temporal scaling layers.

For the audio portion of the call Microsoft has gone the opposite route and actually selected the SILK codec already in use in Skype. Unbeknownst to most users Microsoft already added native support for this audio codec to the Lync 2013 desktop client with a Cumulative Update in November 2013 (CU4). When performing some troubleshooting shortly after that release the new codec declaration was seen hiding in the SDP of the captured SIP messages. At that time there was no information on exactly why SILK was included but it was a safe guess as to the possible intent. Although this support has been added in the Lync clients the codec is not actually being used yet as the back-end signaling has not been updated, so currently audio calls between Lync and Skype are still using the v1 media transcoding gateways.

When Microsoft launches the ‘v2’ infrastructure, not only will H.264 SVC video calling be available between Lync and Skype clients but the media paths will be optimized and SILK will be the audio codec of choice.

Viewing Session Description Protocol

To see the actual list of supported codecs in the Lync client the SIP INVITE messages of a call attempt can be captured and reviewed in one of a few ways.  A SIP trace could be captured at the server or client level, but the easiest approach is to simply review the tracing log file created by the Lync client when logging is enabled.  This requires no access to any Lync servers and the file can be pulled directly from the client workstation.

  • Open the Options menu on the Lync 2013 client and on the General tab set the Logging In Lync parameter to either Light or Full.

image

  • Sign-out and completely exit the Lync client.  Open the user’s Tracing folder in explorer using the path shown below.  If a Lync-UccApi-0.UccApilog file already exists then delete it. 

%localappdata%\Microsoft\Office\15.0\Lync\Tracing

image

Deleting this file will provide a clean file to view in the next steps, making it much easier to find the desired SIP message.  If tracing was just enabled and was not previously used then this file may not even exist yet.
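The cleanup step can also be scripted. The following is only a minimal sketch, assuming Python is available on the workstation; the function names are made up for illustration and only the folder path and file name come from the steps above. It should only be run after the Lync client has completely exited.

```python
import os
from pathlib import Path, PureWindowsPath

LOG_NAME = "Lync-UccApi-0.UccApilog"


def tracing_log_path(local_appdata: str) -> PureWindowsPath:
    """Build the trace file path below a given %localappdata% value."""
    return (PureWindowsPath(local_appdata)
            / "Microsoft" / "Office" / "15.0" / "Lync" / "Tracing" / LOG_NAME)


def remove_stale_log(log_file: Path) -> bool:
    """Delete an existing trace file; return True if one was removed."""
    if log_file.is_file():
        log_file.unlink()
        return True
    return False


if __name__ == "__main__":
    # LOCALAPPDATA is normally only defined on Windows.
    target = tracing_log_path(os.environ.get("LOCALAPPDATA", ""))
    print(target)
    # remove_stale_log(Path(str(target)))  # uncomment on the workstation itself
```

The delete call is left commented out on purpose so that nothing is removed until the printed path has been checked.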

  • Restart the Lync client and note that a new Lync-UccApi-0.UccApilog file will be created.
  • Place a video call to another Lync client and hang up a few seconds after it is confirmed that the call was successfully established.  This will populate the new trace file with some SIP messages that include both audio and video codec declarations.
  • Open the Lync-UccApi-0.UccApilog file in Notepad and search for the string “application/sdp” to locate the first SIP message containing Session Description Protocol information.

image

The first result should also show the text “ms-proxy-2007fallback” under the Content-Disposition line just below.  This message is for backward compatibility with any clients or endpoints still utilizing the older implementation of ICE (v6).  This section will be skipped over as the next section includes the same declarations and the formatting of some parameters is easier to read.

  • Select search again to find the second instance in the same SIP INVITE message which will not include the fallback string in the Content-Disposition line.  This message contains support for the more recent implementation of ICE (v19).

image
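The same search can be done programmatically instead of in Notepad. This is a rough sketch, assuming the fallback string appears within a couple of lines of each “application/sdp” hit as described above; the function name, the look-ahead window, and the sample text are illustrative and are not taken from a real trace.

```python
def find_sdp_sections(log_text: str, lookahead: int = 3):
    """Return (line_number, is_fallback) for each 'application/sdp' hit."""
    lines = log_text.splitlines()
    hits = []
    for index, line in enumerate(lines):
        if "application/sdp" in line.lower():
            # Check this line and the next few for the fallback marker.
            window = lines[index:index + lookahead]
            is_fallback = any("ms-proxy-2007fallback" in entry for entry in window)
            hits.append((index + 1, is_fallback))
    return hits


# Hand-written sample, shaped like the two sections described in the text.
sample = "\n".join([
    "Content-Type: application/sdp",
    "Content-Disposition: session; handling=optional; ms-proxy-2007fallback",
    "",
    "v=0",
    "Content-Type: application/sdp",
    "Content-Disposition: session; handling=optional",
    "",
    "v=0",
])

for line_number, is_fallback in find_sdp_sections(sample):
    print(line_number, "fallback" if is_fallback else "current ICE")
```

The hit that is not flagged as fallback is the section to read for the codec declarations.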

The c=IN line lists the IP address of the sender of this message.  The SIP messages captured during call setup can include the list of supported receiving capabilities from both parties, depending on the process used to capture the trace.  Checking this line helps identify which of the two clients in the call this SDP information was sent by.

The m=audio line defines the media profile, or RTP Audio Video Profile (RTP/AVP), which lists each supported codec by its unique Payload Type identifier.  The order of the identifiers defines the order of preference from left (e.g. 117=most preferred) to right (e.g. 101=least preferred). Scenarios where more than one codec may be applicable, like with poor network conditions or lower bit rate policy limitations, can result in the use of a less-preferred codec.
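To make the left-to-right preference order concrete, here is a small sketch that splits an m= line into its parts. The example line is invented for illustration: the port number is arbitrary and the payload list is shortened, and a real trace may show a different profile string.

```python
def parse_media_line(line: str):
    """Split an SDP m= line into media type, port, profile and payload types."""
    if not line.startswith("m="):
        raise ValueError("not an SDP media line")
    media, port, profile, *payload_types = line[2:].split()
    return {
        "media": media,
        "port": int(port),
        "profile": profile,
        # Listed left to right, most preferred first.
        "preference": [int(pt) for pt in payload_types],
    }


example = "m=audio 50000 RTP/AVP 117 104 114 9 0 8 101"  # illustrative only
parsed = parse_media_line(example)
print("most preferred:", parsed["preference"][0])
print("least preferred:", parsed["preference"][-1])
```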

  • Skip past the a=candidate lines which are used to declare all the potential host IP addresses to attempt to negotiate media paths.  Look for the first occurrence of the a=rtpmap lines as this is where the individual audio codecs are further defined.

image

Each unique codec is initially defined by an individual a=rtpmap line while some may also include a secondary a=fmtp line to set additional parameters specific to that codec.  The order that the codecs are declared in the text (top to bottom) also matches the order they are defined in the media profile (left to right).

The ‘rtpmap’ attribute defines the type of Real-time Transport Protocol (RTP) media payload for a specific codec.  The following text uses the wideband RTAudio codec as an example.

a=rtpmap:<payload type> <encoding name>/<clock rate>

a=rtpmap:114 x-msrta/16000

  • The Payload Type which is unique to each different codec variant is defined as a numeric value placed immediately after the colon.
  • The Encoding Name is the common name of the codec.

  • The Clock Rate is the numeric value after the slash which defines the sampling frequency used for each codec. Values of ‘8000’ typically indicate a narrowband codec while ‘16000’ defines a wideband codec.
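The three fields described above can be pulled apart with a short regular expression. This is a sketch rather than a full SDP parser; the `RtpMap` type and `parse_rtpmap` function are names chosen for this example, and the optional trailing channel count is included because one of the declarations later in this article uses it.

```python
import re
from typing import NamedTuple


class RtpMap(NamedTuple):
    payload_type: int
    encoding_name: str
    clock_rate: int
    channels: int


# a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
_RTPMAP = re.compile(r"^a=rtpmap:(\d+) ([^/\s]+)/(\d+)(?:/(\d+))?$")


def parse_rtpmap(line: str) -> RtpMap:
    match = _RTPMAP.match(line.strip())
    if match is None:
        raise ValueError(f"not an rtpmap attribute: {line!r}")
    payload_type, name, clock_rate, channels = match.groups()
    # A missing channel count means a single channel.
    return RtpMap(int(payload_type), name, int(clock_rate), int(channels or 1))


print(parse_rtpmap("a=rtpmap:114 x-msrta/16000"))
```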

Audio Codecs

The list of audio codecs in Lync 2013 is quite extensive and has grown over the many different releases of the Communications Server product.  When looking directly at SIP messages between two Lync 2013 clients the initial SIP INVITE from the calling party will include the following lines below the m=audio section of the SDP messages.

image

The following sections do not outline the audio codecs in any specific order of preference.  They are simply organized in similar groups, starting with the more commonly used codecs.

RTAudio (RTA)

Microsoft’s own proprietary audio codec which can be licensed for use in other third-party clients and devices.

a=rtpmap:115 x-msrta/8000

a=rtpmap:114 x-msrta/16000

RTAudio, as with a few other supported audio codecs, is provided in both narrowband (8 kHz) and wideband (16 kHz) options.  The wideband codec is most commonly used in peer-to-peer Lync calls while the narrowband option can be used for either peer Lync calls or in some outbound calling scenarios.  In the event that available network bandwidth is limited then instead of sending G.711 directly to a Mediation Server for outbound media sessions destined for the Public Switched Telephone Network (PSTN) the Lync client can utilize RTA instead.  Although this will provide better quality at a lower bit rate over a poor network it will require that the Mediation Server decode the media session and re-encode it into G.711 for the PSTN side.  In scenarios with plenty of local bandwidth the Lync client will typically send G.711 to the Mediation Server (freeing the server from transcoding duties) or if Media Bypass is applicable then G.711 is sent directly to a media gateway.
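The outbound PSTN behavior described here can be summarized as a small decision function. This is only a simplified reading of the paragraph above, not the client's actual logic: the function and field names are hypothetical, and treating limited bandwidth as always routing through the Mediation Server (even when Media Bypass is enabled) is an assumption based on gateways generally not understanding RTA.

```python
from typing import NamedTuple


class PstnMediaPlan(NamedTuple):
    codec: str
    next_hop: str
    mediation_transcodes: bool


def plan_pstn_media(bandwidth_limited: bool, media_bypass: bool) -> PstnMediaPlan:
    """Sketch of the codec and next hop for an outbound PSTN call."""
    if bandwidth_limited:
        # Assumption: RTA is sent to the Mediation Server, which must
        # decode it and re-encode it into G.711 for the PSTN side.
        return PstnMediaPlan("RTAudio", "Mediation Server", True)
    if media_bypass:
        # G.711 goes straight to the media gateway.
        return PstnMediaPlan("G.711", "media gateway", False)
    # Plenty of bandwidth: G.711 is passed through without transcoding.
    return PstnMediaPlan("G.711", "Mediation Server", False)


print(plan_pstn_media(bandwidth_limited=True, media_bypass=False))
```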

G.711

These entries advertise compatibility with the industry standard G.711 audio codec used throughout the PSTN. Support for the two common Pulse Code Modulation (PCM) algorithms is denoted as PCMU for G.711 µ-Law (used primarily in North America and Japan) and PCMA for G.711 A-Law (commonly used throughout the rest of the world).

a=rtpmap:0 PCMU/8000

a=rtpmap:8 PCMA/8000

These codecs can be used in numerous calling scenarios but are most commonly seen in calls with PSTN callers. The most common scenario is when placing a Lync audio call to the PSTN where there is plenty of available bandwidth on the network between the client and the Mediation Server. As of an earlier Office Communicator R2 release the as-designed behavior here is for the client to simply encode the audio in G.711 so that the Mediation Server is not taxed with having to perform any transcoding; it will simply send the media on to its next hop. In the event that local bandwidth is limited and the Lync client is aware of this it may instead opt to encode the audio in Real-Time Audio (RTA) so that the transmission over the network is more efficient (e.g. lower bit rate) and then the Mediation Server will need to decode the RTA session and re-encode it into G.711 for delivery on to the PSTN. Another common scenario for G.711 usage is when Media Bypass is enabled and the Lync client must encode the audio in a format that a media gateway or whatever is on the other side of a SIP Trunk can understand, which would generally not be a Lync-only codec like RTA.

Siren

The Siren family of patented codecs was originally developed by Polycom.  The specific variant supported by Lync is also known as ‘Siren 7’ and has been used only for conferencing scenarios for some time in the Communications Server product family.  An immediate benefit of Siren is that it provides wideband audio at a low bit rate (16 kbps) making it ideal for large multiparty calls where many audio streams are sent to the same Front End server.

a=rtpmap:111 SIREN/16000

SILK

SILK was developed specifically for Skype to replace the older SVOPC audio codec and was introduced in the 4.x release of Skype clients. It has also been extended into the Internet standard Opus audio codec.

a=rtpmap:103 SILK/8000
a=fmtp:103 useinbandfec=1; usedtx=0

a=rtpmap:104 SILK/16000
a=fmtp:104 useinbandfec=1; usedtx=0

This pair of narrowband and wideband codecs will be used for Lync 2013 and Skype audio calling in the near future when media transcoding is removed from the topology.  SILK supports in-band FEC, denoted by the ‘useinbandfec=1’ parameter, meaning that any additional error correction data is sent inside the same media payload stream.
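The extra parameters on an a=fmtp line can be read into a dictionary with a few lines of code. This is a minimal sketch, assuming the parameters are written as semicolon-separated key=value pairs as in the SILK lines above; `parse_fmtp` is a name chosen for this example.

```python
def parse_fmtp(line: str):
    """Return (payload_type, parameters) for a 'key=value; ...' fmtp line."""
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an fmtp attribute")
    payload_type, _, raw_params = line[len(prefix):].partition(" ")
    params = {}
    for item in raw_params.split(";"):
        key, separator, value = item.strip().partition("=")
        if separator:  # skip anything that is not a key=value pair
            params[key] = value
    return int(payload_type), params


payload_type, params = parse_fmtp("a=fmtp:104 useinbandfec=1; usedtx=0")
print(payload_type, params["useinbandfec"] == "1")
```

Values that are not key=value pairs are simply skipped by this sketch and would need their own handling.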

G.722

A freely available and widely popular wideband audio codec which Lync will use in a few scenarios.

a=rtpmap:9 G722/8000

Unlike other wideband codecs the clock rate here is incorrectly identified as only 8,000 Hz even though the actual sampling rate is 16,000 Hz.  Johan’s aforementioned article explains this is due to an error in RFC 1890 and Lync must declare the rate this way to maintain compatibility with other systems.
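Anything that reads the clock rate out of SDP to decide whether a codec is narrowband or wideband has to special-case this entry. The helper below is a hypothetical illustration of that one exception; for every other encoding name it simply assumes the advertised clock rate is the sampling rate.

```python
def actual_sample_rate(encoding_name: str, advertised_clock_rate: int) -> int:
    """Return the real audio sampling rate for an rtpmap entry."""
    # G.722 advertises an 8000 Hz RTP clock but samples audio at 16000 Hz.
    if encoding_name.upper() == "G722":
        return 16000
    # Assumption: all other audio entries advertise their true rate.
    return advertised_clock_rate


print(actual_sample_rate("G722", 8000))
print(actual_sample_rate("x-msrta", 8000))
```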

This codec is primarily used in Lync conference calls when no other Lync clients are participating in the same call.  In the event that a single Lync client joins an audio conference call populated with only PSTN attendees then the mixed audio sent by the Lync AVMCU to the sole Lync client will be in G.722.

In scenarios where other Lync clients have joined the same conference call then the Lync AVMCU will fall back to using Siren for the mixed audio streams sent to each Lync client.  Constrained bandwidth or high-latency scenarios can also trigger a fallback to Siren regardless of the client types in attendance. A previous article from Lync MVP Curtis Johnstone covers this specific behavior in more depth.
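The conference behavior in the last two paragraphs reduces to a simple rule of thumb. The function below is only a restatement of that description with hypothetical names; it is not how the AVMCU is actually implemented and it ignores every factor not mentioned above.

```python
def conference_downlink_codec(lync_client_count: int,
                              constrained_network: bool = False) -> str:
    """Guess the codec for mixed audio sent from the AVMCU to a Lync client."""
    if lync_client_count < 1:
        raise ValueError("at least one Lync client must be in the conference")
    if constrained_network:
        # Constrained bandwidth or high latency triggers the Siren fallback.
        return "SIREN"
    # A sole Lync client among PSTN attendees receives G.722.
    return "G722" if lync_client_count == 1 else "SIREN"


print(conference_downlink_codec(1))
print(conference_downlink_codec(3))
```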

While RTA is primarily used for wideband audio between Lync clients when negotiating peer to peer Lync calls, when other clients, like Lync Qualified phones, negotiate calls with Lync clients then G.722 may need to be used if RTA is not available on the phone. (RTA compatibility is not a Lync Qualification requirement for IP phones, but it is included in Optimized phones running Lync Phone Edition.)

G.722 Stereo

This codec declaration is a newer capability added with the RTM version of Lync 2013 and is designed to support Lync Room System devices which are equipped with two microphones for stereo dual-channel audio pickup.

a=rtpmap:117 G722/8000/2

Just as with G.722 the clock rate here is also defined as ‘8000’ yet again the actual rate is 16,000 Hz. The ‘/2’ after the clock rate indicates that this codec has 2 separate channels, for stereo applications.  Lync Room Systems utilize this codec to provide for improved audio pickup in conference rooms.

G.722.1

Another supported option from the same family of royalty-free wideband audio codecs. It is not based on G.722 though; it is actually a variant of the Siren 7 codec.

a=rtpmap:112 G7221/16000

G.726

G.726 is an Adaptive Differential Pulse Code Modulation (ADPCM) codec designed to more effectively compress speech than older PCM-based codecs.  The specific variant supported by Lync 2013 is a single narrowband (32 kbps) option which results in a lower bit rate stream of comparable quality to G.711 audio.  Some of the Lync-compatible IP desk phones natively support this codec and in theory could negotiate G.726 instead of G.711 in constrained bandwidth scenarios.

a=rtpmap:116 AAL2-G726-32/8000

Comfort Noise (CN)

Utilizing Comfort Noise provides Lync the ability to leverage either a narrowband or wideband option for adding white noise during periods of silence to prevent users from mistakenly thinking that the call connection might have been lost.

a=rtpmap:13 CN/8000

a=rtpmap:118 CN/16000

Redundant Audio Payload (RED)

RED is utilized for any out-of-band Forward Error Correction (FEC) audio payload.  While Lync clients will leverage this codec for error correction needs in native calls the Skype clients do not support this and will use in-band FEC.

a=rtpmap:97 RED/8000

Dual-Tone Multi-Frequency (DTMF)

DTMF signaling is used to support the common telephone events of pushing buttons on the dial pad while in a call.

a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16

The unique tone created by each key is represented by a value between 0 and 16 as defined by the additional fmtp attribute.  The name describes exactly how these tones work in that each button on a traditional telephone produces two simultaneous tones of different frequencies.  Each row and column on a standard phone number pad uses a different frequency tone so that 8 unique tones can be used to support 16 different dual-tone patterns.

The supported 16 standard tones come from 0-9, *, #, and four additional tones used by the AUTOVON military application defined as A, B, C, D.  RFC 4733 defines only 16 unique values (0-15) so it is unclear why Lync defines a 17th value (0-16).

          1209 Hz   1336 Hz   1477 Hz   1633 Hz
697 Hz       1         2         3         A
770 Hz       4         5         6         B
852 Hz       7         8         9         C
941 Hz       *         0         #         D

For example depressing ‘4’ will produce two tones of 1209 Hz and 770 Hz.  This is why many of the tones seem to sound the same as all keys in a row or column will use the same tone for one of the two played.  This harmony is difficult for the average person to easily differentiate between but a computer can accurately identify each of the two frequencies in play.
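The keypad grid and the fmtp event range can both be expressed in a few lines. This is a sketch with names chosen for this example: the frequencies come from the table above, and the mapping of event codes 0-15 to keys (digits first, then *, #, A, B, C, D) follows RFC 4733. Event 16 is reported without a key name because it is not one of the 16 DTMF digits.

```python
ROW_HZ = (697, 770, 852, 941)
COLUMN_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")
# RFC 4733 event codes 0-15 map to these keys, in this order.
EVENT_KEYS = "0123456789*#ABCD"


def dtmf_frequencies(key: str):
    """Return the (row, column) tone pair in Hz for a single keypad key."""
    if len(key) != 1:
        raise ValueError("expected exactly one key")
    for row, keys in enumerate(KEYPAD):
        column = keys.find(key.upper())
        if column != -1:
            return ROW_HZ[row], COLUMN_HZ[column]
    raise ValueError(f"not a DTMF key: {key!r}")


def expand_event_range(spec: str):
    """Expand an fmtp value such as '0-16' into a list of event codes."""
    events = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        events.extend(range(int(first), int(last or first) + 1))
    return events


def describe_events(spec: str):
    """Name each advertised event, falling back to the bare event number."""
    return [EVENT_KEYS[code] if code < len(EVENT_KEYS) else f"event {code}"
            for code in expand_event_range(spec)]


print(dtmf_frequencies("4"))
print(len(expand_event_range("0-16")), describe_events("0-16")[-1])
```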

Video Codecs

The list of video codecs in Lync 2013 is much shorter than the list of supported audio codecs by comparison.

  • Return to the same spot in the tracing file as shown at the beginning of the last section and then search for the m=video line found immediately below the audio section.

image

Just as with audio codecs the m=video line uses the same format to list the order of preference, defaulting to H.264 SVC (122) ahead of RTVideo (121).

image

RTVideo (RTV)

This is the Real-Time Video codec which has been supported from the introduction of video calling in the Communications Server platform.

a=rtpmap:121 x-rtvc1/90000

The ‘/90000’ value is used as the clock rate for all video codecs advertised by Lync.  As denoted by the preference order above this codec is listed second and will be used in the event that both sides do not support H.264 SVC.
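A 90000 Hz clock means the RTP timestamp advances by 90,000 ticks per second of video, so the step between consecutive frames depends on the frame rate. The short sketch below works through that arithmetic; the frame rates are just illustrative values and the function name is made up for this example.

```python
VIDEO_CLOCK_RATE = 90000  # Hz, as advertised in the rtpmap line


def ticks_per_frame(frames_per_second: float) -> float:
    """RTP timestamp increment between consecutive video frames."""
    if frames_per_second <= 0:
        raise ValueError("frame rate must be positive")
    return VIDEO_CLOCK_RATE / frames_per_second


for fps in (15, 30):
    print(fps, "fps ->", ticks_per_frame(fps), "ticks per frame")
```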

H.264 SVC

As covered in various other articles the H.264 SVC implementation in Lync is used by default for all native 2013 calling scenarios as well as in certain third-party interoperability scenarios (e.g. Polycom Group Series with at least 4.1.3 firmware).

a=rtpmap:122 X-H264UC/90000
a=fmtp:122 packetization-mode=1;mst-mode=NI-TC

When communicating with legacy clients (e.g. Lync 2010, Office Communicator) and some third-party video conferencing systems (e.g. Polycom HDX) then RTVideo would still be used.
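The fallback from H.264 SVC to RTVideo follows from the preference order: the first codec in the caller's list that the other side also supports wins. The function below is a deliberately simplified sketch of that idea with a hypothetical name; real offer/answer negotiation involves more than a name comparison.

```python
def pick_video_codec(offer_preference, answerer_supported):
    """Return the first offered codec name the other side also supports."""
    supported = {name.lower() for name in answerer_supported}
    for name in offer_preference:  # most preferred first
        if name.lower() in supported:
            return name
    return None  # no codec in common


lync_2013_offer = ["X-H264UC", "x-rtvc1"]
print(pick_video_codec(lync_2013_offer, ["X-H264UC", "x-rtvc1"]))
print(pick_video_codec(lync_2013_offer, ["x-rtvc1"]))
```

Encoding names are compared rather than payload type numbers because the numbers are only meaningful within a single SDP declaration.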

As explained in previous articles and various presentations the implementation of H.264 SVC in Lync is not the basic H.264 codec but is specialized in Lync. This unique implementation is advertised to other clients as X-H264UC and thus must be understood equally by the other client in the call. In the additional fmtp declaration statement the ‘packetization-mode=1’ parameter indicates the non-interleaved packetization mode defined in RFC 6184, the RTP payload format for H.264. It does not express the scaling limit. The maximum scaling mode supported is UCConfig Mode 1, which uses temporal scalability only and is the ability to encode two separate temporal layers: a base layer and an enhancement layer. As previously stated the upcoming implementation of Skype will support the same mode.

Uneven Level Protection FEC

This codec is actually used to allow Lync 2013 clients to setup a second RTP session used specifically for out-of-band forward error correction data, separate from the main video stream.

a=rtpmap:123 x-ulpfecuc/90000

As the ‘uneven’ part of the name suggests this codec will send portions of redundant data when needed as it is not a complete duplicate of the main media stream.

Just as mentioned earlier in the audio codecs section Skype clients do not support this and will instead simply use in-band FEC with the single SVC media session.  This entry did not exist in previous versions of Lync so in legacy video call scenarios the Lync 2010 clients utilizing RTV will also embed FEC data in-band.

By Jeff Schertz

Site Administrator

26 thoughts on “Media Codecs in Lync 2013”
  1. It is great info, as usual in this blog. But when v2 of Skype connectivity will be available in public? Also, are v2 will need MSFT-linked account in Skype as v1? In my long-long contact list in Skype only couple of users have accounts linked with LiveID and that's why this feature is useless for me and my colleagues. Also even with linked accounts there are strange behaviors. If linked LiveID is somebody@hotmail.com, it works ok. If LiveID is somebody@mail.ru, Skype-Lync calls works, but IM not (because non-working Presense between them).

    1. Microsoft stated 'later this year' in the Lync Conference Keynote in relation to the Skype upgrade. Presence and audio calling works fine between my Lync client and my Skype client which uses a Microsoft account in the format of "me(domain.com)@msn.com".

      1. Thank for answer, will wait for good changes.
        Yes, I'm using such format "me(domain.com)@msn.com", but it works not with all recipients correctly.

  2. Thanks for sharing this knowledge Jeff
    The Skype guys are weighing in this department of áudio codecs, they really are top-notch!

  3. I am having trouble connecting a client Lync 2013 with an HDX, do you connect video. However, with Lync 2010 client connects normally. How can I change the resolution of the Lync client 2013?

    1. You need to have the 'RTV Options Key' purchased and installed on the HDX in order to support video calls with Lync 2013 clients.

  4. Hi Jeff – all.

    Great article! A minor correction. packetization-mode=1 for H.264 SVC refers to the non-interleaved packetization mode defined in RFC 6184 (RTP payload format for H.264) which is also used in RFC 6190 (RTP payload format for SVC). See Section 6.3 of RFC 6184 (https://tools.ietf.org/html/rfc6184#section-6.3).

    UCConfig Mode set to 1 refers to the fact that Lync uses only temporal scalability, with an encoder configuration that follows the UCI Forum's AVC and SVC Modes specification. (BTW, the UCI Forum has merged with IMTC this summer — see http://www.imtc.org).

    1. Thanks for the clarification. I had thought that the temporal scaling limitation was reflected there in SDP, but it must be buried in the video source requests; I have yet to dig into those in an article.

  5. Hello,

    is it possible to remove some audio codecs from supported codec list. I need to remove RED codec from lync server 2013, is it possible?
    Thank you.

  6. Hello Jeff,

    I would like to know, if there are any way to disable FEC in the Lync Server Audio with G.711…

    Many thanks!

    Regards,

    Thiago Mendes

  7. I have problem with communication Lync to group 500. Error message is:
    04/28/2015|12:09:11.658 15DC:F7C INFO :: Data Received -10.215.239.14:5061 (To Local Address: 10.140.67.182:60744) 1060 bytes:

    04/28/2015|12:09:11.658 15DC:F7C INFO :: SIP/2.0 415 Unsupported Media Type

    Authentication-Info: TLS-DSK qop=”auth”, opaque=”F484CB78″, srand=”DC74F0B1″, snum=”64″, rspauth=”6551768ac0dd537b0bead3107fdd2f286a482ee0″, targetname=”CZMSPLW008.csob.int”, realm=”SIP Communications Service”, version=4

    Via: SIP/2.0/TLS 10.140.67.182:60744;ms-received-port=60744;ms-received-cid=4790100

    Content-Length: 0

    Contact: mlovich ;proxy=replace;+sip.instance=””

    From: “HOBL Martin” ;tag=2ba6550bad;epid=b14dc840cf

    To: ;tag=plcm_4240129991-811;epid=8213030FD1A5CV

    Call-ID: 36d901321b3c4374be9b39bcabf15e78

    CSeq: 1 INVITE

    Supported: timer

    Accept: application/ms-conf-invite+xml,application/sdp

    Accept-Encoding: identity

    Accept-Language: en

    ms-diagnostics: 1037;reason=”Previous hop client did not report diagnostic information”;Domain=”csob.sk”;PeerServer=”10.194.179.8″;source=”CZMSPLW007.csob.int”

    04/28/2015|12:09:11.658 15DC:F7C INFO :: End of Data Received -10.215.239.14:5061 (To Local Address: 10.140.67.182:60744) 1060 bytes

    Do you meet with this? Do you know where is the problem?

    Robo

  8. We’ve been having intermittent audio delay on our mostly CX600.

    This is just for outbound calls where the callee cannot hear the caller for the first 3 seconds. The caller can hear the callee. When we enable media bypass it is worse for the impacted users.

    We’ve opened a ticket with our SBC vendor & we did some packet captures. Analysis shows that during the time there was no audio heard by the callee, the CX600 was not sending media even if the caller was already speaking & callee media was being sent from the SBC to the phone (the caller is able to hear the callee).

    The issue might not be specific to the CX600 but the user reported cases are all CX600 & these are what is mostly deployed in our organization.

    It also seems to happen to some people only.

    Have you encountered this & do you have any advice for us?

  9. Hi There,

    I’m slightly confused by this article. You state that the G722 audio codec “is primarily used in Lync conference calls when no other Lync clients are participating in the same call” and yet that codec is the most preferred in the m=audio line from the log you show. When I have a look at logs in my own environment I see that the preferences on my clients match those in your example also. Nearly all conference calls in our environment are Lync Client to Lync client so why would they prefer a codec that is primarily used to communicate with non-Lync clients?

    Cheers
    Craig

  10. is there a way to disable some codecs that lync uses? i am trying to connect vtc to lync meetings but it is stalling on lz77-64k

  11. Dear Jeff,
    i’ve just setup a Skype for Business 2015 server and create a SIP trunk with Hipath 4000 PBX. Everything look fine but only when i call from Skype client to Analog phone.For each call, the analog phone only hear the caller from Skype in 10s, after that analog phone can not hear anything from skupe client, the skype client can hear from analog phone very well. Both of them running on Local network.
    We also use Wireshark to capture the packets and see that: Hipath and Skype use RTP (G711A) Codec from beginning but after few second Skype side send RTP (RED), and at thí point analog can not hear anymore. I dont know why Skype side used RTP (RED) instead of G711A suddenly.
    Sorry about my bad english, and could u help me to expalin this case?

    Thanks,

  12. Hear the transaction that i can capture
    INVITE SDP (RED telephone event CN g711U g711A)
    51914 ————————————————————————–>5060 : SIP Invite From: abc@12345.com

    180 ringing SDP (RED Telephone event g711A)
    51914 <————————————————————————– 5060 : SIP Status 180 Ringing

    RTP (g711A)
    55866 5060 : SIP PRACK From: abc@12345.com

    200 OK
    51914 <————————————————————————– 5060 : SIP Status 200 OK

    200 OK
    51914 5060 : SIP Request Invite ACK 200

    RTP (g711A)
    55866 ————————————————————————–> 29100: RTp 585 packets Duration: 11,374s

    RTP (RED)
    55866 ————————————————————————–> 29100: RTp 145 packets Duration: 2,282s (Right here analog phone can not hear anything)

    RTP (g711A)
    55866 ————————————————————————–> 29100: RTp 1 packets Duration: 0s

    RTP (RED)
    55866 ————————————————————————–> 29100: RTp 67packets Duration: 1.3s

    BYE
    5060 1034: SIP Status 200 OK

    Thanks
