Lync Edge STUN versus TURN

October 15, 2012 by · 51 Comments 

The primary purpose of this article is to help explain the difference in the available media paths provided by the Interactive Connectivity Establishment (ICE) protocol.  The solutions provided by the implementation of ICE in Lync Server apply to both UDP and TCP transmission of data, so audio, video, and desktop sharing sessions can all take advantage of ICE.

In Lync 2010 nearly every client or server can act as an ICE Client (e.g. Windows client, Front-End AVMCU, Mediation Server, etc) and utilize the Edge Server to negotiate media sessions between each other.  Only the Edge Server role is defined as an ICE Server and is capable of providing the information needed by ICE clients to initialize these connections.

Media Traversal Basics

ICE provides two protocol-level solutions that nearly every Lync client and server role can leverage to find some available path to establish media between each other.  These two protocols are commonly referred to together and the difference between them is not widely understood.  Although Lync nearly always finds a way to connect media, when it does not work it can be very beneficial to know what one is looking for even when performing basic troubleshooting steps.

  • Session Traversal Utilities for NAT (STUN) – This protocol basically allows an ICE client which is located behind a firewall providing Network Address Translation to discover the public IP address as well as identify the type of NAT in use and then provide that IP to the other party as a potential candidate to send media to.  This IP would be assigned to the Internet-facing side of the NAT device which the client is located behind.

Anecdotally the acronym STUN may also be seen referred to as Simple Traversal of UDP through NAT which was the protocol’s original name (as defined in now obsolete RFC 3489).  When the protocol was updated to include support for TCP the name was changed to Session Traversal Utilities for NAT to reflect that it was no longer limited to UDP traffic and retained the same well-known acronym.

  • Traversal Using Relays around NAT (TURN) – This protocol allows a dedicated ICE server to provide its own public IP address as a media candidate to one or both parties in a call and will act as a relay or proxy for the media session.  This IP would always be the Internet-facing public IP address (either assigned directly to the server interface or assigned to an external NAT device).

A media relay server or ‘ICE’ server (e.g. the Lync Edge Server) is utilized to set up the media session and provide the list of potential candidates to both parties in a call regardless of which media delivery option is selected for each leg of the call.  The key difference between these two types of solutions though is that media will travel directly between both endpoints if STUN is used, whereas media will be proxied through the Edge Server if TURN is utilized.  Also understand that the media stream may not always use the same solution on both legs, as STUN may be possible for one endpoint but not for the other endpoint.
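To make the STUN discovery step more concrete, the following is a minimal sketch of the generic STUN wire format from RFC 5389: building a Binding Request and decoding the XOR-MAPPED-ADDRESS attribute that carries the reflexive (public) address back to the client.  It is an illustration of the public standard only; the function names are made up for this example, and it does not claim to reproduce the exact messages exchanged between a Lync client and the Edge Server.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442      # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001       # STUN message type for a Binding Request
XOR_MAPPED_ADDRESS = 0x0020    # attribute type carrying the reflexive address


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request header with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Layout: type (2 bytes) | length (2) | magic cookie (4) | transaction ID (12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + transaction_id


def decode_xor_mapped_address(value):
    """Decode an IPv4 XOR-MAPPED-ADDRESS attribute value into (ip, port)."""
    _reserved, family, xport = struct.unpack("!BBH", value[:4])
    if family != 0x01:
        raise ValueError("this sketch only handles IPv4")
    # The port is XORed with the top 16 bits of the cookie, the address with all 32.
    port = xport ^ (MAGIC_COOKIE >> 16)
    (xaddr,) = struct.unpack("!I", value[4:8])
    ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
    return ip, port


def encode_xor_mapped_address(ip, port):
    """Inverse of the decoder, included only to exercise the sketch."""
    (addr,) = struct.unpack("!I", socket.inet_aton(ip))
    return struct.pack("!BBH", 0, 0x01, port ^ (MAGIC_COOKIE >> 16)) + struct.pack(
        "!I", addr ^ MAGIC_COOKIE
    )


if __name__ == "__main__":
    # 203.0.113.7 is a documentation address standing in for a home router's public IP.
    attribute = encode_xor_mapped_address("203.0.113.7", 50020)
    print(decode_xor_mapped_address(attribute))
```

The address returned in such a response is what the client would then advertise as its reflexive candidate.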

In any Lync media session used to provide audio, video, desktop sharing, or a combination of each there are three potential routes that media can possibly travel between two endpoints.  Regardless of whether the call is a two-party call or a multi-party conference call there are still only two endpoints in each ‘leg’ of a conversation (either client-to-client in a peer session or client-to-server in a conference).  So when two endpoints attempt to establish a media session then a list of potential candidates is sent by each endpoint to the other.  If no Edge Server is deployed or available then the list will only include local candidates, otherwise when ICE is available then additional candidates will be provided.

  • Host or Local Candidate – The actual IP address bound directly to the remote client’s host operating system.  This could include multiple candidates as the remote host could contain multiple physical or virtual network adapters including any active VPN clients.  Most often this will be a single IP address of the active interface on a Lync client’s workstation.

  • Reflexive or STUN Candidate – The public IP address assigned to the client’s immediate firewall performing network address translation.  In most home networks this would be the public IP address assigned by an ISP (either dynamically or statically) to the premises modem or router, depending on the type of service.

  • Relay or TURN Candidate – The publicly accessible IP address assigned to the media relay server which is allocated to the client.  In Lync Server this is the public IP address assigned either directly to the external A/V Edge interface or the public IP address allocated to a NAT device (e.g. firewall) which is performing static network address translation to a private IP address assigned directly to the Edge Server.  In the event that an Edge Pool is deployed then this would be the IP address of one of the individual servers in the pool.
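As a rough illustration of how these three candidate types show up in signaling, the sketch below parses candidate lines written in the standard RFC 5245 SDP syntax.  The sample lines, addresses, and the Candidate and parse_candidate names are invented for this example (the IPs come from documentation ranges); the SDP in a real Lync trace may carry additional or differently formatted attributes.

```python
from typing import NamedTuple, Optional


class Candidate(NamedTuple):
    foundation: str
    component: int
    transport: str
    priority: int
    address: str
    port: int
    kind: str                          # "host", "srflx" (reflexive), or "relay"
    related_address: Optional[str] = None
    related_port: Optional[int] = None


def parse_candidate(line: str) -> Candidate:
    """Parse one 'a=candidate:' line in RFC 5245 syntax."""
    prefix = "a=candidate:"
    if not line.startswith(prefix):
        raise ValueError("not a candidate attribute")
    fields = line[len(prefix):].split()
    if len(fields) < 8 or fields[6] != "typ":
        raise ValueError("malformed candidate line")
    foundation, component, transport, priority, address, port = fields[:6]
    kind = fields[7]
    # Anything after the type is a sequence of name/value pairs, e.g. raddr/rport.
    extras = dict(zip(fields[8::2], fields[9::2]))
    return Candidate(
        foundation,
        int(component),
        transport.upper(),
        int(priority),
        address,
        int(port),
        kind,
        extras.get("raddr"),
        int(extras["rport"]) if "rport" in extras else None,
    )


# Made-up sample lines: one host, one reflexive, and one relay candidate.
SAMPLE_LINES = [
    "a=candidate:1 1 UDP 2130706431 192.168.1.20 50020 typ host",
    "a=candidate:2 1 UDP 1694498815 203.0.113.7 50020 typ srflx raddr 192.168.1.20 rport 50020",
    "a=candidate:3 1 UDP 16777215 198.51.100.10 51000 typ relay raddr 203.0.113.7 rport 50020",
]

if __name__ == "__main__":
    for text in SAMPLE_LINES:
        c = parse_candidate(text)
        print(c.kind, c.address, c.port, c.transport)
```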

At no time during session establishment is the calling endpoint ever aware of the actual location of the remote host, so it will rely on a list of candidate addresses provided in the signaling discussion to attempt to find a target to send the media.  The candidates provided are a list of IP addresses, ports, and protocols which are gathered from information the endpoint knows about (the host’s own IP addresses) and details that the Edge Server uses and can discover (firewall and server addresses).  The client will select the best option in the event that multiple paths are available, preferring more direct paths over relay paths, and preferring UDP sessions over TCP (for audio/video calls; desktop sharing and file transfer are limited to TCP).
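The preference for direct paths over relay paths can be expressed numerically.  The sketch below uses the priority formula and the recommended type preferences from RFC 5245 (host 126, reflexive 100, relay 0).  The lower local preference given to TCP here is purely an assumption made to model the UDP-over-TCP preference described above, and the exact values Lync assigns may differ.

```python
# Recommended type preferences from RFC 5245, section 4.1.2.
TYPE_PREFERENCE = {"host": 126, "srflx": 100, "relay": 0}

# Assumed values for illustration only: rank UDP above TCP for the same type.
LOCAL_PREFERENCE = {"UDP": 65535, "TCP": 32767}


def candidate_priority(kind, local_preference=65535, component_id=1):
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component ID)."""
    return (TYPE_PREFERENCE[kind] << 24) + (local_preference << 8) + (256 - component_id)


def order_candidates(candidates):
    """Sort (kind, transport) pairs from most to least preferred."""
    return sorted(
        candidates,
        key=lambda c: candidate_priority(c[0], LOCAL_PREFERENCE[c[1]]),
        reverse=True,
    )


if __name__ == "__main__":
    offered = [("relay", "TCP"), ("srflx", "UDP"), ("host", "UDP"), ("relay", "UDP")]
    for kind, transport in order_candidates(offered):
        print(kind, transport, candidate_priority(kind, LOCAL_PREFERENCE[transport]))
```

Because the type preference occupies the highest bits, a host candidate always outranks a reflexive one, which in turn always outranks a relay candidate, whatever the transport.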

But in order for Lync to establish a media connection over any of these paths the client must first validate the list of candidates by opening connections to any and all entries in the list simultaneously.  Although there is a preferred order to the candidate list the clients will attempt connections to all at the same time, as checking them in order one at a time would be counterproductive to quickly establishing media.  This phenomenon will often confuse the first-time troubleshooter if client traffic is captured on a basic peer-to-peer call during media establishment where both endpoints are on the same network and no assistance from the Edge Server would be necessary.  Within these traces will often be connection attempts to every single IP address provided in the candidate list, including the relay and reflexive candidates.  This behavior is due to Lync Server’s inclusion of support for Early Media (SIP 183 Session Progress) messages which allow Lync to establish a media connection before the called endpoint even answers the call.  This is done to eliminate or reduce any delays which might be caused by waiting until after the call is established, thus ensuring the users can hear each other immediately in an audio call.

It should also be understood that sometimes the first candidate to respond for the early media connection may not be the best candidate for the duration of the call.  Thus Lync also supports a mechanism referred to in Lync as Candidate Promotion (via a SIP Re-INVITE method) which provides a way to change the media path during a session without dropping media or signaling.  In practice though this scenario is typically rare, as usually the first candidate to respond is the best (and often only) option.
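The relationship between simultaneous checks, early media, and candidate promotion can be modeled with a small simulation.  This is a simplified sketch and not how the Lync client is actually implemented: the Probe class, the response times, and the simulate_checks function are all hypothetical, and real checks involve candidate pairs rather than single candidates.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Probe:
    kind: str        # "host", "srflx", or "relay"
    reachable: bool  # whether the connectivity check ever gets an answer
    rtt_ms: float    # hypothetical time until that answer arrives


# Higher number means more direct, and therefore more preferred, path.
PREFERENCE = {"host": 3, "srflx": 2, "relay": 1}


def simulate_checks(probes: List[Probe]) -> Tuple[Optional[str], Optional[str]]:
    """Return (early_media_path, final_path) for checks started at the same time."""
    answered = sorted((p for p in probes if p.reachable), key=lambda p: p.rtt_ms)
    if not answered:
        return None, None
    # All checks start together, so the fastest answer carries early media.
    early = answered[0]
    # If a more direct path answers later, the session is promoted to it.
    final = max(answered, key=lambda p: PREFERENCE[p.kind])
    return early.kind, final.kind


if __name__ == "__main__":
    probes = [
        Probe("host", reachable=False, rtt_ms=0.0),
        Probe("srflx", reachable=True, rtt_ms=80.0),
        Probe("relay", reachable=True, rtt_ms=30.0),
    ]
    print(simulate_checks(probes))
```

In the example the relay answers first and carries early media, and the session is then promoted to the reflexive path, which is the rare case described above.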

Media Establishment Scenarios

In the following scenarios a single Lync client will attempt to establish some type of media session with another Lync client; it could be an audio call, a video call, or even simply a desktop sharing session.  These scenarios will highlight the differences between local, reflexive (STUN), and relay (TURN) sessions.

External Peer-to-Peer Scenario

The following diagram depicts an external Lync user located in a home network placing a call to another Lync user in the same organization who also happens to be located in a different home network.  This is a very common use case with today’s distributed work force where at any given time a user may be located in a home office or some other public Internet access point like a coffee shop or hotel.

image

  1. The highest preferred candidate option, shown in red, to the local host IP address will fail.  The most preferred candidate is always a local candidate, which is the reason that peer media sessions between clients on the same network will never use the Edge server: the direct method would need to fail first, triggering a fallback to the other available options.  But in this scenario the two clients are on different networks utilizing unroutable private subnets and there is no way that either host can connect to the other directly.

  2. The next preferred option is to use the server reflexive candidate which is provided by the Edge Server using STUN.  In many cases this connection will be allowed by the firewall on the called client’s network.  How this actually works will be discussed in the next section, but this option most commonly works with consumer-grade firewalls which can in essence be ‘fooled’ by the Edge Server into accepting the inbound media connection directly from the calling client’s firewall.  (This is not a security risk or any type of hack, which will also be discussed in more detail later.)

  3. In the event that STUN fails then the final option is to utilize the Edge Server as a media relay.  The calling client will establish a media session directly with the A/V Edge Server as will the receiving client.  The inbound media stream relayed by the Edge Server will be accepted by the called client’s firewall.  (Also note the deliberate ‘loop’ depicted in the diagram to illustrate that the Edge Server will ‘turn’ media around to the other client.  This visual mnemonic is meant to help remember which media path is TURN and which is STUN.)

Although media leveraging STUN is not a direct host-to-host session it is the next best option as the media path is still sent directly between the two clients’ own firewalls, over the Internet.  This keeps the media path as short as possible and does not burden the corporate network with handling the media relay processing or bandwidth.  At this point it should be quite clear why using the media relay for all sessions could impact the scalability of the overall solution, as the Edge Server would need to handle a large number of media sessions and the Internet connectivity of the site where the Edge Server is located would need to support all those media sessions inbound and outbound.  Thus the immediate value of STUN is evident in this simple diagram.

Mixed Peer-to-Peer Scenario

In this scenario only one of the Lync clients is located in a home network and is attempting to call another Lync user who happens to be located on the internal corporate network of the same Lync organization.

image

  1. Just as in the previous scenario the direct approach will fail because the external client will not be able to locate the unroutable, local IP address of the other client.

  2. The server reflexive candidate for the internal Lync client is the public IP of the corporate firewall, but this happens to be an enterprise-class firewall which is usually too intelligent to allow STUN traffic to be established to protected internal hosts, thus causing the calling client’s candidate availability check to fail.  Lync will then move on to the next and final option in the list.

  3. As all other candidates were unreachable, Lync will fall back to using the Edge Server to provide media relay duties.

In this scenario STUN is less likely to succeed because the enterprise-class firewall is often configured to utilize Symmetric NAT (SNAT), which contains a mapping table of reserved IP address and port combinations typically allowing external inbound connections to be routed directly to internal services.  As this is a client-to-server scenario commonly used for accessing web servers or other static services, it would not be feasible to manage individual rules for every potential ICE client inside the network, especially when most of these clients are typically located on a dynamic IP address.  Thus TURN would most likely be utilized in this scenario as the Edge Server is static and always available for both parties to connect and send media to.

Because the called endpoint is already located on the internal network the media traffic must still traverse the corporate Internet connection and firewalls anyway, so even if STUN was possible the network still must deal with the media traffic.  But leveraging STUN over TURN in this scenario would still provide one advantage, as the Edge Server would not be burdened with relaying the media session itself.
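The outcome of the two peer-to-peer scenarios above can be summarized as a simple decision sketch.  Everything in it is a deliberate simplification: the Endpoint class, the network labels, and the nat_accepts_peer flag (standing in for whether an endpoint's firewall lets the reflexive connectivity check through) are hypothetical, and real ICE behavior depends on the actual checks rather than a single flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    network: str            # label for the private network the endpoint sits on
    nat_accepts_peer: bool  # assumed: True for a typical consumer-grade firewall


def expected_path(caller: Endpoint, callee: Endpoint, edge_available: bool = True) -> str:
    """Return the media path the article's scenarios would predict."""
    if caller.network == callee.network:
        return "host"              # direct connection on the same network
    if not edge_available:
        return "none"              # only local candidates exist and they cannot connect
    if caller.nat_accepts_peer and callee.nat_accepts_peer:
        return "reflexive (STUN)"  # media flows firewall to firewall
    return "relay (TURN)"          # Edge Server relays the media


if __name__ == "__main__":
    home_a = Endpoint("caller", "home-a", nat_accepts_peer=True)
    home_b = Endpoint("callee", "home-b", nat_accepts_peer=True)
    office = Endpoint("callee", "corporate", nat_accepts_peer=False)
    print(expected_path(home_a, home_b))  # external peer-to-peer scenario
    print(expected_path(home_a, office))  # mixed peer-to-peer scenario
```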

Client-to-Server Scenario

For clarification purposes the following scenario contains a mixture of different client and server roles.  In the event that the external client is connecting to a Lync multi-party conference call then it will send its media stream to the Front End server where the meeting is instantiated.

image

In this scenario the overall media session between endpoints looks identical to the earlier peer-to-peer call between external and internal Lync clients.  The internal Lync Server behaves the same as an internal Lync client; they are both ICE clients.  The Edge Server is an ICE Server as it facilitates the connection between endpoints from the beginning, by providing candidate information, through to the end, by handling the media session for the duration of the call.

An important distinction worth noting is that internally connected Lync clients will ignore the TURN (or server relay) candidate provided by the ICE server.  This IP address is used only by external (e.g. foreign) endpoints like federated Lync clients.  A Lync client will be made aware of its assigned media relay server via in-band provisioning details provided during initial registration and sign-in to Lync and it will always connect to this FQDN for media relay functionality.

(Thanks to Thomas Binder from Microsoft for offering some feedback based on his TechNet Europe presentation on Edge Media Connectivity with ICE.)

About Jeff Schertz
Site Administrator

Comments

51 Responses to “Lync Edge STUN versus TURN”
  1. John says:

    Great article on a complex topic. Can you please add some network capture examples and sip traces as well?

    • jeffschertz says:

      John, I have another article I'm working on which will include that level of detail. I meant to keep this one on the shorter side to focus on the difference between the concepts of STUN and TURN. Keep an eye out for the follow-up article sometime this quarter.

      • LiMeng says:

        May I know whether the second article on this topic is ready now?
        Another question: how does the host know that direct media failed so it should try STUN, and if STUN failed, try TURN? Does the host do some ping test? (That also does not make sense if the firewall in between blocks ICMP packets.)
        Yet another question, so all the candidate ip addresses are returned to client from Edge server? (as you said, only Edge server can be ICE server).
        thanks!

        • jeffschertz says:

          The two ICE clients will perform a series of STUN binding requests in which they each attempt to connect to every IP:port entry on their candidate lists to see which they can connect to. The first response from these tests will be used to establish early media in most cases, and in some rare cases where a slower, yet more efficient path responds, the media will be automatically moved to that destination. It's not ICMP traffic but an actual media connection. The candidate list is built from IPs that the host itself is aware of (local IP) and IPs that the ICE server supplies (the relay and reflexive entries that the Edge server discovers).

  2. Jonathan T. says:

    This is helpful. Thanks for taking the time to do it right. Any thoughts on the Lync-Skype feature in the new release?

  3. Ashwin says:

    This is a great article. Helps a lot to understand ICE candidate negotiation for different call scenarios.

  4. bueschu says:

    Thank's for this very interesting article about an complex topic.

  5. Chuck B says:

    Hi Jeff, Great article. Thanks for taking time to clarify these aspects of Ice, Stun, and Turn.

  6. Anjana says:

    Very good article !!.

  7. UC Ali says:

    Precise and informative article for clearing concepts on STUN and TURN!

  8. Jason Santamaria says:

    Great article. As a corporate firewall administrator (among other things), this explains LYNC communications far better than anything else I have found. However, it seems that our edge server is trying to TURN using the Host Candidate address, and of course, failing. any Ideas?

    • jeffschertz says:

      How are you determining this? I've not seen that before and struggle to imagine how an Edge misconfiguration might cause that behavior. I'd be curious to know if you find a resolution.

  9. Rich says:

    Jeff great article. This and Thomas Binder Deep dive video are very helpful!
    Slightly unrelated question. In my organisation we have no EDGE deployed yet, but I still see an mrasURI response in the Event: vnd-microsoft-provisioning-v2 when looking at a snooper trace log. Why would that be?

  10. Tony L says:

    Mr. Jeff,

    Thanks for explaining this so well. I have an issue, though, that's driving me batty – I have certain Lync clients that only show private IPs as candidates. It is not a firewall issue, since PCs within the same environment (not the same AD environment) do list public STUN and TURN IPs. Is there anything that you know in Group Policy that could constrain the candidate results?

    Any suggestions would be more than welcome.

  11. jiang says:

    Great article. I have a question on federated users. I have two group of users, One group is students and the other is staff. The students using Lync Online, staff using on-premise lync. They are federated. If both staff and student located in the same campus network, the student initiate a call to staff. What's the media path between student and staff? Is it direct peer to peer connection? or the media traffic must go to lync online server and come back again? Thanks.

    • jeffschertz says:

      The media path for federated users isn't any different, a peer-to-peer connection will always be attempted first, with STUN and TURN as fall back options. Media can traverse both Edge servers or only one depending on client locations and network topologies.

  12. Ahmed says:

    Hi Jeff, is to possible to have a VVX1500 version 4.0.3F registered with Lync 2010 to call a Lync client registered from the internet. currently the call keep connecting and no media works. also TMG shows the private IP of the Lync client.

    • jeffschertz says:

      No, the 4.0.x release does not support ICE/STUN/TURN so it will be unable to establish media across firewalls or NAT. You'll need the qualified 4.1 firmware for that, but Polycom has not yet released a 4.1 update for the VVX1500.

  13. Vaibhav Dalvi says:

    Overall its great artice which eplains in depth !!!

  14. peakers1976 says:

    This is a great article Jeff.

    I really appreciate the in depth explanation on how the media paths are established on a Lync call.

  15. Melek Attia says:

    Great!

  16. Ian Arakel says:

    Hi Team,

    We have an issue in our setup.

    We are from the firewall admin team and in close sync with the Exchange team to fix the issue.

    Below is a summary of the issue:

    1.
    We are trying to initiate desktop sharing from Public internet to a user within our corporate setup.

    2.
    The observation is that the desktop sharing fails with an error "Sharing failed due to network issues, please try later".
    The communication right now is between the external server (IP: A.B.C.D,Private IP:192.168.4.x) to the Lync server in the DMZ range (Public IP: E.F.G.H,Private IP 172.31.32.X)

    3.
    The firewall logs suggest that the return traffic from the edge server 172.31.32.x was hitting the private IP 192.168.4.x of the external server which should not be the case since the communication should happen with its public IP(A.B.C.D).
    Please note 192.168.x.x is our internal zone subnet of our corporate setup.

    We referred the below link: http://social.technet.microsoft.com/Forums/lync/e

    It states the dependency of the same on STUN/TURN process.

    Could someone please help us out here????

    • jeffschertz says:

      Take a look at the candidate list in SDP on the remote client to make sure that the NATed public IP of the Edge server shows there. If the internal IP is seen then the Lync Topology may be incorrect and the public IPs are not defined in the Edge pool correctly.

  17. Guest says:

    Great article Jeff.
    The last paragraph is the gem for me…could not work out why TURN was not working between two independent offices. That last comment explained the results perfectly.

  18. Mohit says:

    Hello Jeff,

    Great Article!! I will really appreciate if you can be kind enough to throw some light on the following Media Establishment scenario:

    Extending the above mentioned Peer-to-Peer Scenario to a 2 Edge server lync topology where both the Peers use a different Edge server as the relay. In this case I assume there will be media establishment between the 2 Edge servers (Peer A <–> Edge A <–> Edge B <–> Peer B). In this case how does the media traverse between the two Edge servers. From the Edge's perspective it looks like the media is being hairpinned to the other Edge Server.

  19. Doug says:

    Good morning Jeff,

    Thanks for the articles.. they have been super helpful understanding what is going on behind the scenes. We've recently deployed Lync 2013 and I've noticed an oddity that I can't explain and I wonder if it falls within the media paths hierarchy

    We have Lync configured for most roles (minus enterprise voice). When an external client using LWA in this case is in a conference with an internal user, I have noticed that when desktop sharing or video starts, I will see drops in our firewall logs. In this case it shows the Edge server external nic ip trying to either hit our front end servers ip on 3478 STUN or even the occasional hit to the internal client on 3478 STUN. The media connection will eventually work (though occasionally it has to retry). I thought that this must be a routing issue on the Edge servers, but the local routing tables look correct.

    Is this normal behavior in the hierarchy of media paths? I keep going back to your mixed Peer to Peer scenario. Will Lync try to go as direct as possible, including bypassing the internal nic of edge when it tries direct, and keep falling back until it finds a path? It doesn't seem correct, but running a Network Monitor capture on the Edge server and reproducing the issue shows MediaRelaySvc traffic sourced from the external nic ip destined for the front end server ip. Unless I'm misinterpreting the captures, something seems amiss. Thanks!!

    • Jeff Schertz says:

      Lync will always attempt direct media before falling back to reflexive or relay assistance. Clients will use the internal NIC of their assigned Edge server by design; they should never be going to the external Edge IP when located on an internal network.

      • Doug says:

        How does LWA fit into the picture with connectivity? LWA runs on the front end servers fronted through reverse proxy as I understand it. If I have an external user using Lync Web App from our environment connecting to an internal user using the full lync client.. which path is taken for im/presence? Once the internal user initiates a desktop sharing session, where does the communication path flow? Thanks for you help as I try to wrap my head around this better.

        • Jeff Schertz says:

          Media is always handled by the Edge server regardless of the client's signaling path or location. In this scenario the media path is client (web browser) to server (AVMCU) so TURN will typically be used for external web clients to send/receive media with the Lync Front End server running this meeting.

  20. James says:

    Hi Jeff, I am implementing Split tunneling in an environment with no Client firewall. I have used the Network Firewall to Block the Lync Network, but i have an issue with Peer to Peer, i need the peer to peer to go over the Edge, and not STUN to the internal network. will you advise I block STUN over the internal network? What will be your best solution.

  21. Ryan says:

    HI Jeff,
    Similar question to James above.

    We're trying to get split tunneling enabled. We have QOS enabled as well and thus we have defined Audio/Video/…. with the set-csconferencingconfiguration command.

    Lets say the audio port ranges are 50020-50040 and as such the local windows 7 firewall rule states while on VPN ranges, block inbound attempts from all IP's via UDP 50020-50040 destined for 50020-50040 on my local machine.

    Essentially here's what happens. When an internal user tries to do a lync call with a user teleworking (thus on VPN) the call will start on the Edge server but then somehow STUN kicks in and i see the VPN IP address of the teleworking user talking directly to the IP of the internal user over the Audio Ports. Running wireshark i can see this, the call flows over edge to begin then i see STUN and after STUN does it thing, the two end points are now talking directly with each other. How does STUN do this?

    • Jeff Schertz says:

      It sounds like you are seeing the candidate promotion capability of Lync where a better media path, found after the initial early media connection, is used. As STUN is a more efficient path than TURN the client is 'promoting' the connection by establishing a second, shorter media path and then moving to it. Although I would argue that the VPN path is not more efficient, Lync is not programmed to understand that and just sees 'shorter as better'. As far as how STUN does that, I have no idea; the inner workings of ICE are a mystery, even to me :) In practice I've found it incredibly difficult to block all media over VPN as ICE is so good at its job of finding a better (aka shorter) route.

    • Jeff Schertz says:

      That’s candidate promotion in action. As far as how STUN actually works that is a mystery to most including me.

  22. Francesc says:

    Hi Jeff,
    Great article. It has been very helpful for me to understand STUN, TURN and ICE much better.
    Thank you very much

  23. refikunver says:

    Hi Jeff,
    Excellent article. Would be nice if there is an example between federated partners too.

    A question: There are two edge servers for two different Lync organization in the SAME dmz with the same internal IP subnet. If one of the lync clients connects from home and the other federated client in corporate network, in this case TURN tries to connect via Public IP of Edge servers' AV interfaces. But since Edge servers are in the same dmz with internal IPs, they can not routed! Is there a way for edge servers to communicate via internal IP of AV interfaces? (Writing internal AV edge IPs to hosts file of Edge servers does not work. I think the reason is: public AV edge IPs are exchanged during the session establishment?)

    thank you.

    • Jeff Schertz says:

      The traffic between the Edge Servers must be via the external interfaces only; they cannot talk to each other if the traffic is routed out their internal interfaces.

  24. Francois says:

    Hello,

    I am working on a Lync 2013 deployment where most users and phones are connecting externally through the edge server.

    I tried to deploy a second edge server load balanced by ZENLoadbalancer.
    It generally works good on pc clients and smartphone application and also on soundstation duo 4.1.0 but the VVX 300 and 400 (tried all firmware from 4.1.8 to 5.2) I want to use are working around half of the tries: sometimes the call connect but there is no sound and the counter of time is not displaying.

    I managed to focus on the problem, in the situations when the call is without sound, it appears that the phone is trying to contact directly the mediation server on its private IP. I could see however that the phone received correctly several routes, including a good one using relay, public ip of one edge server and private ip of the mediation.
    Why would the phone choose the wrong candidate?

    Do you have some idea where the problem could come from?

  25. Vincent C says:

    Good write up, I feel like I almost understand the media paths now, the big question for me is can I influence this behaviour and prevent the clients from using host candidates and thus force a more complex media path via the edge. I tried intercepting and messing the the SIP/SDP a=candidate, removing the host candidate entries but to no avail.

    • Jeff Schertz says:

      You cannot control this behavior directly. Only by preventing routes between candidates, and causing the connection tests to fail can you influence which candidates are ultimately used for the sessions. Modifying the SDP would be quite unsupported in Lync and could cause any sort of unknown problems.



