Lync Edge STUN versus TURN
October 15, 2012 by Jeff Schertz
The primary purpose of this article is to explain the difference between the media paths made available by the Interactive Connectivity Establishment (ICE) protocol. The ICE implementation in Lync Server applies to both UDP and TCP transmission of data, so audio, video, and desktop sharing sessions can all take advantage of it.
In Lync 2010 nearly every client or server can act as an ICE Client (e.g. Windows client, Front-End AVMCU, Mediation Server, etc) and utilize the Edge Server to negotiate media sessions between each other. Only the Edge Server role is defined as an ICE Server and is capable of providing the information needed by ICE clients to initialize these connections.
Media Traversal Basics
ICE provides two protocol-level solutions that nearly every Lync client and server role can leverage to find some available path to establish media between each other. These two protocols are commonly referred to together and the difference between them is not widely understood. Although Lync nearly always finds a way to connect media, when it does not work it can be very beneficial to know what one is looking for even when performing basic troubleshooting steps.
- Session Traversal Utilities for NAT (STUN) – This protocol basically allows an ICE client which is located behind a firewall providing Network Address Translation to discover the public IP address as well as identify the type of NAT in use and then provide that IP to the other party as a potential candidate to send media to. This IP would be assigned to the Internet-facing side of the NAT device which the client is located behind.
Anecdotally the acronym STUN may also be seen referred to as Simple Traversal of UDP through NAT which was the protocol’s original name (as defined in now obsolete RFC 3489). When the protocol was updated to include support for TCP the name was changed to Session Traversal Utilities for NAT to reflect that it was no longer limited to UDP traffic and retained the same well-known acronym.
- Traversal Using Relays around NAT (TURN) – This protocol allows a dedicated ICE server to provide its own public IP address as a media candidate to one or both parties in a call and to act as a relay or proxy for the media session. This IP would always be the Internet-facing public IP address (either assigned directly to the server interface or assigned to an external NAT device).
A media relay server or ‘ICE’ server (e.g. the Lync Edge Server) is utilized to set up the media session and provide the list of potential candidates to both parties in a call, regardless of which media delivery option is selected for each leg of the call. The key difference between these two types of solutions is that media will travel directly between both endpoints if STUN is used, whereas media will be proxied through the Edge Server if TURN is utilized. Also understand that the media stream may not always use the same solution on both legs, as STUN may be possible for one endpoint but not for the other.
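To make the STUN discovery step more concrete, the following is a minimal sketch of a Binding Request and of decoding the public address from the reply. It follows the message layout in RFC 5389 and is not necessarily byte-for-byte what Lync puts on the wire. The `build_fake_response` helper and the `203.0.113.7:50012` address are hypothetical and exist only so the example runs without a real server.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442        # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020

def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request carrying no attributes."""
    tid = transaction_id or os.urandom(12)
    return struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE) + tid

def parse_xor_mapped_address(message):
    """Extract (ip, port) from the XOR-MAPPED-ADDRESS attribute (IPv4 only)."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        raise ValueError("not a STUN Binding Success Response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # 0x01 = IPv4
            port = struct.unpack("!H", value[2:4])[0] ^ (MAGIC_COOKIE >> 16)
            addr = struct.unpack("!I", value[4:8])[0] ^ MAGIC_COOKIE
            return socket.inet_ntoa(struct.pack("!I", addr)), port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes are padded to 4 bytes
    raise ValueError("no XOR-MAPPED-ADDRESS attribute found")

def build_fake_response(request, ip, port):
    """Hypothetical helper: the reply a server would send to a client seen at ip:port."""
    tid = request[8:20]
    xport = port ^ (MAGIC_COOKIE >> 16)
    xaddr = struct.unpack("!I", socket.inet_aton(ip))[0] ^ MAGIC_COOKIE
    attr = struct.pack("!HHBBHI", XOR_MAPPED_ADDRESS, 8, 0, 0x01, xport, xaddr)
    return struct.pack("!HHI", BINDING_SUCCESS, len(attr), MAGIC_COOKIE) + tid + attr

request = build_binding_request()
response = build_fake_response(request, "203.0.113.7", 50012)
print(parse_xor_mapped_address(response))  # ('203.0.113.7', 50012)
```

The address the server reports back is what the client would offer to the other party as its reflexive candidate. With TURN the client would instead offer an address that belongs to the relay itself.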
In any Lync media session used to provide audio, video, desktop sharing, or a combination of each there are three potential routes that media can possibly travel between two endpoints. Regardless of whether the call is a two-party call or a multi-party conference call there are still only two endpoints in each ‘leg’ of a conversation (either client-to-client in a peer session or client-to-server in a conference). So when two endpoints attempt to establish a media session then a list of potential candidates is sent by each endpoint to the other. If no Edge Server is deployed or available then the list will only include local candidates, otherwise when ICE is available then additional candidates will be provided.
- Host or Local Candidate – The actual IP address bound directly to the remote client’s host operating system. This could include multiple candidates as the remote host could contain multiple physical or virtual network adapters including any active VPN clients. Most often this will be a single IP address of the active interface on a Lync client’s workstation.
- Reflexive or STUN Candidate – The public IP address assigned to the client’s immediate firewall performing network address translation. In most home networks this would be the public IP address assigned by an ISP (either dynamically or statically) to the premises modem or router, depending on the type of service.
- Relay or TURN Candidate – The publicly accessible IP address assigned to the media relay server which is allocated to the client. In Lync Server this is the public IP address assigned either directly to the external A/V Edge interface or the public IP address allocated to a NAT device (e.g. firewall) which is performing static network address translation to a private IP address assigned directly to the Edge Server. In the event that an Edge Pool is deployed then this would be the IP address of one of the individual servers in the pool.
At no time during session establishment is the calling endpoint ever aware of the actual location of the remote host, so it will rely on a list of candidate addresses provided in the signaling discussion to attempt to find a target to send the media. The candidates provided are a list of IP addresses, ports, and protocols which are gathered from information the endpoint knows about (the host’s own IP addresses) and details that the Edge Server uses and can discover (firewall and server addresses). The client will select the best option in the event that multiple paths are available, preferring more direct paths over relay paths, and preferring UDP sessions over TCP (for audio/video calls; desktop sharing and file transfer are limited to TCP).
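The preference rules just described can be sketched as a simple sort. This is an illustration only: the `Candidate` class, the rank tables, and the sample addresses are all invented here, and the choice to let candidate type outrank transport is an assumption rather than something taken from Lync’s actual implementation.

```python
from dataclasses import dataclass

# Lower rank = more preferred. Letting type dominate transport is an assumption.
TYPE_RANK = {"host": 0, "srflx": 1, "relay": 2}
TRANSPORT_RANK = {"udp": 0, "tcp": 1}

@dataclass(frozen=True)
class Candidate:
    kind: str        # "host", "srflx" (reflexive/STUN) or "relay" (TURN)
    transport: str   # "udp" or "tcp"
    ip: str
    port: int

def order_by_preference(candidates):
    """Sort candidates so more direct paths come first, UDP ahead of TCP."""
    return sorted(
        candidates,
        key=lambda c: (TYPE_RANK[c.kind], TRANSPORT_RANK[c.transport]),
    )

# Made-up candidate list for one endpoint (documentation address ranges).
offered = [
    Candidate("relay", "tcp", "198.51.100.10", 443),
    Candidate("srflx", "udp", "203.0.113.7", 50012),
    Candidate("relay", "udp", "198.51.100.10", 3478),
    Candidate("host", "udp", "192.168.1.20", 50012),
]

for candidate in order_by_preference(offered):
    print(candidate.kind, candidate.transport, candidate.ip, candidate.port)
```

For a desktop sharing session, which is limited to TCP, the same ordering would simply be applied to a list containing only TCP candidates.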
But in order for Lync to establish a media connection over any of these paths the client must first validate the list of candidates by opening connections to any and all entries in the list simultaneously. Although there is a preferred order to the candidate list, the clients will attempt connections to all of them at the same time, as checking them in order one at a time would be counterproductive to quickly establishing media. This behavior will often confuse the first-time troubleshooter when client traffic is captured during media establishment on a basic peer-to-peer call where both endpoints are on the same network and no assistance from the Edge Server would be necessary. These traces will often contain connection attempts to every single IP address provided in the candidate list, including the relay and reflexive candidates. This behavior is due to Lync Server’s inclusion of support for Early Media (SIP 183 Session Progress) messages, which allow Lync to establish a media connection before the called endpoint even answers the call. This is done to eliminate or reduce any delays which might be caused by waiting until after the call is established, thus ensuring the users can hear each other immediately in an audio call.
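As a rough sketch of the "check everything at once" idea, the example below probes every candidate concurrently and then picks the most preferred one that answered. The `fake_probe` function is a hypothetical stand-in for a real connectivity check, and the addresses are invented for illustration. It is not how the Lync client is implemented internally.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def check_all(candidates, probe):
    """Probe every candidate concurrently; return the reachable ones in list order."""
    if not candidates:
        return []
    with ThreadPoolExecutor(max_workers=len(candidates)) as pool:
        results = list(pool.map(probe, candidates))
    return [c for c, ok in zip(candidates, results) if ok]

# Candidate list in preference order; addresses are invented for illustration.
CANDIDATES = [
    ("host", "192.168.1.20"),
    ("srflx", "203.0.113.7"),
    ("relay", "198.51.100.10"),
]

def fake_probe(candidate):
    """Hypothetical stand-in for a real connectivity check."""
    kind, _ip = candidate
    time.sleep(0.05)          # pretend each check takes a network round trip
    return kind != "host"     # here the private host address is unreachable

reachable = check_all(CANDIDATES, fake_probe)
selected = reachable[0] if reachable else None
print(selected)
```

Because the probes run in parallel, the total wait is roughly that of the slowest single check rather than the sum of all of them. That is also why a capture shows traffic to every candidate, even when the local one succeeds.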
It should also be understood that sometimes the first candidate to respond for the early media connection may not be the best candidate for the duration of the call. Thus Lync also supports a mechanism referred to as Candidate Promotion (via a SIP re-INVITE) which provides a way to change the media path during a session without dropping media or signaling. In practice though this scenario is typically rare, as usually the first candidate to respond is the best (and often only) option.
Media Establishment Scenarios
In the following scenarios a single Lync client will attempt to establish some type of media session with another Lync client; it could be an audio call, a video call, or even simply a desktop sharing session. These scenarios will highlight the differences between local, reflexive (STUN), and relay (TURN) sessions.
External Peer-to-Peer Scenario
The following diagram depicts an external Lync user located in a home network placing a call to another Lync user in the same organization who also happens to be located in a different home network. This is a very common use case with today’s distributed work force, where at any given time a user may be located in a home office or at some other public Internet access point such as a coffee shop or hotel.
- The most preferred candidate option, shown in red, targets the local host IP address and will fail. The most preferred candidate is always a local candidate, which is why peer media sessions between clients on the same network will never use the Edge Server: the direct method would need to fail first, triggering a fallback to the other available options. But in this scenario the two clients are on different networks utilizing unroutable private subnets and there is no way for the two hosts to connect to each other directly.
- The next preferred option is to use the server reflexive candidate which is provided by the Edge Server using STUN. In many cases this connection will be allowed by the firewall on the called client’s network. How this actually works will be discussed in the next section, but this option most commonly works with consumer-grade firewalls which can in essence be ‘fooled’ by the Edge Server into accepting the inbound media connection directly from the calling client’s firewall. (This is not a security risk or any type of hack, which will also be discussed in more detail later.)
- In the event that STUN fails then the final option is to utilize the Edge Server as a media relay. The calling client will establish a media session directly with the A/V Edge Server as will the receiving client. The inbound media stream relayed by the Edge Server will be accepted by the called client’s firewall. (Also note the deliberate ‘loop’ depicted in the diagram to illustrate that the Edge Server will ‘turn’ media around to the other client. This visual mnemonic is meant to help remember which media path is TURN and which is STUN.)
Although media leveraging STUN is not a direct host-to-host session, it is the next best option as the media is still sent directly between the two clients’ own firewalls, over the Internet. This keeps the media path as short as possible and does not burden the corporate network with handling the media relay processing or bandwidth. At this point it should be quite clear why using the media relay for all sessions could impact the scalability of the overall solution: the Edge Server would need to handle a large number of media sessions, and the Internet connectivity of the site where the Edge Server is located would need to support all those media sessions inbound and outbound. Thus the immediate value of STUN is evident in this simple diagram.
Mixed Peer-to-Peer Scenario
In this scenario only one of the Lync clients is located in a home network, and it is attempting to call another Lync user who happens to be located on the internal corporate network of the same Lync organization.
- Just as in the previous scenario the direct approach will fail because the external client will not be able to locate the unroutable, local IP address of the other client.
- The server reflexive candidate for the internal Lync client is the public IP of the corporate firewall, but this happens to be an enterprise-class firewall which is usually too intelligent to allow STUN traffic to be established to protected internal hosts, thus causing the calling client’s candidate availability check to fail. Lync will then move on to the next and final option in the list.
- As all other candidates were unreachable, Lync will fall back to using the Edge Server to provide media relay duties.
In this scenario STUN is less likely to succeed, as the enterprise-class firewall is often configured to utilize Symmetric NAT (SNAT), which contains a mapping table of reserved IP address and port combinations typically allowing external inbound connections to be routed directly to internal services. As this is a client-to-server approach commonly used for accessing web servers or other static services, it would not be feasible to manage individual rules for every potential ICE client inside the network, especially when most of these clients are typically located on a dynamic IP address. Thus TURN would most likely be utilized in this scenario, as the Edge Server is static and always available for both parties to connect and send media to.
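One way to see why a reflexive candidate often fails behind this kind of firewall is a toy model of NAT port mappings. It assumes the usual RFC 3489 sense of "symmetric": the NAT creates a different public mapping for each destination, so the address learned by talking to the Edge Server is not the one used toward the peer. The `ToyNat` class and all addresses are hypothetical simplifications; real firewalls also apply filtering rules that are not modeled here.

```python
import itertools

class ToyNat:
    """Toy NAT that hands out public ports for outbound flows."""

    def __init__(self, public_ip, symmetric):
        self.public_ip = public_ip
        self.symmetric = symmetric
        self._ports = itertools.count(50000)
        self._table = {}

    def outbound(self, internal, destination):
        """Return the public (ip, port) the destination sees for this flow."""
        # A symmetric NAT keys its mapping on the destination as well,
        # so every new destination gets a fresh public port.
        key = (internal, destination) if self.symmetric else internal
        if key not in self._table:
            self._table[key] = next(self._ports)
        return self.public_ip, self._table[key]

def stun_candidate_is_usable(nat, client, stun_server, peer):
    """True if the address learned via STUN matches the one the peer would see."""
    reflexive = nat.outbound(client, stun_server)  # what STUN reports back
    seen_by_peer = nat.outbound(client, peer)      # mapping used toward the peer
    return reflexive == seen_by_peer

client = ("10.0.0.5", 50012)         # internal Lync client (made up)
edge = ("198.51.100.10", 3478)       # Edge Server acting as the STUN server (made up)
peer = ("203.0.113.7", 50020)        # remote party's public address (made up)

home_router = ToyNat("192.0.2.1", symmetric=False)
corp_firewall = ToyNat("192.0.2.2", symmetric=True)

print(stun_candidate_is_usable(home_router, client, edge, peer))
print(stun_candidate_is_usable(corp_firewall, client, edge, peer))
```

In this model the home router reuses one mapping for every destination, so the reflexive address remains valid for the peer. The symmetric firewall allocates a second mapping, which leaves the relay as the only address both sides can rely on.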
Because the called endpoint is located on the internal network, the media traffic must traverse the corporate Internet connection and firewalls anyway, so even if STUN were possible the network would still have to deal with the media traffic. But leveraging STUN over TURN in this scenario would still provide one advantage, as the Edge Server would not be burdened with relaying the media session itself.
For clarification purposes the following scenario contains a mixture of different client and server roles. In the event that the external client is connecting to a Lync multi-party conference call then it will send its media stream to the Front End server where the meeting is instantiated.
In this scenario the overall media session between endpoints looks identical to the earlier peer-to-peer call between external and internal Lync clients. The internal Lync Server behaves the same as an internal Lync client; they are both ICE clients. The Edge Server is an ICE Server as it facilitates the connection between endpoints from the beginning, by providing candidate information, through to the end, by handling the media session for the duration of the call.
An important distinction worth noting is that internally connected Lync clients will ignore the TURN (or server relay) candidate provided by the ICE server. This IP address is used only by external (e.g. foreign) endpoints like federated Lync clients. A Lync client will be made aware of its assigned media relay server via in-band provisioning details provided during initial registration and sign-in to Lync and it will always connect to this FQDN for media relay functionality.
(Thanks to Thomas Binder from Microsoft for offering some feedback based on his TechNet Europe presentation on Edge Media Connectivity with ICE.)