Thursday, May 6, 2010

SIP NAT, STUN & TURN










Solution to overcome this NAT issue is STUN

  • STUN (originally 'Simple Traversal of UDP through NATs', where NAT stands for Network Address Translation) is a protocol that helps devices behind a NAT firewall or router with their packet routing. RFC 5389 redefines the term STUN as 'Session Traversal Utilities for NAT'.
  • STUN enables a device to find out its public IP address and the type of NAT it is sitting behind.
  • STUN operates on TCP and UDP port 3478.
  • STUN is not widely supported by VoIP devices yet.
  • STUN may use DNS SRV records to find STUN servers attached to a domain. The service name is _stun._udp or _stun._tcp.
STUN allows your IP phone to discover NAT implementations (and the type of NAT each provides) in between the IP phone and the STUN server. For this reason most STUN servers are located on the Internet. Your IP phone, prior to making a SIP request, uses the STUN server to find out what NAT binding is in place, and rewrites its SIP packet accordingly. It can also be used to refresh NAT bindings (keep-alive as mentioned earlier). STUN works well for all NAT implementations apart from symmetric NAT (explained in more detail below).
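
As a rough illustration of the discovery step described above, the sketch below builds a STUN Binding Request and decodes the XOR-MAPPED-ADDRESS attribute from a Binding Success Response, using the message layout from RFC 5389. The function names are made up for this example, and actually sending the request over UDP to a STUN server (port 3478) is left out, so treat it as a sketch and not a complete client.

```python
import os
import socket
import struct

MAGIC_COOKIE = 0x2112A442       # fixed value defined by RFC 5389
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return a 20-byte STUN Binding Request with no attributes."""
    if transaction_id is None:
        transaction_id = os.urandom(12)
    # Header: message type, message length (0, no attributes), magic cookie
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + transaction_id


def parse_xor_mapped_address(message):
    """Return (ip, port) from a Binding Success Response, or None."""
    msg_type, length, cookie = struct.unpack("!HHI", message[:8])
    if msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE:
        return None
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[offset:offset + 4])
        value = message[offset + 4:offset + 4 + attr_len]
        # Family 0x01 is IPv4; IPv6 is ignored in this sketch
        if attr_type == XOR_MAPPED_ADDRESS and len(value) >= 8 and value[1] == 0x01:
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        # Attributes are padded to a multiple of 4 bytes
        offset += 4 + attr_len + (-attr_len % 4)
    return None
```

The address returned is the public side of the NAT binding, which is what the phone would write into its SIP packet in place of its private address.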

TURN (Traversal Using Relay NAT)

TURN operates in a very similar way to STUN, but it can also handle RTP as well as SIP, meaning that both the media and the signalling are proxied.

NAT


Network Address Translators modify the IP address and port information of packets traversing from one network to another. In its most typical incarnation, NAT is used to manage private network ranges and allow them to interact with publicly routed networks.


There are 4 types of NAT:


- Full cone: The simplest (and least common in modern products) implementation. It operates in a similar way to port address translation, in that packets are accepted and forwarded according to NAT binding rules regardless of source.

- Restricted cone: As above, with the NAT device maintaining a list of allowed originating devices.

- Port restricted cone: Applies port restrictions to the above.

- Symmetric: The most common in modern-day solutions. A NAT 'binding' is created by the NAT device in response to a request made from internal_source_ip:port to public_destination_ip:port. This binding is kept alive so you can receive responses from public_destination_ip:port (but they _have_ to come from this address/port combo). A new binding is created for each unique pairing of internal_source_ip:port and public_destination_ip:port whenever a device on your private LAN sends packets to a network outside the NAT device.
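
The difference between the four types can be sketched with a small toy model. The class below is purely illustrative (the `Nat` class, its method names, and the port numbering are assumptions made for this example and do not describe any real NAT implementation). It only tracks which public port a binding gets and which remote senders are allowed back in.

```python
class Nat:
    """Toy model of NAT binding and filtering behaviour (illustrative only)."""

    def __init__(self, kind, first_port=40000):
        # kind: "full_cone", "restricted_cone", "port_restricted_cone", "symmetric"
        self.kind = kind
        self.next_port = first_port
        self.bindings = {}   # binding key -> public port
        self.peers = {}      # public port -> set of (ip, port) destinations contacted

    def outbound(self, src, dst):
        """Record an outgoing packet and return the public port used for it."""
        # Symmetric NAT keys the binding on source AND destination;
        # the cone types reuse one binding per internal source.
        key = (src, dst) if self.kind == "symmetric" else src
        if key not in self.bindings:
            self.bindings[key] = self.next_port
            self.next_port += 1
        port = self.bindings[key]
        self.peers.setdefault(port, set()).add(dst)
        return port

    def inbound_allowed(self, public_port, remote):
        """Would a packet from `remote` to `public_port` be forwarded inside?"""
        contacted = self.peers.get(public_port)
        if contacted is None:
            return False                      # no binding at all
        if self.kind == "full_cone":
            return True                       # any source may use the binding
        if self.kind == "restricted_cone":
            return remote[0] in {ip for ip, _ in contacted}
        # port restricted cone and symmetric: exact ip:port must match
        return remote in contacted
```

With this model, a symmetric NAT hands out a different public port for every destination, which is why the address learned from a STUN server is of no use for reaching a different peer.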

Network Address Translation (NAT) and SIP


There are several 'points of impact' where SIP finds NAT problematic, and the trouble is not limited to SIP: RTP also has problems with NAT. SIP packets go out from a NATed client with their private (and un-routable) IP addresses coded into the message headers and SDP bodies. These addresses are not rewritten by a NAT device, which operates only on the IP packet headers as they pass by. This means that when the packets get to their destination, they are processed and responded to using completely useless source address information.

The 3 major problems of NAT in SIP


The VIA header problem: Responses to requests cannot be routed back to the originating party, as the supplied addressing information is not globally routable.

The CONTACT header problem: This refers to the fact that future requests would be routed incorrectly, again due to non-routable addresses being supplied.

The RTP problem: The final problem in the NAT/SIP category is to do with RTP (by that I mean the actual voice part of the session). The SDP messages used to negotiate the session format (codecs, ports, IPs etc.) are often enclosed within the SIP message body, and thus are not processed by a SIP proxy according to IETF standards. They will contain non-routable contact information as well (i.e. your internal address, e.g. 192.168.x.x). This is a tough problem to solve.

There are several solutions to choose from to allow SIP to traverse NAT effectively. None of them is ideal, and much of it is external to SIP. The following sections refer to the three main problems that NAT causes for SIP (listed above), and discuss the techniques applied within the protocol, and common third-party solutions, to the problems.

Solutions to those 3 major problems


The Via header answer

This is solved within SIP. When a message arrives at a SIP server or UA, a comparison is made between the address the packet came from and the one listed in the Via header. If there is a difference, then the correct IP address (the one from which the packet originated) is written into a 'received=' parameter, which is added to the Via header.
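
A minimal sketch of that comparison is shown below. The function name `add_received` is invented for this example, and the parsing assumes a simple IPv4 or hostname sent-by value (no IPv6 literals), so it is only an illustration of the rule and not a full Via parser.

```python
def add_received(via_value, source_ip):
    """Append ;received=<source_ip> when the Via sent-by host differs.

    via_value is the value of the topmost Via header, for example
    "SIP/2.0/UDP 192.168.1.10:5060;branch=z9hG4bK776asdhds".
    """
    # Part before the first ';' is "SIP/2.0/UDP host[:port]"
    sent_by = via_value.split(";", 1)[0].split()[-1]
    host = sent_by.split(":", 1)[0]
    if host != source_ip and ";received=" not in via_value:
        return via_value + ";received=" + source_ip
    return via_value
```

Responses can then be sent to the address in 'received=' instead of the private address that the client wrote into the header.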

The contact header answer

This is a similar problem to the Via header issue, and is solved in the same way, updating the contact header instead. The contact header is referred to for communications that occur some time after an original request (such as BYE's or re-INVITE's), and this can cause additional problems, for the following reason:

NAT bindings are kept active on the NAT device for only a finite amount of time if SIP is being transported over UDP.

Solutions to this include using TCP for SIP instead of UDP (TCP is a connection-oriented protocol, so bindings and associated timeouts aren't a problem), employing some kind of keep-alive mechanism to maintain NAT bindings, using STUN/TURN servers, or even using a B2BUA.

The RTP answer


For instances where only one UA is behind a NAT device, symmetric RTP can be used. This effectively synchronizes the two RTP streams: the recipient of the successful RTP stream (i.e. the globally routable UA in the session) transmits its own RTP stream to the source IP address and port of that incoming RTP flow, ignoring the address that SDP has told it to use.
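
The behaviour of the globally routable UA can be sketched as follows. The `RtpSession` class and its method names are hypothetical, chosen only to show the "send back to where the media came from" rule; no real RTP stack is implied.

```python
class RtpSession:
    """Toy model of a public UA that latches onto the observed RTP source."""

    def __init__(self, sdp_addr):
        self.sdp_addr = sdp_addr      # (ip, port) advertised in the peer's SDP
        self.latched_addr = None      # (ip, port) actually seen on the wire

    def on_rtp_received(self, source_addr):
        # Remember where the peer's media really comes from (the NAT binding)
        self.latched_addr = source_addr

    def send_target(self):
        # Prefer the observed source; fall back to SDP until media arrives
        return self.latched_addr or self.sdp_addr
```

Until the first packet arrives from the NATed side, the public UA can only use the (unroutable) SDP address, which is why the UA behind the NAT has to start sending first.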

For instances where both users are behind NAT, you can employ an RTP proxy, media server or B2BUA of some kind to relay voice, breaking the call into two separate legs, or you can use the TURN protocol (which essentially does the same thing).

RFC 3262 SIP PRACK

RFC 3262 - Reliability of Provisional Responses in the Session Initiation Protocol (SIP)

SIP PRACK Overview (Provisional Response Acknowledgement)

One problem with the original SIP specification was that it provided no method for the recipient of a request to know if its provisional responses have reached their destination when using an unreliable transport such as UDP. The ability to make these provisional responses reliable is defined by RFC 3262, "Reliability of Provisional Responses in SIP".

There are two types of responses defined by SIP: provisional and final.

  • Provisional responses (1xx) are mostly sent unreliably.
  • Final responses (2xx-6xx) convey the result of the request processing and are sent reliably.
  • SIP provisional responses do not have an acknowledgement system, so they are not reliable.
  • There are certain scenarios in which the provisional SIP responses (1xx) must be delivered reliably.
  • For example, in a SIP/PSTN inter-working scenario it is crucial that the 180 and 183 messages are not missed. To solve this problem, the SIP PRACK method guarantees reliable and ordered delivery of provisional responses in SIP.

 Diagram - SIP PRACK Handshake


 When using reliable provisional responses, these responses are retransmitted by the UAS in response to an INVITE until a PRACK is received from the UAC. If the PRACK is acceptable to the UAS, the UAS would then respond with a 200 OK to the PRACK. In this instance the PRACK serves the same role as an ACK in a normal INVITE transaction. However, unlike the ACK, PRACK has its own response. A call flow of an INVITE transaction using reliable provisional responses can be seen below.
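
In outline, the exchange is: INVITE, then a 1xx response carrying 'Require: 100rel' and an RSeq number, then a PRACK whose RAck header echoes that RSeq together with the CSeq of the INVITE, then a 200 OK for the PRACK. The sketch below shows how a UAC might derive the RAck header from a reliable provisional response. The helper names are invented for this example, and the header parsing is deliberately naive (no header folding, no compact forms).

```python
def parse_headers(message):
    """Return a dict of lower-cased header names to values (naive parser)."""
    headers = {}
    for line in message.split("\r\n")[1:]:    # skip the status line
        if not line:
            break                             # blank line ends the headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers


def rack_for(provisional_response):
    """Build the RAck header for a reliable 1xx response, or return None."""
    headers = parse_headers(provisional_response)
    required = [token.strip() for token in headers.get("require", "").split(",")]
    if "100rel" not in required or "rseq" not in headers or "cseq" not in headers:
        return None                           # not sent reliably: no PRACK needed
    # RAck = RSeq value, then the CSeq number and method of the request
    return "RAck: {} {}".format(headers["rseq"], headers["cseq"])
```

For a 183 with 'RSeq: 1' answering an INVITE with 'CSeq: 1 INVITE', this yields 'RAck: 1 1 INVITE'.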


UAC (User Agent Client) - UAS (User Agent Server) Behaviour 

The following table shows the overall behavior of UAS and UAC with various SGP configuration combinations.

UAC                     UAS                     Call Processing
SGP: PRACK Disabled     SGP: PRACK Disabled     Normal Call
SGP: PRACK Disabled     SGP: PRACK Supported    Normal Call
SGP: PRACK Disabled     SGP: PRACK Require      Call Rejected
SGP: PRACK Supported    SGP: PRACK Disabled     Normal Call
SGP: PRACK Supported    SGP: PRACK Supported    PRACK Call
SGP: PRACK Supported    SGP: PRACK Require      PRACK Call
SGP: PRACK Require      SGP: PRACK Disabled     Call Rejected
SGP: PRACK Require      SGP: PRACK Supported    PRACK Call
SGP: PRACK Require      SGP: PRACK Require      PRACK Call
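
The nine combinations in the table above reduce to two rules, which the following sketch encodes. The function name and the plain-string setting values are assumptions made for this example; they are not configuration syntax.

```python
def call_outcome(uac, uas):
    """Return the call processing result for two PRACK settings.

    Each argument is one of "Disabled", "Supported" or "Require".
    """
    settings = {uac, uas}
    if settings == {"Require", "Disabled"}:
        return "Call Rejected"    # one side insists, the other cannot comply
    if "Disabled" in settings:
        return "Normal Call"      # PRACK is simply not used
    return "PRACK Call"           # both sides are able to use PRACK
```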

Call Tracing:

Success

16:32:35.762 CALL(SIP) (00:0004:00) SENT 183 Session Progress Reliable (100rel) to 10.129.45.102:8000 UDP

16:32:35.782 CALL(SIP) (00:0004:00) RCVD PRACK from 10.129.45.102:8000 Cseq:2 with Via sent-by: 10.129.45.102 UDP

16:32:35.782 CALL(SIP) (00:0004:00) SENT 200 OK PRACK to 10.129.45.102:8000 UDP

 
 

Failure

21:16:47.845 CALL(SIP) (01:00004:00) SENT 421 Extension Required [PRACK support is required] to 10.129.45.104:5060 Cseq:1

21:18:09.286 CALL(SIP) (01:00005:00) SENT 420 Bad Extension [Unsupported SIP request arrived at L3UA-TUC] to 10.129.45.104:5060 Cseq:1

 SDP SUPPORT in PRACK 

Overview of SDP support in  SIP PRACK 

The IMG now supports embedding the SDP (Session Description Protocol) information within the PRACK message.  

PRACK sip:222@10.129.45.104:5060 SIP/2.0
Via: SIP/2.0/UDP 10.129.47.146:8000;branch=z9hG4bK3b1bce0-22330
Max-Forwards: 70
From: sip:111@10.129.47.146:8000;tag=48346074
To: sip:222@10.129.45.104;tag=a94c095b773be1dd6e8d668a785a9c84904eaa2e
Call-ID: 11408@10.129.47.146
CSeq: 2 PRACK
RAck: 1 1 INVITE
Content-Type: application/sdp
Content-Length: 174

v=0
o=_ 2890844527 2890844527 IN IP4 10.129.47.146
s=-
c=IN IP4 10.129.47.146
t=0 0
m=audio 9000 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000

  Above is a sample PRACK message that contains the SDP information in it. The IMG supports incoming PRACK with SDP support and all currently implemented rules of SDP decoding/overlap failure still apply. This means the following: 

  • An invalid SDP message will be ignored by the IMG
  • The IMG will accept a valid PRACK message which has invalid SDP information embedded. It will discard only the SDP portion of the PRACK message. (Media parameters will not change.)
  • The IMG does not support sending out a PRACK message with the SDP information embedded. It will only send out a PRACK message without the SDP information embedded. 
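
To make the "discard only the SDP portion" rule concrete, here is a hedged sketch of a receiver that extracts the connection address and media description from an SDP body and returns None when the body cannot be decoded. The `parse_sdp` function is invented for this example and says nothing about how the IMG itself decodes SDP. It also reads only the c= and m= lines.

```python
SAMPLE_SDP = """v=0
o=_ 2890844527 2890844527 IN IP4 10.129.47.146
s=-
c=IN IP4 10.129.47.146
t=0 0
m=audio 9000 RTP/AVP 0 101
a=rtpmap:0 PCMU/8000
a=rtpmap:101 telephone-event/8000
"""


def parse_sdp(sdp):
    """Return {'address': ..., 'media': [...]}, or None if the SDP is invalid."""
    info = {"address": None, "media": []}
    try:
        for line in sdp.splitlines():
            line = line.strip()
            if line.startswith("c="):
                # c=<nettype> <addrtype> <connection-address>
                nettype, addrtype, address = line[2:].split()
                info["address"] = address
            elif line.startswith("m="):
                # m=<media> <port> <proto> <fmt> ...
                media, port, proto, *formats = line[2:].split()
                info["media"].append(
                    {"type": media, "port": int(port), "formats": formats}
                )
    except ValueError:
        return None   # caller keeps the PRACK but leaves media parameters unchanged
    if info["address"] is None or not info["media"]:
        return None
    return info
```

A caller would still answer the PRACK with a 200 OK when `parse_sdp` returns None; only the media update is skipped.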

Troubleshooting

If you are experiencing problems with this feature, check the following:

  • Make sure PRACK is enabled in the SIP SGP
  • Make sure the correct SIP SGP is assigned to the External Gateway
  • The External Gateway must support PRACK.

============================================================================

Any incoming INVITE messages with a Require header field indicating a SIP extension that is not in this list of supported SIP extensions are automatically rejected with a 420 (Bad Extension) response. If an application supports 100rel, incoming INVITE messages with Supported or Require header fields for the 100rel SIP extension are exposed to the application. If the application does not support 100rel, incoming INVITE messages with Require header fields for 100rel are automatically rejected with a 420 response. Note that incoming INVITE messages with Supported header fields for 100rel are still exposed to the application as there is no requirement to support this extension even though the inviter supports it. When the application sends any INVITE message, the platform will automatically add Supported or Require header fields for 100rel depending on the reliability policy set for the session.
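
The Require check described above can be sketched like this. The function name and the way the supported set is passed in are assumptions for illustration; the platform's real interface is not shown in the text.

```python
def check_require(require_header, supported_extensions):
    """Return (420, unsupported_list) to reject, or (None, []) to accept.

    require_header is the raw value of the Require header, e.g. "100rel, timer".
    """
    required = [tok.strip() for tok in require_header.split(",") if tok.strip()]
    unsupported = [tok for tok in required if tok not in supported_extensions]
    if unsupported:
        # The unsupported option tags would be echoed back to the sender
        return 420, unsupported
    return None, []
```

A Supported header never triggers this check, since the sender is only offering the extension and not demanding it.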

*******************************************************************************************************************************

DTMF Relay

Dual-Tone Multifrequency (DTMF) is the tone generated on a touch-tone phone when you press keypad digits. During a call you might enter DTMF to access Interactive Voice Response (IVR) systems such as voicemail, automated banking services and so on.

In previous releases of Cisco IOS, DTMF is transported in the same way as voice.

This approach can result in problems accessing IVR systems. While DTMF is usually transported accurately when using high-bitrate voice CODECs such as G.711, low-bitrate CODECs such as G.729 and G.723.1 are highly optimized for voice patterns, and tend to distort DTMF tones. As a result, IVR systems may not correctly recognize the tones.

DTMF relay solves the problem of DTMF distortion by transporting DTMF tones "out of band", or separate from the encoded voice stream.

Cisco H.323 Version 2 support introduces three options for sending DTMF tones out of band. These are:

  • A Cisco proprietary RTP-based method ("dtmf-relay cisco-rtp").
  • H.245 signal ("dtmf-relay h245-signal").
  • H.245 alphanumeric ("dtmf-relay h245-alphanumeric").

If none of these options is selected, DTMF tones are transported inband, and encoded in the same way as voice traffic.

The "cisco-rtp" option sends DTMF tones in the same RTP channel as voice. However, the DTMF tones are encoded differently from the voice samples and are identified by a different RTP payload type code. Use of this method accurately transports DTMF tones, but since it is proprietary it requires the use of Cisco gateways at both the originating and terminating endpoints of the H.323 call.

The "h245-signal" option relays a more accurate representation of a DTMF digit than the "h245-alphanumeric" option, in that tone duration information is included along with the digit value.

This information is important for applications that require you to press a key for a particular length of time. For example, one popular calling card feature allows you to break out of an existing call by pressing the (#) key for more than two seconds and then make a second call without having to hang up in between. This feature is beneficial because it allows you to avoid having to dial your access number and PIN code again, and it allows you to avoid access charges if you are charged for accessing an outside line as is common at hotels.

The "h245-alphanumeric" option simply relays DTMF tones as ASCII characters. For instance, the DTMF digit 1 is transported as the ASCII character "1". There is no duration information associated with tones in this mode.

When the Cisco H.323 gateway receives a DTMF tone using this method, it will generate the tone on the PSTN interface of the call using a fixed duration of 500 ms. All H.323 version 2 compliant systems are required to support the "h245-alphanumeric" method, while support of the "h245-signal" method is optional.

The ability of a gateway to receive DTMF digits in a particular format and the ability to send digits in that format are independent functions. To receive DTMF digits from another H.323 endpoint using any of the methods described above, no configuration is necessary. The Cisco H.323 version 2 gateway is capable of receiving DTMF tones transported by any of these methods at all times.

However, to send digits out of band using one of these methods, two conditions must be met:

1. You must enable the chosen method of DTMF relay under "dial-peer" configuration using the "dtmf-relay" command.

2. The peer (the other endpoint of the call) must indicate during call establishment that it is capable of receiving DTMF in that format.

You may enable more than one DTMF relay option for a particular dial peer. If you enable more than one option, and if the peer indicates that it is capable of receiving DTMF in more than one of these formats then the gateway will send DTMF using the method among the supported formats that it considers to be the most preferred. The preferences are defined as follows:

1. cisco-rtp (highest preference)

2. h245-signal

3. h245-alphanumeric

If the peer is not capable of receiving DTMF in any of the modes that you have enabled, DTMF tones will be sent inband.
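
The selection rule can be summarised in a few lines. This is only a sketch of the logic as described here; the function name and the use of plain strings for the method names are assumptions, and it is not Cisco IOS code.

```python
# Preference order as listed above, highest first
DTMF_PREFERENCE = ["cisco-rtp", "h245-signal", "h245-alphanumeric"]


def select_dtmf_method(enabled, peer_capabilities):
    """Pick the DTMF relay method for a call, or "inband" if none matches.

    enabled: methods configured on the dial peer
    peer_capabilities: methods the remote endpoint says it can receive
    """
    for method in DTMF_PREFERENCE:
        if method in enabled and method in peer_capabilities:
            return method
    return "inband"
```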

When the Cisco H.323 version 2 gateway is involved in a call to a Cisco gateway running a version of IOS prior to 12.0(5)T, DTMF tones will be sent inband since those systems do not support DTMF relay.

Hookflash Relay

A "hookflash" indication is a brief on-hook condition that occurs during a call. It is not long enough in duration to be interpreted as a signal to disconnect the call. You can create a hookflash indication by quickly depressing and then releasing the hook on your telephone.

PBXs and telephone switches are frequently programmed to intercept hookflash indications and use them as a way to allow a user to invoke supplemental services. For example, your local service provider may allow you to enter a hookflash as a means of switching between calls if you subscribe to a call waiting service.

In the traditional telephone network a hookflash results in a voltage change on the telephone line. Since there is no equivalent of this voltage change in an IP network, the ITU H.245 standard defines a message representing a hookflash. To send a hookflash indication using this message, an H.323 endpoint sends an H.245 User Input Indication message containing a "signal" structure with a value of "!". This value represents a hookflash indication.

Cisco H.323 Version 2 support includes limited support for relaying hookflash indications via H.245. H.245 User Input Indication messages containing hookflash indications that are received on the IP call leg are forwarded to the POTS call leg if the POTS interface is FXO. If the interface is not FXO, any H.245 hookflash indication that is received is ignored. This support allows IP telephony applications to send hookflash indications to a PBX through the Cisco gateway, and thereby invoke the PBX's supplementary services if the PBX supports access to those features via hookflash.

The gateway does not originate H.245 hookflash indications in this release. For example it does not forward hookflash indications from FXS interfaces to the IP network over H.245.

The acceptable duration of a hookflash indication varies by equipment vendor and by country. While one PBX may consider a 250 ms on-hook condition to be a hookflash, another PBX may consider this condition to be a disconnect. Therefore this release of IOS adds the "timing hookflash-out" command to allow the administrator to define the duration of a hookflash signal generated on an FXO interface.

CODEC Negotiation

CODEC negotiation allows the Gateway to offer several CODECs during the H.245 capability exchange phase and ultimately settle upon a single common CODEC during the call-establishment phase. This increases the probability of establishing a connection since there will be a greater chance of over-lapping audio capabilities between endpoints. Normally, only one CODEC can be specified when configuring a Dial-Peer, but CODEC negotiation allows you to specify a prioritized list of CODECs associated with a Dial-Peer. During the call-establishment phase the router will use the highest priority CODEC from the list which it has in common with the remote endpoint. It will also adjust to the CODEC selected by the remote endpoint so that a common CODEC is established for both the receive and transmit audio directions.

When a call is originated, all of the CODECs associated with the Dial-Peer are sent to the terminating endpoint in the H.245 Terminal Capability Set message. At the terminating endpoint, the gateway will advertise all of the CODECs that are available in firmware in its Terminal Capability Set. If there is a need to limit the CODECs advertised to a subset of the available CODECs, a terminating Dial-Peer must be matched which includes this subset. The "incoming called-number" command under the Dial-Peer can be used to force this match.
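
The "highest priority CODEC in common" rule can be sketched as below. The function name and the codec strings are illustrative assumptions; the real exchange happens through H.245 Terminal Capability Set messages, which this example does not model.

```python
def select_codec(prioritized_local, remote_capabilities):
    """Return the first codec in the local priority list that the remote
    endpoint also advertises, or None when there is no overlap."""
    remote = set(remote_capabilities)
    for codec in prioritized_local:
        if codec in remote:
            return codec
    return None
```

For example, a dial peer that prefers G.729 but also lists G.711 would settle on G.711 when the remote endpoint advertises only G.711 and G.723.1.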

Stateless & Stateful Servers

There are two basic types of SIP proxy servers: stateless and stateful.

Stateless Servers

  • Stateless servers are simple message forwarders.
  • They forward messages independently of each other. Although messages are usually arranged into transactions, stateless proxies do not take care of transactions.
  • Stateless proxies are simpler and faster than stateful proxy servers.
  • They can be used as simple load balancers, message translators and routers.
  • One of the drawbacks of stateless proxies is that they are unable to absorb retransmissions of messages or perform more advanced routing, for instance forking or recursive forwarding.

Stateful Servers

  • Stateful proxies are more complex.
  • Upon reception of a request, stateful proxies create a state and keep the state until the transaction finishes.
  • Some transactions, especially those created by INVITE, can last quite long (until the callee picks up or declines the call). Because stateful proxies must maintain the state for the duration of the transactions, their performance is limited.
  • The ability to associate SIP messages into transactions gives stateful proxies some interesting features.
  • Stateful proxies can perform forking, which means that upon reception of a message two or more messages will be sent out.
  • Stateful proxies can absorb retransmissions because they know, from the transaction state, if they have already received the same message (stateless proxies cannot do the check because they keep no state).
  • Stateful proxies can perform more complicated methods of finding a user. It is, for instance, possible to try to reach a user's office phone and, when he doesn't pick up, redirect the call to his cell phone. Stateless proxies can't do this because they have no way of knowing how the transaction targeted to the office phone finished.

Most SIP proxies today are stateful because their configuration is usually very complex. They often perform accounting, forking, some sort of NAT traversal aid and all those features require a stateful proxy.
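
The retransmission point is the easiest one to show in code. In the sketch below, both classes and their `handle` methods are invented for illustration, and a transaction is identified simply by the Via branch value plus the method, which is a simplification of real SIP transaction matching.

```python
class StatelessProxy:
    """Forwards every message it sees; keeps no memory between messages."""

    def handle(self, branch, method):
        return "forward"


class StatefulProxy:
    """Remembers ongoing transactions so retransmissions can be absorbed."""

    def __init__(self):
        self.transactions = {}   # (branch, method) -> state

    def handle(self, branch, method):
        key = (branch, method)
        if key in self.transactions:
            return "absorb"      # same request seen before: do not forward again
        self.transactions[key] = "proceeding"
        return "forward"

    def finish(self, branch, method):
        # State is dropped once the transaction completes
        self.transactions.pop((branch, method), None)
```

The dictionary is exactly the per-transaction state that limits the performance of a stateful proxy, and also what makes forking and sequential call attempts possible.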