Connectrix: What is a keepalive timeout in FCIP
Summary: What is a keepalive timeout in FCIP?
Symptoms
In the simplest terms, a keepalive timeout means that the heartbeat packets sent from peer to peer in an FCIP environment were not successfully exchanged within the keepalive timeout interval.
The following is the typical messaging for a keepalive (KA) timeout. Note the various QoS priorities, the circuits, and finally the tunnel. Each has its own KA timeout.
2016/06/17-16:00:21:308148, [XTUN-2005], 594561/36489, FID 128, ERROR, Node_32, FCIP Tunnel 16 High-Pri QoS DOWN (Keepalive Timeout)., ftnl_cp_vm.c, line: 1743, comp:bmd, ltime:2016/06/17-16:00:21:303651
2016/06/17-16:00:21:930759, [XTUN-2005], 594564/36490, FID 128, ERROR, Node_32, FCIP Tunnel 16 Low-Pri QoS DOWN (Keepalive Timeout)., ftnl_cp_vm.c, line: 1743, comp:bmd, ltime:2016/06/17-16:00:21:916149
2016/06/17-16:00:21:931139, [XTUN-2003], 594568/36491, FID 128, ERROR, Node_32, FCIP Tunnel 16 Circuit 3 DOWN (Keepalive Timeout)., ftnl_cp_capi.c, line: 2201, comp:bmd, ltime:2016/06/17-16:00:21:917593
2016/06/17-16:00:21:931232, [XTUN-2003], 594569/36492, FID 128, ERROR, Node_32, FCIP Tunnel 16 Circuit 2 DOWN (Keepalive Timeout)., ftnl_cp_capi.c, line: 2201, comp:bmd, ltime:2016/06/17-16:00:21:918112
2016/06/17-16:00:21:931467, [XTUN-2003], 594570/36493, FID 128, ERROR, Node_32, FCIP Tunnel 16 Circuit 1 DOWN (Keepalive Timeout)., ftnl_cp_capi.c, line: 2201, comp:bmd, ltime:2016/06/17-16:00:21:918586
2016/06/17-16:00:21:931595, [XTUN-2003], 594571/36494, FID 128, ERROR, Node_32, FCIP Tunnel 16 Circuit 0 DOWN (Keepalive Timeout)., ftnl_cp_capi.c, line: 2201, comp:bmd, ltime:2016/06/17-16:00:21:919314
2016/06/17-16:00:21:939507, [XTUN-2001], 594572/36495, FID 128, ERROR, Node_32, FCIP Tunnel 16 DOWN (Network/Remote/Other)., ftnl_cp_capi.c, line: 2111, comp:bmd, ltime:2016/06/17-16:00:21:921443
2016/06/17-16:00:21:939737, [XTUN-2005], 594574/36496, FID 128, ERROR, Node_32, FCIP Tunnel 16 Med-Pri QoS DOWN (Internal Close)., ftnl_cp_vm.c, line: 1743, comp:bmd, ltime:2016/06/17-16:00:21:924391
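When many such messages are collected, it can help to pull out which tunnel, component, and reason each one reports. The following is a minimal Python sketch, not a vendor tool. It assumes the message text always has the form "FCIP Tunnel <n> [<component>] DOWN (<reason>)" as in the entries above; the function name parse_xtun_down and the returned field names are hypothetical.

```python
import re

# Assumption: XTUN "DOWN" messages look like the samples in this article:
#   [XTUN-2003], ..., FCIP Tunnel 16 Circuit 3 DOWN (Keepalive Timeout).
XTUN_DOWN = re.compile(
    r"\[(?P<msg_id>XTUN-\d+)\].*?"
    r"FCIP Tunnel (?P<tunnel>\d+)\s*(?P<component>.*?)\s*DOWN \((?P<reason>[^)]*)\)"
)


def parse_xtun_down(line):
    """Return a dict describing one XTUN DOWN entry, or None if the line does not match."""
    m = XTUN_DOWN.search(line)
    if m is None:
        return None
    return {
        "msg_id": m.group("msg_id"),
        "tunnel": int(m.group("tunnel")),
        # An empty component means the tunnel itself went down.
        "component": m.group("component") or "tunnel",
        "reason": m.group("reason"),
        "keepalive_timeout": m.group("reason") == "Keepalive Timeout",
    }


sample = (
    "2016/06/17-16:00:21:931139, [XTUN-2003], 594568/36491, FID 128, ERROR, "
    "Node_32, FCIP Tunnel 16 Circuit 3 DOWN (Keepalive Timeout)., "
    "ftnl_cp_capi.c, line: 2201, comp:bmd, ltime:2016/06/17-16:00:21:917593"
)
print(parse_xtun_down(sample))
```

Filtering on the keepalive_timeout flag separates the circuit and QoS entries that report a keepalive timeout from the tunnel-level entry, which reports a different reason string.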
By default, circuits are set to a 10-second keepalive timeout.
Use a 1-second keepalive timeout when tunnels have multiple circuits. This way, frames can be redriven down another circuit more quickly. To modify the keepalive timeout, use the following command, where the -k value is given in milliseconds (1000 = 1 second): portcfg fcipcircuit 16 modify <circuit ID> -k 1000
A FICON tunnel requires a keep-alive timeout of less than or equal to 1 s for each FCIP circuit added to a tunnel.
For normal operations over FCIP tunnels, the keep-alive timeouts for all FCIP circuits in an FCIP tunnel must be less than the overall I/O timeout for all FC exchanges. If the FC I/O timeout value is less than the keep-alive timeout value, then I/Os will time out over all available FCIP circuits without being retried.
The keep-alive value should be based on application requirements. Check with your FC initiator providers to determine the appropriate keep-alive timeout value for your application. The sum of keep-alive timeouts for all circuits in a tunnel should be close to the overall FC initiator I/O timeout value. As an example, a mirroring application has a 6-second I/O timeout. There are three circuits in the FCIP tunnel. Set the keep-alive timeout to 2 s on each FCIP circuit. This allows for maximum retries over all available FCIP circuits before an I/O is timed out by the initiator.
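The sizing rule above is simple arithmetic: divide the initiator I/O timeout by the number of circuits in the tunnel. The following Python sketch works through the mirroring example (6-second I/O timeout, three circuits). The function names are hypothetical, and the generated command strings simply follow the portcfg form quoted earlier in this article; confirm the final value with your FC initiator provider before applying it.

```python
def per_circuit_keepalive_ms(io_timeout_ms, num_circuits):
    """Split the initiator I/O timeout evenly across the circuits of one tunnel."""
    if num_circuits < 1:
        raise ValueError("a tunnel needs at least one circuit")
    # Floor division keeps the sum of all keepalives at or below the I/O timeout.
    return io_timeout_ms // num_circuits


def modify_commands(tunnel, circuit_ids, keepalive_ms):
    """Build command strings in the form quoted in this article (illustrative only)."""
    return [
        f"portcfg fcipcircuit {tunnel} modify {cid} -k {keepalive_ms}"
        for cid in circuit_ids
    ]


# Mirroring example: 6 s I/O timeout, three circuits in tunnel 16.
keepalive = per_circuit_keepalive_ms(6000, 3)
print(keepalive)  # 2000 (milliseconds, that is, 2 s per circuit)
for cmd in modify_commands(16, [0, 1, 2], keepalive):
    print(cmd)
```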
Cause
A keepalive timeout occurs when the heartbeat packets sent from peer to peer in an FCIP environment are not successfully exchanged within the keepalive timeout interval.
Resolution
FCR requires the keepalive timer to be 1.5 s so that FCR does not time out.
The KA timeout value can be found under the circuit ID of the FCIP portion of the Supportsave.
Circuit ID: 17.0 (Circuit 0 of tunnel 17)
Circuit Num: 0
Admin Status: Enabled
Oper Status: Up
Connection Type: Default
Remote IP: 10.251.131.58
Local IP: 10.250.30.58
Metric: 0
Failover Group ID: (Not Config/Active)
Min Comm Rt: 150000
Max Comm Rt: 400000
SACK: On
Min Retrans Time: 100
Max Retransmits: 8
Keepalive Timeout: 1000 <----------------- 1 second
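To check the configured value across many circuits, the Keepalive Timeout field can be extracted from the circuit output with a short script. This is a hedged Python sketch: it assumes the field is labeled exactly "Keepalive Timeout:" and holds a value in milliseconds, as in the output above. The function name and the FICON_MAX_KEEPALIVE_MS constant are hypothetical; the 1000 ms limit is taken from the FICON guidance in this article.

```python
import re

# Assumption: limit taken from the FICON guidance in this article (<= 1 s).
FICON_MAX_KEEPALIVE_MS = 1000


def keepalive_timeouts_ms(text):
    """Return every 'Keepalive Timeout' value (in ms) found in circuit output text."""
    return [int(v) for v in re.findall(r"Keepalive Timeout:\s*(\d+)", text)]


sample = "Min Retrans Time: 100 Max Retransmits: 8 Keepalive Timeout: 1000"
for ms in keepalive_timeouts_ms(sample):
    ok = ms <= FICON_MAX_KEEPALIVE_MS
    print(f"{ms} ms = {ms / 1000:g} s, within FICON limit: {ok}")
```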