Here's the answer to your questions. Sorry, but the nice diagrams didn't post.
IPSI Sockets and Heartbeats
The CM Server communicates with a Port Network via a TCP socket connection to the IPSI as shown in Figure 1. This connection is critical to all communications that go through the port network (G650). All control signals for all endpoints and adjuncts that connect through the Port Network are multiplexed and sent via this TCP socket connection.
Figure 1: Server – Port Network Connection
The server exchanges heartbeats with the IPSIs every second. An IPSI sanity failure occurs when a heartbeat is missed and no other data has been received from the IPSI during the last second. During a Control Network outage, the server and the IPSIs buffer all downstream and upstream messages in queues. If socket communication is restored before the IPSI Socket Sanity Timeout is reached, it resumes and all queued messages are sent. This recovery is represented by Region A in Figure 2.
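To make the sanity-failure rule concrete, here is a minimal Python sketch. It is not Avaya code: the class name `SanityMonitor` and its `tick` method are made up for illustration, and it assumes the check is evaluated once per one-second heartbeat interval.

```python
from dataclasses import dataclass


@dataclass
class SanityMonitor:
    """Hypothetical per-IPSI tracker; call tick() once per one-second interval."""

    consecutive_failures: int = 0

    def tick(self, heartbeat_seen: bool, other_data_seen: bool) -> bool:
        # A sanity failure needs BOTH conditions from the text:
        # the heartbeat was missed AND no other data arrived in the last second.
        failed = (not heartbeat_seen) and (not other_data_seen)
        # Any traffic at all resets the running count (an assumption of this sketch).
        self.consecutive_failures = self.consecutive_failures + 1 if failed else 0
        return failed
```

The running count of consecutive failures is what you would compare against the timeouts described in the following paragraphs.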
If the IPSI sanity failures last longer than the IPSI Socket Sanity Timeout setting but shorter than 60 seconds, then recovery actions are initiated, including closing and reopening the socket connection (all downstream and upstream messages buffered in queues are lost), resetting the PKTINT (Packet Interface on the IPSI cards) and performing a warm restart of the affected port network. This recovery is represented by Region B in Figure 2.
If the IPSI sanity failures last longer than 60 seconds, the affected port network goes through a cold restart. This recovery is represented by Region C in Figure 2.
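The three recovery regions can be summed up as a simple threshold check. The following is only a sketch: the function name `classify_recovery` is hypothetical, and the text does not say what happens when the outage is exactly equal to a threshold, so the boundary handling here is an assumption.

```python
COLD_RESTART_THRESHOLD_S = 60.0  # the 60-second limit quoted in the text


def classify_recovery(outage_s: float, sanity_timeout_s: float) -> str:
    """Return 'A', 'B' or 'C' for an outage of outage_s seconds.

    sanity_timeout_s stands in for the IPSI Socket Sanity Timeout setting.
    Assumption: an outage exactly equal to a threshold falls in the later region.
    """
    if outage_s < sanity_timeout_s:
        return "A"  # socket resumes, queued messages are sent
    if outage_s < COLD_RESTART_THRESHOLD_S:
        return "B"  # socket reopened, PKTINT reset, warm restart
    return "C"  # cold restart of the port network
```

For example, with a hypothetical timeout of 10 seconds, `classify_recovery(30, 10)` falls in Region B.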
If an alternate control path to the affected port network is available and viable, an interchange to the alternate control path is made after 3 seconds of IPSI sanity failures. The alternate control path is either:
• Secondary IPSI in the same port network, or
• Fiber connection via ATM switch or Centre Stage Switch
If both primary and secondary IPSI connections have concurrent network outages (most likely due to non-diverse-path routing), the secondary IPSI connection is not viable and thus not available for interchange.
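The interchange rule above can be sketched like this. The function and parameter names are hypothetical, and treating "after 3 seconds" as "3 or more seconds of failures" is an assumption.

```python
INTERCHANGE_AFTER_S = 3  # seconds of IPSI sanity failures, from the text


def should_interchange(failure_seconds: int,
                       alternate_present: bool,
                       alternate_also_failing: bool) -> bool:
    """Decide whether to switch to the alternate control path.

    An alternate path that is suffering the same outage (for example a
    secondary IPSI routed over the same failed network path) is not viable.
    """
    viable = alternate_present and not alternate_also_failing
    return viable and failure_seconds >= INTERCHANGE_AFTER_S
```

Note that a secondary IPSI only helps if its path is diverse from the primary's. If both fail together, the function returns False no matter how long the outage lasts.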
Recovery Behaviour in Region A
If the Control Network outage is shorter than the IPSI Socket Sanity Timeout (Region A in Figure 2), the upstream and downstream data that were blocked and buffered during the outage resume flowing after the TCP recovery. All connections that go through the port network are preserved. All messages are buffered and sent with a delay due to the network outage and recovery. See Table 1 for more details on recovery behaviour in Region A. Refer to Figure 3 when reading Table 1. Note that there are two port networks in this example. The network outage happens in the WAN. The port network at the remote site is affected by the outage, but the port network at the main site is not.
Recovery Behaviour in Region B
If the Control Network outage is longer than the IPSI Socket Sanity Timeout period but shorter than 60 seconds (Region B in Figure 2), the port network goes through a warm restart and the Packet Interface (PKTINT) is reset. As a result, buffered upstream and downstream messages are lost, LAPD links are reset, and C-LAN socket connections are closed and reopened. Most stable calls will stay connected. Calls in transition may be lost. See Table 2 for more details on recovery behaviour in Region B.
Recovery Behaviour in Region C
If the Control Network outage is longer than 60 seconds, the recovery of the affected port network requires a cold restart of the port network. This drops all calls going through the affected port network. Only the shuffled IP-to-IP calls that are not using any port network resources will stay connected until the user drops the call. See Table 3 for more details on recovery behaviour in Region C.
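Since the tables didn't post either, here is a rough lookup of the call impact per region, built only from the descriptions above. The dictionary and function names are hypothetical and this is a summary sketch, not a replacement for Tables 1 to 3.

```python
# Restart type and call impact per region, as described in the text above.
RECOVERY_EFFECTS = {
    "A": {"restart": "none",
          "calls": "all connections through the port network preserved"},
    "B": {"restart": "warm",
          "calls": "most stable calls stay up; calls in transition may be lost"},
    "C": {"restart": "cold",
          "calls": "all calls through the port network dropped, "
                   "except shuffled IP-to-IP calls using no port network resources"},
}


def describe_region(region: str) -> str:
    """Return a one-line summary for region 'A', 'B' or 'C'."""
    effects = RECOVERY_EFFECTS[region]
    return f"Region {region}: {effects['restart']} restart; {effects['calls']}"
```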
[Started on Version 3 software 15 years ago]