It's possible that either the Master switch's CPU is getting pegged by something or EAPS control packets are not being prioritized properly. If I had to guess, your network is getting congested at the time.
From the EXOS 12.4 Concepts Guide:
Note: Increasing the failtime value increases the time it takes to detect a ring break using the polling timers, but it can also reduce the possibility of incorrectly declaring a failure when the network is congested.
The default fail-timer is 3 seconds. You cannot make it shorter, but you can increase it if your network experiences heavy congestion.
configure eaps <name> failtime <seconds> <milliseconds>
I would try bumping it up to 5 seconds. All that does is make EAPS take longer to declare a failure.
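As a rough sketch, going by the syntax quoted above and assuming 0 for the milliseconds field, the 5 second setting would look something like this on the Master (keep <name> as a placeholder for your own EAPS domain name, and check the exact syntax against your software version):
configure eaps <name> failtime 5 0
show eaps <name>
The show command is only there so you can confirm the new fail timer value on the domain afterwards.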
A few questions:
1. Are you mixing EXOS and ExtremeWare based switches in this configuration? Yes or No
2. If Yes to #1, which is the Master? An EXOS switch or an ExtremeWare switch?
3. What is the switch model for the Master switch?
Mainly check the ExtremeWare switches. The control VLAN on ExtremeWare switches should be manually configured for QoS Profile QP8 to give the control VLAN priority. This is only necessary on ExtremeWare switches, but it would not hurt to configure it on EXOS switches as well.
For example, if your control VLAN for a domain is Control4001:
config vlan Control4001 qosprofile QP8
Again, this is only necessary on ExtremeWare switches, but I usually set it on both ExtremeWare and EXOS. In EXOS the CPU is smart enough to prioritize control packets over background data. I would check the ExtremeWare switches and set the QoS Profile to QP8 on your control VLANs. Congestion plus not setting this could be the crux of your problem.
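To check what a control VLAN is currently set to, something along these lines should work (Control4001 is just the example name from above, and the exact output layout differs between ExtremeWare and EXOS and between versions, so treat this as a sketch):
show vlan Control4001
Look for the QoS profile entry in the output. On the ExtremeWare switches you want to see QP8 there after making the change.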