I did get this resolved... sort of. But I believe it is more of a band-aid than a real fix.
It is my hunch that there is a bug in the client that hasn't been identified. Or if it has, a solution hasn't been found. The bug causes certain clients to drop their connection in response to some event or condition that occurs on the network. I thought it was one specific workstation type, but found out that it occurs on various workstation models and NICs.
Catching this problem in action is extremely difficult. Whenever I tried to sniff a workstation with Ethereal, the problem never occurred. It's like whatever Ethereal (or maybe WinPcap) was doing prevented the problem from happening. Talk about frustrating.
What's happening is that something is causing the workstation to think that the server is down. The server name is entered into the "Bad Name Cache" on the client. This forces the client to drop the connection. After so many minutes, the cache flushes and it's possible to reconnect. That's why sometimes if you wait long enough, the connection will come back. But most people don't, and just reboot.
So the 'solution' is to turn off the Bad Server Name Cache and set the bad address timeout and bad server timeout to 0. Also make sure auto-reconnect is turned on. This forces the client to try reconnecting even if prior attempts to reach the server have failed. At least it keeps the connection alive. These settings are only available in 4.9 SP2 or higher. (There may have been registry hacks available on earlier versions, but in 4.9 SP2 they are actually in the client settings.) There must have been enough people having the problem...
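To show why those settings help, here is a rough Python sketch of how a "bad name" cache with a timeout behaves. This is only a toy model of the idea, not the client's actual code. The class name, the method names, and the 300-second timeout are all made up for illustration; I don't know the client's real defaults.

```python
class BadNameCache:
    """Toy model of a client-side cache of server names believed to be down."""

    def __init__(self, timeout_seconds, enabled=True):
        # timeout_seconds: how long a name stays blocked (hypothetical value)
        self.timeout_seconds = timeout_seconds
        self.enabled = enabled
        self._entries = {}  # server name -> time the failure was recorded

    def mark_bad(self, name, now):
        """Record that a server looked unreachable at time `now`."""
        if self.enabled:
            self._entries[name] = now

    def is_blocked(self, name, now):
        """Return True while the name is still cached as bad."""
        if not self.enabled:
            return False
        recorded = self._entries.get(name)
        if recorded is None:
            return False
        if now - recorded >= self.timeout_seconds:
            # Entry has aged out, so the client may try the server again.
            del self._entries[name]
            return False
        return True


def can_reconnect(cache, name, now):
    """A reconnect attempt is only made if the name is not blocked."""
    return not cache.is_blocked(name, now)


# Cache on with a made-up 300 second timeout: the user has to wait it out.
default_cache = BadNameCache(timeout_seconds=300)
default_cache.mark_bad("FS1", now=0)
print(can_reconnect(default_cache, "FS1", now=60))   # still blocked
print(can_reconnect(default_cache, "FS1", now=300))  # entry has expired

# Timeout of 0: the entry expires immediately, so reconnects are never blocked.
zero_cache = BadNameCache(timeout_seconds=0)
zero_cache.mark_bad("FS1", now=0)
print(can_reconnect(zero_cache, "FS1", now=0))
```

In this model, turning the cache off or setting the timeout to 0 means a single false "server is down" event can no longer lock the client out, which is the same effect the settings change has. It also shows why this is a band-aid: whatever makes the client think the server is down is still happening.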
Marvin
Marvin Huffaker, MCNE