I have a Dell PowerEdge 2600 that I migrated our existing AD to back in May. The migration itself went fine. The server is running SP3; in addition to being our AD domain controller, it is also our file and print server.
Three times since May, this server has become unresponsive. Users could not do anything on the network, and I could not reach the server over the network except by ping. Each time, I found Srv Event ID 2020 in the system event log. I found some information on this error in a few places on the net, including a thread here a few months ago and Microsoft article 312362. The problem occurred at peak hours each time.
The most recent time this happened, I got the Srv Event ID 2020, followed by hundreds of Srv Event ID 2000 errors that almost filled my log. About 40 of these errors were generated each second. I am trying to determine whether these errors are a consequence of the initial problem or the cause of it in the first place.
I suspect a memory leak, as indicated in 312362, but I'm not totally convinced yet. I plan to use the poolsnap utility to watch the paged pool, and I will gather data on the size of the paged pool if it gets close to 343 MB. Before the most recent crash, I noticed the paged pool was close to this limit. I also plan to update to SP4.
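To make the monitoring plan concrete, here is a minimal sketch of how logged paged pool readings could be checked against the 343 MB figure. It assumes the pool size is sampled periodically (for example from the Memory\Pool Paged Bytes performance counter) and stored as (timestamp, bytes) pairs. The function names, the 80% warning fraction, and the sample format are all hypothetical choices for illustration, not part of poolsnap or any Microsoft tool.

```python
from datetime import datetime

PAGED_POOL_LIMIT_MB = 343   # limit quoted above; may differ on other systems
WARN_FRACTION = 0.8         # hypothetical warning level (80% of the limit)


def to_mb(n_bytes):
    """Convert a byte count to megabytes (1 MB = 1024 * 1024 bytes)."""
    return n_bytes / (1024 * 1024)


def near_limit(samples, limit_mb=PAGED_POOL_LIMIT_MB, warn_fraction=WARN_FRACTION):
    """Return the samples whose paged pool size is at or above the warning level."""
    threshold_mb = limit_mb * warn_fraction
    return [(ts, b) for ts, b in samples if to_mb(b) >= threshold_mb]


def growth_mb_per_hour(samples):
    """Average growth between the first and last sample, in MB per hour.

    Returns None when there are fewer than two samples or no time has elapsed.
    """
    if len(samples) < 2:
        return None
    (t0, b0), (t1, b1) = samples[0], samples[-1]
    hours = (t1 - t0).total_seconds() / 3600
    if hours <= 0:
        return None
    return (to_mb(b1) - to_mb(b0)) / hours
```

A pool that keeps growing across quiet periods and never shrinks back would point toward a leak, while a pool that rises only at peak hours and then recovers would point toward load. The growth figure is only a rough average, so it is best read alongside the raw samples.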
Also, about the time this happened, the server had approx. 400 files open. Most of these files were from users attached to a Goldmine database that is located on this server.
Has anyone else had this problem? I want to be sure that the actions I take to resolve the problem fix it. This is an extremely frustrating situation due to how fickle this problem is. I'm still not 100% sure it is a memory leak.
Thanks,
BleachLPB