False alerts about agents down

Status
Not open for further replies.

agmontes

MIS
Oct 21, 2003
8
ES
Hi,

I'm getting false alerts saying that agents (system and log) have stopped responding, and we have verified that these alerts are not true. We've checked that ping works while the alerts are being raised, and when we connect to the server through Terminal Services everything seems to work properly, with very good response times, yet ObjectView shows a lot of timeouts.

Could anybody shed some light on this?

Thanks in advance.
 
I would need to know more about the alerts you are getting.
Are they Heartbeat messages?
Do you get any aws_sadmin "too many requests" messages?
Could you cut and paste a couple of them into your thread?

Ta
 
Yes, they are heartbeats of the kind saying that the system agent did not respond when it was expected to.

Host:Windows2000_Server Windows2000_Server caiW2kOs Poll Agent:caiW2kOs N/A DOWN W2K

 
Have you already tried creating an Agent Pollset with a higher timeout for the machine?
Are you still getting the heartbeat errors?
 
Yes, I've set an impossibly high timeout of 180, but the problem remains. The retries are set to 5.
 
Try starting aws_snmp and aws_dsm on the DSM machine and aws_sadmin on the agent machine in debug mode.
Do you see any errors in the DSM machine's logs? In the agent machine's?
aws_sadmin.log should log when the DSM contacts it (you should see the IP address of the DSM machine).
If not, you probably have a network or an SNMP problem.
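
The log check above can be scripted once you have a copy of aws_sadmin.log. This is only a rough sketch: the log line format is not known here, so it simply looks for the DSM's IP address anywhere in a line, and the function name and sample addresses are made up for illustration.

```python
import re
import sys


def lines_mentioning_ip(log_lines, dsm_ip):
    """Return the log lines that contain dsm_ip as a complete address.

    Assumption: the DSM's IP appears literally somewhere in the line.
    The real aws_sadmin.log layout is not known here.
    """
    # Reject partial hits such as 10.0.0.1 inside 10.0.0.10 or 110.0.0.1.
    pattern = re.compile(
        r"(?<!\d)(?<!\d\.)" + re.escape(dsm_ip) + r"(?!\d)(?!\.\d)"
    )
    return [line for line in log_lines if pattern.search(line)]


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python check_log.py <path to aws_sadmin.log> <DSM IP>
    log_path, dsm_ip = sys.argv[1], sys.argv[2]
    with open(log_path, errors="replace") as handle:
        hits = lines_mentioning_ip(handle, dsm_ip)
    print(f"{len(hits)} line(s) mention {dsm_ip}")
```

If this reports zero lines while the DSM is polling, that points towards the network or SNMP problem mentioned above rather than the agent itself.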
 
I reckon this may be a problem with the agent discovery and DNS. Do the servers in question have dual NICs?
The reason I say this is because the Heartbeat message is generated by the DSM polling the agent, rather than the agent sending the trap.
I'm presuming that you still receive valid alerts from the agent, say for example if a CPU goes critical?

Try this:
On your Agent server, edit the aws_sadmin.cfg file to include the server's primary IP address at the bottom. Like so: # TRAP_OVERRIDE_ADDR 141.202.123.345

On your DSM and core server(s), add the agent server to the local hosts file using the same IP as you just put in the aws_sadmin.cfg file.

Delete the object from WorldView and rediscover it.
Restart awservices on the agent.

See if this works.
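
To confirm that the hosts file change has taken effect, a quick check like the one below can be run on the DSM. It is a minimal sketch using Python's standard resolver call; the function names, the result labels, and the example hostname and address are all hypothetical, so substitute your own.

```python
import socket


def resolve_ipv4(hostname):
    """Return the sorted IPv4 addresses the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None, socket.AF_INET)
    except socket.gaierror:
        return []  # name does not resolve at all
    return sorted({info[4][0] for info in infos})


def check_resolution(addresses, expected_ip):
    """Compare a resolver result with the IP placed in aws_sadmin.cfg."""
    if not addresses:
        return "unresolved"
    if addresses == [expected_ip]:
        return "ok"
    if expected_ip in addresses:
        return "ambiguous"  # several addresses returned, e.g. dual NICs
    return "mismatch"


if __name__ == "__main__":
    # Hypothetical agent name and address, for illustration only.
    agent_name, agent_ip = "agentserver", "192.0.2.10"
    print(check_resolution(resolve_ipv4(agent_name), agent_ip))
```

Anything other than "ok" suggests the DSM may still be reaching the agent on a different address from the one the agent reports.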
 
Thanks so much JonnoGT. I'm going to try this and will report back on the results.

Regards
 