False alerts about agents being down

agmontes · Feb 23, 2004

Hi,

I'm getting false alerts saying that agents (system and log) have stopped responding, and we have verified that they are not true. We've checked that ping to the server is fine during these moments, and when connected to the server through Terminal Services everything seems to work properly, even very quickly, but in ObjectView we get a lot of timeouts.

Could anybody shed some light on this?

Thanks in advance.

JonnoGT · Feb 25, 2004

I would need to know more about the alerts you are getting.
Are they Heartbeat messages?
Do you get any aws_sadmin "too many requests" messages?
Could you cut and paste a couple of them into this thread?

Ta

agmontes · Feb 25, 2004

Yes, they are heartbeat alerts saying that the system agent did not respond when expected.

Host:Windows2000_Server Windows2000_Server caiW2kOs Poll Agent:caiW2kOs N/A DOWN W2K

lza · Mar 1, 2004

Have you already tried creating an Agent Pollset with a higher timeout for the machine?
Are you still getting the heartbeat errors?

agmontes · Mar 2, 2004

Yes, I've set an impossibly high timeout of 180, but the problem remains. The retries are set to 5.

lza · Mar 3, 2004

Try starting aws_snmp and aws_dsm on the DSM machine, and aws_sadmin on the agent machine, in debug mode.
Do you see any errors in the DSM machine's logs? In the agent machine's?
aws_sadmin.log should log when the DSM contacts it (you should see the IP address of the DSM machine).
If not, you probably have a network or an SNMP problem.
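If the debug log is long, a small script can pick out the lines that mention the DSM's address. This is a minimal sketch that assumes only that aws_sadmin.log is plain text and that the DSM's IP appears literally in the relevant lines; the function name `dsm_contacts` is hypothetical, not part of Unicenter.

```python
import re
from typing import Iterable, List


def dsm_contacts(log_lines: Iterable[str], dsm_ip: str) -> List[str]:
    """Return the log lines that mention dsm_ip as a whole address (hypothetical helper)."""
    # (?<![\d.]) stops "10.0.0.1" from matching inside "110.0.0.1";
    # (?!\.?\d) stops it from matching the prefix of "10.0.0.12" while still
    # allowing a sentence-ending period right after the address.
    pattern = re.compile(r"(?<![\d.])" + re.escape(dsm_ip) + r"(?!\.?\d)")
    return [line for line in log_lines if pattern.search(line)]
```

To use it, open the log with `open("aws_sadmin.log", errors="replace")` and pass the file object plus the DSM's IP. An empty result while heartbeat alerts are firing points toward the network or SNMP problem mentioned above.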

JonnoGT · Mar 4, 2004

I reckon this may be a problem with agent discovery and DNS. Do the servers in question have dual NICs?
The reason I say this is that the Heartbeat message is generated by the DSM polling the agent, rather than by the agent sending a trap.
I'm presuming that you still receive valid alerts from the agent, for example if a CPU goes critical?
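A rough model of the DSM-side polling described above helps explain why raising the timeout to 180 did not help. This sketch assumes the DSM makes one initial attempt plus `retries` further attempts, each waiting up to `timeout`, and reports DOWN only if none gets a reply; that retry semantics is an assumption, and `poll_agent` and `send_request` are hypothetical names, not Unicenter APIs.

```python
from typing import Callable


def poll_agent(send_request: Callable[[float], bool], timeout: float, retries: int) -> str:
    """Return "UP" if any attempt gets a reply within timeout, else "DOWN".

    send_request(timeout) is a hypothetical stand-in for one poll of the agent;
    it returns True if a reply reached the DSM before the timeout expired.
    """
    # One initial attempt plus `retries` further attempts (assumed semantics).
    for _ in range(1 + retries):
        if send_request(timeout):
            return "UP"
    return "DOWN"
```

If the agent's replies never reach the DSM (for example because they leave through the wrong NIC or the DSM has the wrong address for the host), `send_request` fails on every attempt and the result is DOWN whatever the timeout. Only fixing the path, as in the DNS/NIC steps JonnoGT suggests, changes the outcome.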

Try this:
On your agent server, edit the aws_sadmin.cfg file to add the server's primary IP address at the bottom, like so:

# TRAP_OVERRIDE_ADDR 141.202.123.345

On your DSM and core server(s), add the agent server to the local hosts file using the same IP address you just put in the aws_sadmin.cfg file.

Delete the object from WorldView and rediscover it.
Restart awservices on the agent.
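Before rediscovering, it can be worth confirming from the DSM or core server that the agent's hostname now resolves to the address set in aws_sadmin.cfg. This is a minimal sketch assuming you have a copy of the agent's aws_sadmin.cfg text and the agent's hostname; `override_addr` and `dns_matches` are hypothetical helpers, and the optional leading '#' is accepted only because the thread shows the line written that way.

```python
import re
import socket
from typing import Callable, Optional


def override_addr(cfg_text: str) -> Optional[str]:
    """Return the address on a TRAP_OVERRIDE_ADDR line, or None if absent."""
    # The thread shows the line as "# TRAP_OVERRIDE_ADDR <ip>", so an optional
    # leading '#' is accepted; whether the '#' belongs there is not confirmed.
    m = re.search(r"^\s*#?\s*TRAP_OVERRIDE_ADDR\s+(\S+)", cfg_text, re.MULTILINE)
    return m.group(1) if m else None


def dns_matches(hostname: str, cfg_text: str,
                resolver: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if hostname resolves to the address configured in aws_sadmin.cfg."""
    expected = override_addr(cfg_text)
    if expected is None:
        return False
    try:
        # gethostbyname returns a single IPv4 address, which is roughly what a
        # simple client on a multi-homed host would end up using.
        return resolver(hostname) == expected
    except OSError:
        # socket.gaierror (name not found) is a subclass of OSError.
        return False
```

Run on the DSM machine after editing the hosts file, a False result means the DSM is still resolving the agent to a different address (for example, the second NIC).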

See if this works.

agmontes · Mar 4, 2004

Thanks so much, JonnoGT. I'm going to try this and will report back with the results.

Regards
