NetBackup Job Monitoring 1

jonesrl · Aug 22, 2005

During disaster recovery exercises we have often found that we could not recover all the files we expected to. Possible reasons include: files were never backed up to begin with; files selected for backup failed during the backup process with no indication of a failure or problem; and a myriad of other causes that could have left files unrecoverable.

I am looking for some mechanism to report and verify file information before and after a backup is executed: 1) How many files on a given filesystem have been selected for backup? 2) What is the total size (in MB, GB, etc.) of the files selected for backup? 3) How does that information compare against the results of the backup process itself? That way we can spot obvious discrepancies at a glance. For example, if 100,000 files totalling 500GB were selected for backup, and the backup finished with a STATUS CODE of 0 but backed up only 90,000 files totalling 400GB, we would know there is a potential problem: we did not get all the files we expected to. In the past we assumed that a STATUS CODE of 0 indicated successful execution of NetBackup. We have found that this is not always the case.
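A minimal sketch of the "before" half of that comparison, using only standard UNIX tools rather than anything NetBackup-specific. The function names (count_files, total_kb, check_shortfall) and the percentage tolerance are hypothetical, introduced here for illustration. The counts cover everything under a directory, so they will only match the backup if the policy's include/exclude lists select the same files, and du reports disk usage rather than the exact bytes written to tape.

```shell
#!/bin/sh
# Sketch only: gather "expected" figures for a filesystem before a backup,
# then compare them with the figures the backup job reports afterwards.

# count_files DIR
# Prints the number of regular files under DIR without crossing into
# other mounted filesystems. File names containing newlines would be
# miscounted.
count_files() {
    find "$1" -xdev -type f -print | wc -l | awk '{print $1}'
}

# total_kb DIR
# Prints the disk usage of DIR in kilobytes (same filesystem only).
total_kb() {
    du -skx "$1" | awk '{print $1}'
}

# check_shortfall LABEL EXPECTED ACTUAL PCT
# Returns 1 and prints a warning if ACTUAL is more than PCT percent
# below EXPECTED; otherwise prints an OK line and returns 0.
check_shortfall() {
    label=$1
    expected=$2
    actual=$3
    pct=$4
    # Lowest acceptable value, using integer arithmetic.
    floor=$(( expected * (100 - pct) / 100 ))
    if [ "$actual" -lt "$floor" ]; then
        echo "WARNING: $label: expected about $expected, backup reported $actual"
        return 1
    fi
    echo "OK: $label: expected about $expected, backup reported $actual"
    return 0
}
```

With the figures from the example above, "check_shortfall files 100000 90000 5" prints a warning, because 90,000 is below the 95,000 floor that a 5 percent tolerance allows. The "actual" numbers would have to be taken from whatever job report you trust; this sketch does not read them from NetBackup.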

Anyway, all responses and assistance are requested and appreciated.

Thanks,
Rick

Vela · Aug 25, 2005

This isn't what you want, but you can figure out how much data a system is using and compare that to the client backup status report. It will tell you how many files were backed up and how big the backup was.

AquaTeenFryMan · Aug 26, 2005

Jonesrl,

I ran into the same issue some months ago during a disaster recovery drill. Not all our files got restored that should have. Let me know if you come across anything on this issue. I'll do the same.

creakyjoe · Sep 8, 2005

Here's an example of a script I have running via cron a few times each morning. If all backups are complete, we receive a short email saying so; otherwise we get an email briefly detailing which backups are still running. This is my personal preference: all successful-backup emails from NetBackup are routed to people's waste baskets and only failure emails are allowed through, so we get just the odd 'backups are complete' email rather than zero emails when things have worked. You still need to know that things have worked; if you get no notification at all, who's to say that email hasn't stopped working, for instance?

script itself - nothing really clever here -

#!/bin/ksh
#
# - first check if NetBackup is running or not.
#
if [ `/usr/openv/netbackup/bin/bpps -a|grep openv|grep -v grep |grep -c openv` -eq 0 ] ; then
echo "This email was generated by script that checks for overrunning Nightly NetBackups - '/home/richardb/NetBackups-overrun-or-not.sh'" | /usr/ucb/mail -s "NetBackup doesn't appear to be running - pls investigate ASAP - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere

else

# - if NetBackup is running, ONLY THEN produce report.
#
/usr/openv/netbackup/bin/admincmd/bpdbjobs -report |grep "CL"|grep -v Done|grep -v JobID|grep -v "DB Backup" > /tmp/blazing
clear
if [ `cat /tmp/blazing|grep -c ":"` -ne 0 ] ; then

echo "" > /tmp/over1run.txt
echo "" >> /tmp/over1run.txt
echo "\t\t\t`/usr/bin/date`" >> /tmp/over1run.txt
echo "" >> /tmp/over1run.txt
echo "\t\t\tOverrunning Overnight UNIX NetBackups" >> /tmp/over1run.txt
echo "\t\t\t=====================================" >> /tmp/over1run.txt
cat /tmp/blazing|awk '{printf("%-6s %-6s %-30s %-25s %-1s %-1s %-1s\n", $1,$2,$3,$4,$6,$7,$8) ;}' >> /tmp/over1run.txt
echo "" >> /tmp/over1run.txt
echo "" >> /tmp/over1run.txt

rm -rf /tmp/blazing1

cat /tmp/blazing|awk '{print $1}' | while read line ; do /usr/openv/netbackup/bin/admincmd/bperror -jobid $line | grep L1 | awk '{print $6,$23,$25,$26,$27}' | tail -1 ; done >> /tmp/blazing1

awk '{print $1," ",$2,$3,$4,$5}' /tmp/blazing1 > /tmp/blazing1x

# start the file with a blank line, then append the drive status to it
echo "" > /tmp/blazing2

/usr/openv/volmgr/bin/vmoprcmd|grep -v STATUS|grep -v Comment|grep -v Drive >> /tmp/blazing2

echo "** The JobID's above equate to the following tapes/drives --> **" >> /tmp/blazing2

cat /tmp/blazing2 /tmp/blazing1x >> /tmp/over1run.txt

cat /tmp/over1run.txt | /usr/ucb/mail -s "There are over-running Nightly NetBackups - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere

rm -rf /tmp/blazing /tmp/blazing1 /tmp/blazing1x /tmp/blazing2 /tmp/over0run.txt /tmp/over1run.txt

else
clear
echo "" > /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
echo "\t\t`/usr/bin/date`" >> /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
echo "\t\tAll Overnight UNIX NetBackups are complete" >> /tmp/over0run.txt
echo "\t\t==========================================" >> /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
cat /tmp/over0run.txt | /usr/ucb/mail -s "All Overnight UNIX NetBackups are complete - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere

rm -rf /tmp/blazing /tmp/blazing1 /tmp/blazing1x /tmp/blazing2 /tmp/over0run.txt /tmp/over1run.txt

fi
fi

Sample output from script when backups are overrunning our backup window -

Wednesday June 29 11:40:52 BST 2005

Overrunning Overnight UNIX NetBackups
=====================================
193884 Active CL_Adhoc_u02_sunbackup Quarterly 06/29/05 11:37:37 000:03:15

PENDING REQUESTS

<NONE>

Drv  Type   Control  User  Label  RecMID  ExtMID  Ready  Wr.Enbl.  ReqId
0    hcart  TLD      -     No     -       -
1    hcart  TLD      -     No     -       -
2    hcart  TLD      -     No     -       -
3    hcart  TLD      root  Yes    5815L1  5815L1  Yes    Yes       0
4    hcart  TLD      -     No     -       -
5    hcart  TLD      -     No     -       -
5 hcart TLD - No - -

** The JobID's above equate to the following tapes/drives --> **
193884 5815L1 drive index 3

I have written all my own scripts around standard NetBackup commands, covering things such as full tape reporting, offsiting, downed tape drives, and even a script that monitors for ALL drives being in use. I also wrote one to monitor for restores, i.e. when a restore is kicked off and a tape isn't present within the robot, we get to know about it.

Some other threads that may well help -

thread for automated overnight tape drive monitoring (15-minute intervals throughout the night) (creates a unique, dated logfile every day) -

http://www.tek-tips.com/viewthread.cfm?qid=949271

thread for producing daily report of all tapes used overnight -

http://www.tek-tips.com/viewthread.cfm?qid=913978

thread for auto-checking if any drives are in a state other than normal -

http://www.tek-tips.com/viewthread.cfm?qid=933262

tech2004 · Sep 8, 2005

I know Backup Exec has a MOM management pack that monitors backup and job failures and errors, device and media errors, as well as server and service errors. Does MOM 2005 have a management pack for NetBackup, or can the MOM management pack for Backup Exec be modified to be used with NetBackup?

ferbrown · Sep 13, 2005

Hi,
Using the "blat" tool from a scheduled task on a Windows master server, or cron on a UNIX one, I generate a report every day with a Veritas command, "bpdbjobs -all_columns >nbreportall.txt", and blat that report file to my email.
First thing in the morning, with no need to connect to the master server, it shows me the status number, the estimated files and size to back up, and the number of files and size actually copied, along with many other useful fields.
It's easy to export as CSV into Excel and save every daily report: backup windows, sizes backed up, media used (this last one is very useful for restores), and so on.
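
As a rough sketch of how such a comma-separated report could be checked automatically instead of by eye, the function below flags jobs whose status is non-zero or whose copied file count is below the estimate. The function name flag_jobs is hypothetical, and the column positions are deliberately passed in as arguments: which column holds the job ID, status, estimated files and copied files is an assumption you would need to confirm against the report produced by your NetBackup version. It splits on plain commas, so fields that themselves contain commas would need more careful parsing.

```shell
#!/bin/sh
# Sketch only: scan a comma-separated job report and flag suspicious jobs.
# All column numbers are supplied by the caller; none are assumed here.

# flag_jobs FILE STATUS_COL EST_FILES_COL COPIED_FILES_COL JOBID_COL
flag_jobs() {
    awk -F, -v s="$2" -v e="$3" -v c="$4" -v j="$5" '
        # Skip jobs that have no status yet (still queued or running).
        $s == ""         { next }
        # Any non-zero status is reported as-is.
        $s != 0          { print "job " $j ": status " $s; next }
        # Status 0, but fewer files copied than estimated.
        $c + 0 < $e + 0  { print "job " $j ": copied " $c " of " $e " files" }
    ' "$1"
}
```

If, purely for illustration, the job ID were in column 1, the status in column 2, the estimate in column 3 and the copied count in column 4, the call would be "flag_jobs nbreportall.txt 2 3 4 1". The output could then be mailed in the same way as the report itself.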

Hope it helped you,
Best Regards,
