
NetBackup Job Monitoring

Status
Not open for further replies.

jonesrl

Technical User
Aug 22, 2005
1
US
During disaster recovery exercises, we have often found that we could not recover all the files we expected to. Possible reasons include: files were never backed up to begin with; files selected for backup failed during the backup process with no indication of a failure or problem; and a myriad of other causes that could have left files unrecoverable.

I am looking for some mechanism to report and verify file information before and after a backup is executed: 1) How many files on a given filesystem have been selected for backup? 2) What is the size, in MB, GB, etc., of the files selected for backup? 3) How does that information compare with the results of the backup process? That way we can spot obvious discrepancies at a glance. For example, if 100,000 files totalling 500GB were selected for backup, and the backup finished with a STATUS CODE of 0 but backed up only 90,000 files totalling 400GB, we would know that we may not have all the files we expected. In the past we assumed that a STATUS CODE of 0 indicated a successful NetBackup run. We have found that this is not always the case.

Anyway, all responses and assistance are requested and appreciated.

Thanks,
Rick
 
This isn't what you want, but

you can figure out how much data a system is using and compare that to the client backup status report. It will tell you how many files were backed up and how big the backup was.
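
As a rough sketch of that comparison, something like the following could collect the "before" figures with standard find/ls/awk and check them against whatever the backup report says. The function names (count_files, total_bytes, check_figure) are made up for illustration and are not NetBackup commands. It assumes the backup selection is a whole directory tree on one filesystem, and it will miscount file names that contain newlines.

```shell
#!/bin/sh
# Hypothetical helpers for a pre-backup baseline; none of these names come from NetBackup.

# Print the number of regular files under directory $1, staying on one filesystem.
count_files() {
    find "$1" -xdev -type f | wc -l | tr -d ' '
}

# Print the total size in bytes of regular files under directory $1.
# Field 5 of "ls -ln" is the file size.
total_bytes() {
    find "$1" -xdev -type f -exec ls -ln {} + |
        awk '{ s += $5 } END { printf "%.0f\n", s }'
}

# Compare a baseline figure with the figure the backup reported.
# Usage: check_figure LABEL EXPECTED ACTUAL
# Returns non-zero on a mismatch so cron wrappers can react to it.
check_figure() {
    if [ "$2" -eq "$3" ]; then
        echo "$1 OK: $3"
    else
        echo "$1 MISMATCH: expected $2, backup reported $3"
        return 1
    fi
}

# Example use (the path and the reported figure are placeholders):
#   before=`count_files /export/home`
#   check_figure files "$before" 90000
```

On a live filesystem the numbers will drift between the baseline and the backup, so in practice you would probably allow some tolerance rather than insist on an exact match.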
 
Jonesrl,

I ran into the same issue some months ago during a disaster recovery drill. Not all our files got restored that should have. Let me know if you come across anything on this issue. I'll do the same.
 
Here's an example of a script I have running via cron a few times each morning. If all backups are complete, we receive a short email saying so; otherwise we get an email briefly detailing which backups are still running. This is my personal preference: all successful-backup emails from NetBackup are routed to people's waste baskets and all failure emails are allowed through, so we get just the one 'backups are complete' email rather than no email at all when things have worked. You still need to know that things have worked; if you get no notification at all, who's to say that email itself hasn't stopped working, for instance?


script itself - nothing really clever here -

#!/bin/ksh
#
# - first check if NetBackup is running or not.
#
if [ `/usr/openv/netbackup/bin/bpps -a|grep openv|grep -v grep |grep -c openv` -eq 0 ] ; then
echo "This email was generated by script that checks for overrunning Nightly NetBackups - '/home/richardb/NetBackups-overrun-or-not.sh'" | /usr/ucb/mail -s "NetBackup doesn't appear to be running - pls investigate ASAP - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere

else

# - if NetBackup is running, ONLY THEN produce report.
#
/usr/openv/netbackup/bin/admincmd/bpdbjobs -report |grep "CL"|grep -v Done|grep -v JobID|grep -v "DB Backup" > /tmp/blazing
clear
if [ `cat /tmp/blazing|grep -c ":"` -ne 0 ] ; then

echo "" > /tmp/over1run.txt
echo "" >> /tmp/over1run.txt
echo "\t\t\t`/usr/bin/date`" >> /tmp/over1run.txt
echo "" >> /tmp/over1run.txt
echo "\t\t\tOverrunning Overnight UNIX NetBackups" >> /tmp/over1run.txt
echo "\t\t\t=====================================" >> /tmp/over1run.txt
cat /tmp/blazing|awk '{printf("%-6s %-6s %-30s %-25s %-1s %-1s %-1s\n", $1,$2,$3,$4,$6,$7,$8) ;}' >> /tmp/over1run.txt
echo "" >> /tmp/over1run.txt
echo "" >> /tmp/over1run.txt

rm -rf /tmp/blazing1

cat /tmp/blazing|awk '{print $1}' | while read line ; do /usr/openv/netbackup/bin/admincmd/bperror -jobid $line | grep L1 | awk '{print $6,$23,$25,$26,$27}' | tail -1 ; done >> /tmp/blazing1

awk '{print $1," ",$2,$3,$4,$5}' /tmp/blazing1 > /tmp/blazing1x

echo "" > /tmp/blazing2

/usr/openv/volmgr/bin/vmoprcmd|grep -v STATUS|grep -v Comment|grep -v Drive >> /tmp/blazing2

echo "** The JobID's above equate to the following tapes/drives --> **" >> /tmp/blazing2


cat /tmp/blazing2 /tmp/blazing1x >> /tmp/over1run.txt

cat /tmp/over1run.txt | /usr/ucb/mail -s "There are over-running Nightly NetBackups - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere

rm -rf /tmp/blazing /tmp/blazing1 /tmp/blazing1x /tmp/blazing2 /tmp/over0run.txt /tmp/over1run.txt

else
clear
echo "" > /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
echo "\t\t`/usr/bin/date`" >> /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
echo "\t\tAll Overnight UNIX NetBackups are complete" >> /tmp/over0run.txt
echo "\t\t==========================================" >> /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
echo "" >> /tmp/over0run.txt
cat /tmp/over0run.txt | /usr/ucb/mail -s "All Overnight UNIX NetBackups are complete - nthsunbackup @`date +%d/%m-%H:%M`" emailaddy@somewhere

rm -rf /tmp/blazing /tmp/blazing1 /tmp/blazing1x /tmp/blazing2 /tmp/over0run.txt /tmp/over1run.txt

fi
fi



Sample output from script when backups are overrunning our backup window -

Wednesday June 29 11:40:52 BST 2005

Overrunning Overnight UNIX NetBackups
=====================================
193884 Active CL_Adhoc_u02_sunbackup Quarterly 06/29/05 11:37:37 000:03:15

PENDING REQUESTS

<NONE>


Drv Type Control User Label RecMID ExtMID Ready Wr.Enbl. ReqId
0 hcart TLD - No - -
1 hcart TLD - No - -
2 hcart TLD - No - -
3 hcart TLD root Yes 5815L1 5815L1 Yes Yes 0
4 hcart TLD - No - -
5 hcart TLD - No - -

** The JobID's above equate to the following tapes/drives --> **
193884 5815L1 drive index 3


I have written all my own scripts around standard NetBackup commands, for things such as full tape reporting, offsiting, downed tape drives, and even a script that monitors for ALL drives being in use. I also wrote one to monitor restores, i.e. when a restore is kicked off and a tape isn't present within the robot, we get to know about it.




Some other threads that may well help -

thread for automated overnight tape drive monitoring (15minute intervals throughout the night) (creates a unique, dated logfile every day) -
thread for producing daily report of all tapes used overnight -
thread for auto-checking if any drives are in a state other than normal -
 
I know Backup Exec has a MOM management pack that monitors backup and job failures and errors, device and media errors, as well as server and service errors. Does MOM 2005 have a management pack for NetBackup, or can the MOM management pack for Backup Exec be modified for use with NetBackup?
 
Hi,
Using the "blat" tool with scheduled tasks on a Windows master server, or cron on a UNIX one, I generate a report every day with a Veritas command, "bpdbjobs -all_columns >nbreportall.txt", and blat that report file to my email.
First thing in the morning, without needing to connect to the master server, it shows me the status number, the estimated files and size to back up, and the number of files and size copied, besides many other interesting fields.
It's easy to import as CSV into Excel and save each daily report: backup windows, sizes backed up, media used (this last one is very useful for restores), and so on...
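
If you would rather not eyeball the spreadsheet, a small awk filter could flag jobs that ended with status 0 but wrote fewer files than were estimated. This is only a sketch: flag_short_jobs is a made-up name, and the column numbers are passed in by you because I am not asserting which positions the job ID, status, estimated files and files written occupy in your version's output, so check them against your own report first. It also splits on plain commas, so any field that contains a comma would throw the columns off.

```shell
#!/bin/sh
# Hypothetical filter over a comma-separated job report.
# Usage: flag_short_jobs FILE ID_COL STATUS_COL EST_FILES_COL FILES_WRITTEN_COL
# The column numbers are supplied by the caller; they are assumptions to verify,
# not values taken from the NetBackup documentation.
flag_short_jobs() {
    awk -F, -v id="$2" -v st="$3" -v est="$4" -v got="$5" '
        # Only jobs with status 0, a non-zero estimate, and fewer files written.
        $st == 0 && $est + 0 > 0 && $got + 0 < $est + 0 {
            printf "job %s: status 0 but %d of %d files\n", $id, $got, $est
        }' "$1"
}

# Example use (column numbers here are placeholders):
#   flag_short_jobs nbreportall.txt 1 4 20 15
```

Anything it prints is a candidate for the "status 0 but not everything backed up" case described at the top of the thread.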

Hope it helped you,
Best Regards,
 