It is a common assumption that CRCs or Semaphore Timeouts are hardware related, but that is far from certain. Don't get me wrong, I've seen replacing a drive correct the problem, but only about 1 time out of 10. You can replace a drive, the media, or the entire library and still see these errors. The most common response you'll get is "it has been working all this time, so it has to be a hardware failure." I've seen the most whacked-out configuration work for a year and then finally blow up; you fix what should have been configured properly a year ago and all is well. Any small change can finally expose the true problem. Is that the case here? Could be. I just wanted to bring this up because so many people insist the errors are hardware related that they end up taking months to get to the real issue.

Media age is a good point. Even the brand of media can introduce errors. I've seen people search the web for the cheapest media they can find to put in a $100,000 tape library; they are backing up critical data to the cheapest media available. Yes, quality of media (or lack of it) varies across media vendors. A good example is a company whose name starts with an "I". Although there is a "standard" for how the media itself is produced, the design of the cartridge case is not part of that standard. That case is known to be troublesome for the pickers in tape libraries, and it can also fail to make contact with the sensors in some libraries, giving the library the illusion that the slot is empty.
Ok back to the issue.
If you have not done so already, break the job into three jobs, one for each server. You can set the remote agents into a debug mode; the steps can be found on Veritas' site. Run SGMON.EXE on the Master during the backups. A few things to check and note: Are you teaming NICs? This has been known to cause issues with Veritas, so if you are, un-team the NICs for at least a short testing period. Are the NIC(s) set to Auto Negotiate, or are they hard set? If they are set to Auto Negotiate, hard set them to the highest throughput, and make sure the switch ports are hard set to match. Check this on the Master and on the three servers causing the problem. Also make sure your drivers are up to date for your NICs, SCSI/HBA card(s), and your tape devices. (A quick way to record what each NIC actually negotiated is sketched below.)
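For the NIC check, a small script can save you from clicking through adapter properties on every box. This is just a sketch of one way to do it, assuming the legacy wmic command-line tool is still present on your Windows servers; it is not part of Backup Exec or anything Veritas ships.

```python
# Sketch: list enabled network adapters and their negotiated link speed
# on a Windows server, via the legacy "wmic" tool (assumed to be installed).
# Win32_NetworkAdapter exposes Name, Speed (bits/sec), and NetEnabled.
import subprocess

def adapter_speeds():
    out = subprocess.run(
        ["wmic", "nic", "where", "NetEnabled=true", "get", "Name,Speed", "/format:csv"],
        capture_output=True, text=True, check=True,
    ).stdout
    # CSV output: blank line(s), then a header row, then one row per adapter.
    # Assumes adapter names contain no commas (good enough for a quick check).
    rows = [line.split(",") for line in out.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    name_i, speed_i = header.index("Name"), header.index("Speed")
    for row in data:
        speed = row[speed_i]
        if speed.isdigit():
            print(f"{row[name_i]}: {int(speed) // 1_000_000} Mb/s")
        else:
            print(f"{row[name_i]}: speed unknown")

if __name__ == "__main__":
    adapter_speeds()
```

Run it on the Master and on each of the three problem servers before and after you hard set the speed/duplex, so you have a record that the NIC and the switch port actually agree.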
The data is written to the tape in chunks, in bursts, so the tape will contain many chunks of the data being backed up. This is where the chunk (block) size option comes in: that drop-down box that lets you choose the size. It usually defaults to 64 KB, which is the accepted industry value. If for any reason a chunk is skipped or corrupted during the write, you can see a multitude of issues, because the header information on the tape will be inconsistent with the data that is actually written to it. One way to avoid this in the future is to do a quick erase on the media you are rotating in for that backup cycle; I would do this no matter what. Also keep an eye on how long you have had the media. People forget how valuable their data is until it is needed for a restore.

Check your System and Application event logs around the time of failure as well. You can use the event IDs to search Veritas' website. There is a good chance you will see event IDs 7, 9, or 11 in the System log, and in the Application log you will get a five-digit number (I think) that can be used to search their site for answers. I like to start a spreadsheet and put down each event ID with its description. Even though at first you see a lot of different IDs that seem unrelated, it will all come together to point you in a good direction for resolution.

A CRC (cyclic redundancy check) is a fairly quick way to check the written data against what was backed up. Here is where network or SCSI timeouts can come into play (SCSI timeouts are not always a hardware failure) and cause CRC errors: if at any time during the check communication is lost between the source data and the data on the media, you will get a CRC error, because the check could not confirm that the data written to the tape is correct and consistent with the data being backed up.
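To make the idea concrete, here is a minimal Python sketch of chunk-level CRC verification: compute a CRC-32 over each 64 KB chunk of the source and compare it against the copy that was written. This illustrates the concept only; it is not how Backup Exec's verify pass is actually implemented, and the 64 KB constant simply mirrors the default block size mentioned above.

```python
# Minimal illustration of per-chunk CRC verification (concept only,
# not Backup Exec's actual verify code).
import zlib

CHUNK_SIZE = 64 * 1024  # 64 KB, the typical default block size

def chunk_crcs(path):
    """Return one CRC-32 value per 64 KB chunk of the file."""
    crcs = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            crcs.append(zlib.crc32(chunk))
    return crcs

def verify(source_path, copy_path):
    """Compare per-chunk CRCs of the source against the written copy."""
    src, dst = chunk_crcs(source_path), chunk_crcs(copy_path)
    if len(src) != len(dst):
        print("Chunk count differs: a chunk was skipped or truncated.")
    for i, (a, b) in enumerate(zip(src, dst)):
        if a != b:
            print(f"CRC mismatch in chunk {i}: corrupted in transit or on media.")

# Example (hypothetical paths): verify("D:/data/file.dat", "E:/restore/file.dat")
```

If the read back stalls partway through (a network or SCSI timeout), the comparison never completes, which is why those timeouts tend to surface in the job log as CRC errors rather than as their own distinct failure.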
Oh man, sorry to ramble; it's late and I ramble when I'm tired. Just to shorten it up: there are a few things you can do to test the functionality of a backup before getting hardware replaced, and those steps can be done a lot faster than waiting for replacement hardware to arrive, running the backups again, and getting the failure email we all hate to see.
I can't spell worth beans, so forgive me in advance; grammar is not a specialty either.
Again sorry to ramble.