
Multiple Save Groups vs One Large Save Group


cstorm (MIS)
I am having problems getting full backups to complete after splitting a large savegroup (300 clients) into 3 smaller savegroups (100 clients each). I used to see around 4 client failures during full backups; since the split, most of the clients fail with inactivity timeouts. The group settings are the same. Has anyone seen this behavior with NetWorker, and what would you suggest to remedy it? The whole idea behind splitting the groups was to prevent missed backups caused by the large savegroup's "already in use" error.

Thank you.
 
You might simply be running out of RAM. Do not forget that each savegrp command spawns at least one additional process.

Also make sure that on an NW/Windows server you have the heap parameters set to a reasonable value. See the NW 7.0 release notes for more details.
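For a rough feel for the process load, here is a minimal back-of-the-envelope sketch in Python; the per-group and per-save process counts are assumptions for illustration only, not measured values:

# Rough estimate of extra server-side processes while several
# savegroups run at once (illustrative assumptions, not measured).
groups_running_at_once = 3       # assumed: all three groups overlap
procs_per_savegrp = 1            # at least one savegrp process per group
procs_per_running_save = 1       # assumed: one handler per active save session
max_concurrent_saves = 48        # server parallelism mentioned later in the thread

extra = (groups_running_at_once * procs_per_savegrp
         + max_concurrent_saves * procs_per_running_save)
print(f"roughly {extra} additional processes while all groups are active")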
 
Thanks for the reply. The environment is Unix. This is a sample error message:

11/30/03 16:02:47 savegrp: inntums022.d51.lilly.com:E:\ failed.
* inntums022.d51.lilly.com:E:\ 3 retries attempted
* inntums022.d51.lilly.com:E:\ save: RPC error, Unable to send
* inntums022.d51.lilly.com:E:\ save: Send Chunk to MMD failed, Software caused connection abort
* inntums022.d51.lilly.com:E:\ aborted due to inactivity
11/30/03 16:02:47 savegrp: inntums022.d51.lilly.com:E:\ will retry 1 more time(s)

Does this point to anything specific?
 
From the "NetWorker Errror Message Guide":

Type: Notification
Source: RPC
Description: The remote procedure call (RPC) services are unable to communicate with the server software. Communication between the server and client was lost while the server was processing the request.
Resolution: Restart the server services and retry the operation.


This sounds like a network-related problem, which is hard to understand because it obviously worked fine until now. Even so, it may well be real, and it will be hard to isolate and solve, especially with such a large number of clients involved.

You may also try to reduce the load by limiting the savegroup or server parallelism. Keep in mind that savegroup parallelism only exists since NW 6.1.
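As a rough illustration of how those limits interact (all values here are hypothetical, not recommendations), the number of save sets that can actually run at once is the smallest of several caps:

# Illustrative only: concurrency is capped by several independent limits.
server_parallelism = 48       # NSR server attribute
device_sessions    = 6 * 8    # 6 drives x 8 target sessions each
group_parallelism  = 3 * 16   # hypothetical: 3 overlapping groups capped at 16 each (NW 6.1+)

concurrent_saves = min(server_parallelism, device_sessions, group_parallelism)
print("save sets that can run at the same time:", concurrent_saves)
# anything beyond this should be queued instead of sitting idle against
# the inactivity timeout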


 
We are running NW 6.0.1. Server parallelism is set to 48. We have 6 tape drives, and each is set for 8 sessions. Should this be changed?
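For what it is worth, those two settings line up exactly, so the drive sessions are the effective ceiling; a quick check with the numbers from the post above:

server_parallelism = 48
drives = 6
sessions_per_drive = 8

print(drives * sessions_per_drive)   # 48, i.e. the same as server parallelism
# With ~300 clients in overlapping groups, every save beyond 48 has to
# wait for a drive session to free up.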
 
Do you run all three groups at once or stagger them, and do they all write to the same tape pool? It may be that as a single group the client save sets were not started but queued up until previous ones finished. Now you may find the client save starts, has no resource available (maximum concurrency reached or all drives in use), sits around doing nothing for a few hours, and then times out.
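A toy sketch of that failure mode (all numbers are assumptions; it only illustrates why saves that start but get no drive session hit the inactivity timeout):

# Toy model: saves that start but cannot get a session wait in line;
# if the wait exceeds the inactivity timeout they are aborted.
clients_started     = 300   # assumed: all groups kicked off close together
concurrent_sessions = 48    # 6 drives x 8 sessions
avg_saveset_minutes = 90    # assumed average save set duration
inactivity_timeout  = 30    # assumed timeout in minutes

waves = -(-clients_started // concurrent_sessions)   # ceiling division
for wave in range(waves):
    wait = wave * avg_saveset_minutes
    outcome = "runs" if wait <= inactivity_timeout else "aborted due to inactivity"
    print(f"wave {wave + 1}: waits ~{wait} min -> {outcome}")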
 
The savegroup start times are staggered, but there is only an hour between them. They all write to the same pool. I suspect you are correct: with one savegroup, the clients are started and queued. I thought the same would happen with multiple savegroups, but I can see that this is not the case. I believe I will need to increase the time between the savegroup start times; they do need to write to the same pool. Any other suggestions?
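For illustration, a small sketch of a wider stagger (the group names and the three-hour estimate per group are assumptions):

from datetime import datetime, timedelta

# Assumption: each group of ~100 clients needs roughly 3 hours of drive
# time, so the next group starts only after the previous one has drained.
first_start = datetime.strptime("18:00", "%H:%M")
hours_per_group = 3
groups = ["GroupA", "GroupB", "GroupC"]   # hypothetical names

for i, group in enumerate(groups):
    start = first_start + timedelta(hours=i * hours_per_group)
    print(f"{group}: starts at {start:%H:%M}")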
 
8 sessions per device is quite a high value. Multiplexing 8 save sets of equal size is certainly possible. However, if you then need to recover a single save set, it will take more time because you have to read past the other 7/8ths of the data on the tape.
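To put a rough number on that recovery penalty (save set size and tape speed are hypothetical):

# Recovering one save set from an 8-way multiplexed tape means reading
# past the interleaved chunks of the other seven.
saveset_gb = 20          # hypothetical save set size
multiplex  = 8
read_mb_per_s = 30       # hypothetical sustained tape read rate

scan_gb = saveset_gb * multiplex
mux_min  = scan_gb * 1024 / read_mb_per_s / 60
solo_min = saveset_gb * 1024 / read_mb_per_s / 60
print(f"multiplexed recover: ~{mux_min:.0f} min vs ~{solo_min:.0f} min unmultiplexed")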

The numbers by themselves do not mean that much, so I cannot tell you whether this is good or not.

Since you run overlapping savegroups, I suggest you update to the current 6.1.3 version and limit the group parallelism. With that cap in place, excess jobs are simply queued, which has the same effect as if the group had not started them yet (except for the overhead).
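A tiny sketch of what that cap buys you (the per-group limit of 16 is hypothetical):

# With group parallelism capped, savegrp only starts that many saves per
# group; the rest stay queued inside savegrp and never get a chance to
# idle against the inactivity timeout.
group_parallelism  = 16    # hypothetical cap per group
overlapping_groups = 3
drive_sessions     = 48    # 6 drives x 8 sessions

started = min(overlapping_groups * group_parallelism, drive_sessions)
print("saves actually started:", started, "- the remainder is queued, not timed out")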
 