Lost Master Replica

Status
Not open for further replies.

lgarner

IS-IT--Management
Jan 26, 2002
2,348
US
Three NW51 SP3 servers: one holds the master replica of several partitions, and the other two hold RW replicas of all partitions. At some point the master began reporting -626 errors, and the two RW servers do not have a master replica listed in their replica rings.

The two RW servers are to be shut down, but I'd like to make sure that NDS is clean first. Anyone know how to get the master server back into the RW servers' replica rings?
 
Let me get this right.

You have multiple partitions, i.e. a root partition and then multiple child partitions.

One or more of the partitions no longer has a master.

Please post your dsrepair log from a sync check.

If you don't have a master then you need to promote one, but you would only lose a master through a DS removal, a server failure, or something similar.

Are you still getting errors?
 
Yes, multiple partitions.
No, there is a master. It just isn't listed in the RW server's replica rings. In other words, when viewing the replica rings on the RW server, there is no master shown- only the RW. I'll post the dsrepair.log when I get to the office tomorrow.

This might not be too much of an issue. I believe that I can shut down the server tomorrow and treat it like a crashed server to get it out of the master's ring.
 
Don't remove the server. I thought you wanted a master back?

On the server (all of them if possible):
1. Check time sync: all servers should be communicating.

2. dsrepair - Advanced - Replica and partition operations
You will see the partitions, i.e. root, finance.co, personnel.co, etc. Press Enter on each one,
then press Enter on the replica ring option.
This will tell you which servers are in the ring and what role they play. There will be a master for each ring, then some R/W and maybe some subordinate references.

That's the info we want.
 
According to the master server, it is the master for all partitions and the RW server holds RW replicas for all partitions.

According to the RW server, it holds RW replicas for all partitions and there is no master.

I do need to keep the master running for a while longer, but there is a simple procedure to remove a "crashed" server from the replica rings and NDS. I most likely will treat the RW server as "crashed", clean up the master, and then only the single master server will remain.

I have some suggestions that I might play with once the RW server becomes unused. I am hesitant to R&R NDS on it until then since it runs BorderManager and I recall that it's a bit of work to get the BM attributes set up again in NDS.
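
To make the mismatch above concrete, here is a minimal Python sketch that compares the replica ring as each server reports it and flags any ring without exactly one master, or any partition where the servers disagree. It is not a Novell tool and does not read NDS. The ring views are typed in by hand from what dsrepair shows, and the server names FS1 and FS2, the partition names, and the function name `ring_problems` are all made up for illustration.

```python
from typing import Dict, List

# A ring view maps partition name -> {server name: replica role},
# transcribed by hand from each server's dsrepair replica ring screen.
RingView = Dict[str, Dict[str, str]]


def ring_problems(views: Dict[str, RingView]) -> List[str]:
    """Return human-readable discrepancies between per-server ring views."""
    problems: List[str] = []
    partitions = sorted({p for view in views.values() for p in view})
    for partition in partitions:
        rings = {}
        for server, view in views.items():
            ring = view.get(partition)
            if ring is None:
                continue  # this server holds no replica of the partition
            rings[server] = ring
            masters = [s for s, role in ring.items() if role == "Master"]
            if len(masters) != 1:
                problems.append(
                    f"{partition}: {server} sees {len(masters)} master replicas"
                )
        # Every server in a healthy ring should report the same membership.
        distinct = {tuple(sorted(r.items())) for r in rings.values()}
        if len(distinct) > 1:
            problems.append(f"{partition}: servers disagree on the ring")
    return problems


# Hypothetical data mirroring the situation described in this thread.
views = {
    "FS1": {
        "[Root]": {"FS1": "Master", "FS2": "Read/Write"},
        "finance.co": {"FS1": "Master", "FS2": "Read/Write"},
    },
    "FS2": {
        "[Root]": {"FS2": "Read/Write"},
        "finance.co": {"FS2": "Read/Write"},
    },
}

for line in ring_problems(views):
    print(line)
```

With the sample data, every partition is flagged twice: once because FS2's view has no master, and once because the two views differ. That is the state I am seeing between the master and the RW server.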
 
What has changed on this network?

The problem you may face is that if you go ahead with your suggestion, you will be unable to promote an RW to master, as the other server will no longer be there. This is a simple dial-in fix from Novell: they will just dsdump it and tweak a figure.

Have you run any dstrace commands to force a heartbeat etc. to kick the process into going?

Are you getting any errors?
 
Your replicas are inconsistent, and what might work is to do a "send all" from the server that has the correct view of the whole tree (and is the master?) to the other replicas that have an incorrect view. But it's pretty tough to coach a person through a problem like this without having your hands on it. Definitely get your time in sync and resolve any connectivity issues, though; those are what usually cause a problem like this in the first place.

Marvin Huffaker MCNE, CNE
Marvin Huffaker Consulting
 
Oddly, timesync was reporting that it was working on the debug screen, though in dsrepair I'd get -626 errors when checking pretty much anything.

The short story is that I removed NDS from the RW servers (using NWCONFIG -DSREMOVE) and removed the servers from the master's replica rings. Then I reinstalled NDS on one of the RW servers for backup, and it seems to be working normally. If I cared about keeping BorderManager working it would be a bit more work, but thankfully I don't need it any more.

Thanks.
Lee.
 
Are you still getting DS errors in timesync and on the partitions?

Does each partition have a master now, and do all the servers agree?

I assume you have three replicas of each partition?
 
No, no partition errors
Yes, both servers indicate a Master and RW (and they're correct).
No (there are only two servers left in the tree).

Looks good so far.
 