Re-mirroring in SFT 3

jaybo71 (MIS, US), Sep 24, 2002
We have an SFT3 setup that unfortunately was not very well documented by the previous admin. We recently had an event where the SCSI backplane failed on the primary engine and the secondary engine did not take over properly. The disk mirroring was apparently partly to blame, since the secondary system would not boot on its own. I was able to get the primary back online, but I need to re-establish the mirror and want to make sure things are good this time. I now have two incomplete mirrors and out-of-sync partitions.

Does anyone have any advice here, or could you point me to some good documentation that could walk me through the steps?

Thanks all...

 
SFT3 is a whole different ballgame, but you might look at these TIDs:



Also, here is some documentation I dug up. I hope you can find what you need somewhere within:

SFT III Management Tips


This appendix describes tips for managing and troubleshooting a NetWare® 4 SFT III network. The information is divided into these categories:
Server Synchronization
Mirrored Server Link (MSL)
Server Configuration (.NCF) Files
Server Memory
Server Consoles
Server Hard Disks
Network Clients
Network Performance
Troubleshooting

The SET command is mentioned frequently in these sections. For more information on the SET command and its parameters, see SET in Utilities Reference.

01/22/2002
Server Synchronization

This section discusses possible solutions to software failures related to SFT III server synchronization.

Detecting Server Synchronization Errors

For a comprehensive check of MSEngine outputs on each server, turn on the Comprehensive MSEngine Synchronization Check SET parameter. By default, a less intrusive check is performed. Using the comprehensive check may affect server performance.

Because this parameter must be set before the servers are activated (only settable at startup), put this command in the IOSTART.NCF file for each server:

SET Comprehensive MSEngine Synchronization Check=ON

Handling Secondary Restarts Immediately After Synchronization

If the secondary server restarts immediately after synchronization, increase the IPX Internet Down Wait Time and the MSL Deadlock Detect Wait Time SET parameters, using the following syntax:

SET IPX Internet Down Wait Time=variable

SET MSL Deadlock Detect Wait Time=variable

If this does not solve the problem, halt the server by changing all mirrored servers' error recovery options to 0. See Servers Restart for No Apparent Reason for specific options. After halting the server, contact your support representative.

Notifying Users of Server Synchronization

To send a broadcast message to all logged-in network users when the mirrored servers begin synchronizing, type the following.

SET Notify All Users Of Mirrored Server Synchronization=ON

The default setting for this parameter is OFF.

Reducing the Time for Resynchronizing and Remirroring

To shorten server synchronization time, reduce the amount of memory that must be synchronized, using the following syntax:

SET New End Address For Unclaimed Memory Block=variable

Important: Using this SET parameter to reduce the amount of unclaimed memory may prevent some loadable modules from loading in the MSEngine. If the MSEngine memory is too small, some volumes may not mount.


If you have disk arrays, you may be able to speed up disk mirroring with the Concurrent Remirror Requests SET parameter.

Responding to Invalid Mirrored Server Initialization Messages

If this message appears when the servers are in test mode, no action is necessary as long as the servers continue to synchronize.

To prevent server initialization problems, increase the Restart Minimum Delay Amount, using the following syntax:

SET Restart Minimum Delay Amount=variable

Also, verify that the MSL boards and cables are functional. Then unload and reload the MSL driver on each server.

If you are using NE2000 boards as MSL boards, make sure that the interrupt settings on the MSL boards have a higher priority than the network boards, and that the MSL boards are cabled correctly.

Note: Interrupt priorities from high to low are 0, 1, 2 or 9, A, B, C, D, E, F, 3, 4, 5, 6, 7, and 8. Interrupt 2 is the highest priority you can assign to an MSL board because interrupts 0 and 1 are reserved.
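The priority order in the note can be turned into a quick check of whether a planned MSL interrupt outranks every network board interrupt. This is a minimal Python sketch, assuming IRQs are given as numbers or hex digits ("A" through "F"); the function names are hypothetical. It treats 2 and 9 as one slot, skips the reserved IRQs 0 and 1, and leaves out IRQ 8, which the real-time clock uses on AT-class machines.

```python
# Hypothetical helper: does the MSL board's IRQ outrank the network boards' IRQs?
# Order is highest priority first, following the note above. IRQ 2 and IRQ 9
# share a slot; IRQs 0 and 1 (reserved) and IRQ 8 (real-time clock on
# AT-class machines) are deliberately left out.
PRIORITY_ORDER = [2, 10, 11, 12, 13, 14, 15, 3, 4, 5, 6, 7]


def priority_rank(irq):
    """Return the IRQ's position in PRIORITY_ORDER; 0 is the highest priority."""
    if isinstance(irq, str):
        irq = int(irq, 16)  # accept hex digits such as "A" or "F"
    if irq == 9:
        irq = 2  # 2 and 9 are the same slot
    if irq not in PRIORITY_ORDER:
        raise ValueError(f"IRQ {irq} is reserved or not in the priority list")
    return PRIORITY_ORDER.index(irq)


def msl_outranks(msl_irq, nic_irqs):
    """True if the MSL IRQ has higher priority than every network board IRQ."""
    msl_rank = priority_rank(msl_irq)
    return all(msl_rank < priority_rank(nic) for nic in nic_irqs)


if __name__ == "__main__":
    print(msl_outranks("B", [3, 5]))  # IRQ 11 outranks IRQs 3 and 5: True
    print(msl_outranks(5, [3]))       # IRQ 5 ranks below IRQ 3: False
```

Note that msl_outranks(10, [3]) is True even though 10 is the larger number, so a lower interrupt number does not always mean a higher priority.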


Responding to Mirrored Server Engine Already Loaded Messages

This message means you have already executed the ACTIVATE SERVER console command, either at the IOEngine console or in the IOSTART.NCF file of the other server. Delete ACTIVATE SERVER from the IOSTART.NCF file.

Important: Putting ACTIVATE SERVER in the IOSTART.NCF file may cause synchronization problems. If both servers execute ACTIVATE SERVER simultaneously, the utility may load two separate, unsynchronized MSEngines, and both servers would assume the primary server role.


Mirrored Server Link (MSL)

This section discusses possible solutions to hardware and software failures related to the high-speed cable links between the mirrored SFT III servers.

Determining Appropriate MSL Cable Lengths

The maximum cable length between MSL boards is determined by the cable manufacturer. Ranges are from 30 to 100 meters for coaxial cable, and from 1 to 40 kilometers for fiber optic cable.

Handling MSL Cable or MSL Board Failure

When an MSL cable or board fails, the secondary server restarts but cannot synchronize with the primary server until the MSL problem is corrected.

To prevent future MSL problems and loss of server mirroring, install redundant, alternate MSL boards and cabling in each server. The alternate MSL driver must be loaded before it can take over for an active MSL that fails.

Put the LOAD command for the alternate MSL driver in the IOSTART.NCF files of both servers, after the LOAD command for the first MSL driver. Use the following syntax:

LOAD alternate_driver_name

The order the drivers are loaded determines which MSL is the default and which is the first alternate, second alternate, and so forth.
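The rule that load order sets the MSL roles can be illustrated with a short Python sketch that reads the LOAD lines of an IOSTART.NCF file. It assumes you pass in the set of driver names (in uppercase) that are MSL drivers; NMSL comes from the IOSTART.NCF example later in this appendix, while ALTMSL is a hypothetical alternate driver name.

```python
# Sketch: derive each MSL driver's role from its position in IOSTART.NCF.
# Assumption: the caller knows which driver names are MSL drivers;
# ALTMSL below is a hypothetical name used only for illustration.

def msl_roles(ncf_lines, msl_drivers):
    """Return [(driver, role), ...] for the MSL drivers, in load order.

    msl_drivers is a set of uppercase driver names; other LOAD lines
    (disk drivers and so on) are ignored.
    """
    loaded = []
    for line in ncf_lines:
        words = line.split()
        if len(words) >= 2 and words[0].upper() == "LOAD":
            name = words[1].upper()
            if name in msl_drivers and name not in loaded:
                loaded.append(name)
    # The first MSL driver loaded is the default; the rest are alternates.
    return [
        (name, "default" if i == 0 else f"alternate {i}")
        for i, name in enumerate(loaded)
    ]


if __name__ == "__main__":
    iostart = [
        "ioengine name SFT3_IO1",
        "ioengine ipx internal net 7654321",
        "load isadisk port=1f0 int=e",
        "load nmsl",
        "load altmsl",
    ]
    print(msl_roles(iostart, {"NMSL", "ALTMSL"}))
    # [('NMSL', 'default'), ('ALTMSL', 'alternate 1')]
```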

Responding to MSL Communications Error Messages

Check the MSL cable and board connections to make sure they are properly connected and that the boards are firmly seated. Also check for kinks or damage in the cabling.

Responding to MSL Deadlock Delivering Data Messages

This message means that neither MSL can transmit data because of a holdoff. A holdoff occurs when the IOEngine receives a packet but cannot process it, or when one MSL sends a packet but doesn't receive an acknowledgment.

Increase the MSL Deadlock Detect Wait Time, using the following syntax:

SET MSL Deadlock Detect Wait Time=variable

If the servers are very busy, increase this parameter in 1-second increments until the error disappears.

Note: Make sure the value of the MSL Deadlock Detect Wait Time parameter is at least 1 second longer than the IPX Internet Down Wait Time parameter.
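The tuning rule in this section (raise MSL Deadlock Detect Wait Time in 1-second steps, but keep it at least 1 second above IPX Internet Down Wait Time) can be expressed as a small Python helper. This is a sketch: the function names are hypothetical, values are plain seconds, the sample numbers are illustrative rather than defaults, and the exact value format the SET command accepts should be checked against SET in Utilities Reference.

```python
# Hypothetical helpers for the rule above. All values are in seconds.

def next_deadlock_wait(ipx_down_wait, deadlock_wait, step=1.0):
    """Suggest the next MSL Deadlock Detect Wait Time.

    Raises the current value by `step`, then enforces the note's minimum:
    at least 1 second more than IPX Internet Down Wait Time.
    """
    return max(deadlock_wait + step, ipx_down_wait + 1.0)


def is_valid_pair(ipx_down_wait, deadlock_wait):
    """True if the deadlock wait is at least 1 second above the IPX down wait."""
    return deadlock_wait >= ipx_down_wait + 1.0


if __name__ == "__main__":
    value = next_deadlock_wait(ipx_down_wait=20, deadlock_wait=20)
    print(f"SET MSL Deadlock Detect Wait Time={value:g}")
```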


Server Configuration (.NCF) Files

SFT III reads from these server configuration files, in the following order, when you power on or restart the servers:

• IOSTART.NCF (two files, one for each server)

• MSSTART.NCF

• MSAUTO.NCF

• IOAUTO.NCF (two files, one for each server)

Most of these files are created during the installation process. Use INSTALL to create or edit server configuration files. You can also use EDIT to edit .NCF files.

Example:
IOSTART.NCF file (on boot partition)
ioengine name SFT3_IO1
ioengine ipx internal net 7654321
load isadisk port=1f0 int=e
load nmsl

Example:
MSSTART.NCF file (on boot partition)
set Concurrent Remirror Requests=11

Example:
MSAUTO.NCF file (on volume SYS:)
set Time Zone=MST7MDT
set Daylight Savings Time Offset=1:00:00
set Start Of Daylight Savings Time=(APRIL SUNDAY FIRST 2:00:00 AM)
set End Of Daylight Savings Time=(OCTOBER SUNDAY LAST 2:00:00 AM)
set Default Time Server Type=SINGLE
set Bindery Context=O=Novell
msengine name SFT3
msengine ipx internal net 1234567
mount all

Example:
IOAUTO.NCF file (on boot partition or volume SYS:)
sys:etc\io1\initsys.ncf
load tcpip
Note: To edit INITSYS.NCF, see INETCFG in Utilities Reference.


Server Memory

This section discusses how to install additional server memory without disrupting the network and how to respond to the message Secondary server is missing RAM.

Adding Memory Without Bringing the MSEngine Down

A NetWare 4 SFT III system allows you to upgrade server hardware without loss of service to clients.

Use the procedure below to add memory to your servers without bringing down the MSEngine.

Procedure

1. Halt the secondary server and turn it off.

The primary server is still running.

2. Add memory to the secondary server and turn it on.

3. Reconfigure the hardware, using the hardware-specific configuration procedure specified by the manufacturer.

4. From the boot prompt on the secondary server, type
MSERVER <Enter>
5. Wait for resynchronization to complete.

During resynchronization, the primary server does not recognize the additional memory in the secondary server.

6. After the disks are remirrored, halt the primary server and turn it off.

The secondary server becomes the new primary server.

7. Add memory to the new secondary server and turn it on.

8. Reconfigure the hardware, using the hardware-specific configuration procedure specified by the manufacturer.

Note: To ensure synchronization, install the same amount of memory in both SFT III servers.


9. From the boot prompt on the secondary server, type
MSERVER <Enter>
10. Wait for resynchronization to complete.

During synchronization, the primary server now recognizes the additional memory in the secondary server.

Responding to Secondary Server Is Missing RAM Messages

This message means that the primary server cannot synchronize memory with the secondary server for one of the following reasons:

• The primary server has more RAM than the secondary server.

• The primary and secondary have the same amount of RAM installed, but the memory is noncontiguous and the memory holes don't match.
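The two conditions above can be checked from each server's MEMORY MAP output. The following Python sketch assumes the map has already been transcribed into (start, end, owner) tuples, as in the sample display in the procedure below. Comparing the Unclaimed blocks is used here as a stand-in for "the memory holes don't match", since that block is what the MSEngine uses; the function names are hypothetical.

```python
# Sketch: tell which "Secondary server is missing RAM" cause applies.
# Each map is a list of (start, end, owner) tuples, in bytes.

def total_bytes(mem_map):
    """Sum the sizes of all ranges in a memory map."""
    return sum(end - start for start, end, _owner in mem_map)


def unclaimed_blocks(mem_map):
    """Return the (start, end) ranges marked Unclaimed."""
    return [(start, end) for start, end, owner in mem_map if owner == "Unclaimed"]


def missing_ram_cause(primary, secondary):
    """Return a short diagnosis, or None if the maps look compatible."""
    if total_bytes(primary) > total_bytes(secondary):
        return "primary has more RAM than secondary"
    if unclaimed_blocks(primary) != unclaimed_blocks(secondary):
        return "memory holes don't match"
    return None


if __name__ == "__main__":
    secondary = [
        (0, 12288, "DOS"),
        (12288, 159744, "DOS"),
        (159744, 654336, "IOEngine"),
        (1048576, 5767168, "IOEngine"),
        (5767168, 17170432, "Unclaimed"),
    ]
    # Same total RAM, but a hole at 15-16 MB splits the Unclaimed block.
    primary = secondary[:4] + [
        (5767168, 15728640, "Unclaimed"),
        (16777216, 18219008, "Unclaimed"),
    ]
    print(missing_ram_cause(primary, secondary))  # memory holes don't match
```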

In this situation, do one of the following:

• Add memory to the secondary server so that its RAM is equal to the RAM in the primary server.

• Run the EISA configuration utility (if applicable) so that both servers are in the same mode (linear or compatible mode).

• Align the server memory holes using the procedure below.

Procedure

1. Change to the IOEngine prompt on the secondary server.

2. Check the memory addresses by typing
MEMORY MAP <Enter>
A display similar to the following appears.

System memory map:

0-12288(DOS)

12288-159744(DOS)

159744-654336(IOEngine)

1048576-5767168(IOEngine)

5767168-17170432(Unclaimed)

3. From the Unclaimed range on the last line of the display, write down the start and end memory addresses.

Note: Unclaimed memory is used by the MSEngine, so it must be identical on both servers.


4. From the IOEngine console on the primary server, type
HALT <Enter>
5. Add the following commands to the primary server's IOSTART.NCF file:
SET New Start Address for Unclaimed Memory Block=start_address
SET New End Address for Unclaimed Memory Block=end_address
The values following the = (equals) sign are the start and end of the unclaimed memory range on the secondary server (the numbers you wrote down in Step 3).

6. Restart the primary server.

Now that the unclaimed memory is aligned on both servers, they can synchronize. SFT III allocates any extra memory to the IOEngine on that server.
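Steps 2 through 5 can be partly automated: read the secondary server's MEMORY MAP output, take the last Unclaimed range, and produce the two SET lines for the primary server's IOSTART.NCF. This Python sketch assumes the output looks like the sample display above (start-end(owner), one range per line) and that the SET parameters accept the addresses as the same decimal numbers MEMORY MAP prints; verify both before relying on it. The function names are hypothetical.

```python
import re

# One range per line, e.g. "5767168-17170432(Unclaimed)"
RANGE_LINE = re.compile(r"^\s*(\d+)-(\d+)\((\w+)\)\s*$")


def last_unclaimed_range(memory_map_text):
    """Return (start, end) of the last Unclaimed range in MEMORY MAP output."""
    found = None
    for line in memory_map_text.splitlines():
        match = RANGE_LINE.match(line)
        if match and match.group(3) == "Unclaimed":
            found = (int(match.group(1)), int(match.group(2)))
    if found is None:
        raise ValueError("no Unclaimed range in the memory map")
    return found


def iostart_set_lines(memory_map_text):
    """Build the SET lines to add to the primary server's IOSTART.NCF."""
    start, end = last_unclaimed_range(memory_map_text)
    return [
        f"SET New Start Address for Unclaimed Memory Block={start}",
        f"SET New End Address for Unclaimed Memory Block={end}",
    ]


if __name__ == "__main__":
    secondary_map = """System memory map:
0-12288(DOS)
12288-159744(DOS)
159744-654336(IOEngine)
1048576-5767168(IOEngine)
5767168-17170432(Unclaimed)
"""
    for line in iostart_set_lines(secondary_map):
        print(line)
```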

Server Consoles

This section discusses solutions for keeping track of the three NetWare 4 SFT III server consoles (the two IOEngines and the MSEngine) and for managing errors displayed at these consoles.

Logging and Viewing All Console Messages

Some SFT III console messages are never seen because you can't view all three consoles (two IOEngines and one MSEngine) at once.

To keep track of these messages, use CONLOG to capture all console messages and to write them to SYS:\ETC\CONSOLE.LOG.

To log messages from all three consoles, add the line LOAD CONLOG to the IOSTART.NCF files of both servers and to the MSAUTO.NCF file.

Important: The LOAD CONLOG command must be the first line in each .NCF file from which you want to log messages.


To view the CONSOLE.LOG file, use EDIT or INETCFG.

Handling Lost Interrupt Alerts on the Console

If lost interrupt alerts are filling up the console screen, at the appropriate IOEngine console, type

SET Display Lost Interrupt Alerts=OFF <Enter>

A lost interrupt message indicates a problem driver or faulty hardware. To find the driver with the interrupt problem, set the Display Lost Interrupt Alerts parameter back to ON. Then unload all the drivers from the appropriate IOEngine and reload them one at a time, watching for the alerts to return as each driver loads. Contact the vendor of the problem driver.

Responding to Loader Cannot Find... Messages

This message means you tried to load an NLM in the wrong engine (IOEngine or MSEngine), or that the NLM you tried to load depends on other NLMs that aren't loaded.

Use the <Alt>+<Esc> keys to toggle to the other engine's console, and attempt to load the NLM again.

If the message still appears, you may have loaded an outdated or unsupported NLM.

Shortening the Console Prompt

Long SET parameter names may scroll behind the console prompt if you have a long IOEngine or MSEngine name in the prompt. However, scrolling does not affect the execution of the SET utility.

If you want a two-character console prompt (IO: or MS:), type the following at the appropriate console:

SET Replace Console Prompt With Server Name=OFF <Enter>

Server Hard Disks

This section discusses solutions for hard disk errors and storage shortages on a NetWare 4 SFT III server.

Controlling the Size of Log Files on Volume SYS:

To limit the size of SFT III log files, IO$LOG.ERR and MSSTATUS.DMP, use the following commands:

SET IOEngine Error Log File Overflow Size=number

SET Status Dump File Overflow Size=number

Replace number with the size limit in bytes.

The system will delete or rename the log files when they meet or exceed the size limit, depending on the SET IOEngine Error Log File State and SET Status Dump File State parameter values.

Handling Primary Hard Disk Failure

The primary server remains active if a primary hard disk fails.

In this situation, clients continue to communicate with the primary server, but disk requests use only the secondary server's disk (instead of splitting the seeks between the servers).

Use the procedure below to restore fault tolerance to your server storage.

Warning: Make sure the failed disk is on the primary server and that the secondary server's disk is functional before attempting this procedure, or you may lose data.


Procedure

1. Change to the primary server's IOEngine console.

2. Force a server switchover by typing
HALT <Enter>
The secondary server takes over the primary server role.

3. Correct the hard disk problems and resynchronize the servers.

4. Use the Server Failure Notification Name SET parameter to notify you immediately in case of another server disk failure. Add this parameter to the MSAUTO.NCF file:
SET Server Failure Notification Name=group_name | user_name

Mounting a CD-ROM as a NetWare 4 SFT III Volume

Warning: Access to data on a CD-ROM volume will be lost if the SFT III servers switch from primary to secondary unless you mount the same CD on both servers.


To mount a CD-ROM disc as a NetWare 4 SFT III volume, follow these steps.

Prerequisites

A mounted volume SYS:

An installed Host Bus Adapter (HBA) that is NetWare compatible and supports CD-ROM devices

The NPAIO.DSK and NPAMS.NLM NetWare Peripheral Architecture (NPA) modules

The disk driver files and necessary support modules for the CD-ROM device

Note: Some disk drivers consist of more than one file and some HBA devices require additional support modules for CD-ROM functionality. These files should accompany the HBA. For specific file requirements, consult your adapter documentation.


Procedure

1. Change to the IOEngine prompt of the SFT III server where the HBA is installed.

2. Load the disk driver by typing
LOAD [path]disk_driver <Enter>
Replace disk_driver with the name of the disk driver specified in the HBA documentation.

For example, to load the disk driver for the Adaptec AHA-1522 SCSI HBA, type
LOAD [path]AHA1520.DSK <Enter>
LOAD [path]ASPICD.DSK <Enter>
3. Load CDROM.NLM by typing
LOAD CDROM <Enter>
This auto-loads the NPAMS module.

Note: When a CD-ROM is mounted or a CD-ROM disc is changed, some CD-ROM devices may be deactivated. This deactivation occurs because device configuration information is being updated.


4. View the device number and volume name by typing
CD DEVICE LIST <Enter>
5. Mount the CD-ROM as a volume by typing
CD MOUNT [device_number] | [volume_name] <Enter>
Replace device_number with the device number, or volume_name with the volume name, of the CD-ROM disc.

For example, to mount the NetWare_41 CD-ROM, type
CD MOUNT NETWARE_41 <Enter>
Note: It may take several minutes to mount the volume the first time, depending on the size of the CD-ROM and the speed of your computer.


The standard volume mount messages appear.

6. To mount the CD-ROM as a NetWare volume each time the server comes up, edit the IOSTART.NCF file of the SFT III server where the HBA is installed. Add these commands to the IOSTART.NCF file:
LOAD [path]disk_driver
7. Edit the MSAUTO.NCF file, adding these commands:
LOAD CDROM
CD MOUNT [device_number] | [volume_name]

Notifying Users of Disk or Server Failure

To notify a user or group of users about a disk or server failure, use the Server Failure Notification Name SET parameter in the MSEngine. Also put this parameter in the MSAUTO.NCF file. For example:

SET SERVER FAILURE NOTIFICATION NAME = ADMIN

When a failure happens, the specified user or group receives a broadcast message. To send this broadcast message to all logged-in users, use group EVERYONE.

Recovering an Orphaned Partition

Use the following procedure to recover an orphaned partition:

Procedure

1. Load INSTALL.

2. From the Installation Options menu, choose Disk Options.

3. From the Available Disk Options menu, choose Mirror/Unmirror.

The Partition Mirroring Status appears, showing one mirrored and one out of sync. The out-of-sync partition is the orphan.

4. Choose the mirrored partition.

The Mirrored Disk Partitions menu displays the partition number and device number of each mirrored set. The device number for the out-of-sync partition is an unavailable device.

5. Remove the partition from the set by highlighting the out-of-sync partition and pressing <Delete>.

6. Return to Disk Partition Mirroring Status by pressing <Esc>.

Not mirrored appears by the good partition, and Out of sync appears by the other partition. Note the number of the not mirrored partition.

7. Highlight the out-of-sync partition and press <F3>.

A warning message similar to the following appears:

Warning!! The selected partition contains Volume SYS Segment 0 and that volume is already defined.

8. Select the No Salvage option.

9. Select the good partition (as noted in Step 6).

10. Press <Insert> for a list of the available partitions.

11. Highlight the previously orphaned partition and press <Enter>.

12. Continue by pressing <Esc>.

The unavailable partition is deleted. After a brief delay, the remirroring process begins. Because the entire partition is remirrored, the process takes several minutes or hours, depending on the partition size. You can check remirroring status with the MIRROR STATUS command or by watching the install screen.

Network Clients

This section discusses solutions for workstation problems that may be encountered on NetWare 4 SFT III networks.

Finding a Server from an Ethernet Workstation

Some Ethernet clients may have problems finding or attaching to a server.

If you use the 802.3 frame type, change the value of the Enable IPX Checksums SET parameter to 0 in the MSEngine by typing the following:

SET Enable IPX Checksums = 0 <Enter>

The default setting is 0. Also, make sure volume SYS: is mounted.

Finding the SFT III Server First

If another NetWare server is on the same network segment as an SFT III server, clients will not find the SFT III server first.

Because the IOEngine routes the get-nearest-server request to the MSEngine, the SFT III server appears to be two hops away while the other NetWare server appears to be closer.

To find the SFT III server first, include the following statement in the client's NET.CFG file:

Preferred Server=MSEngine_name

Handling DOS IPX Session Timeouts During Server Switchover

If DOS IPX clients are timing out or losing connections when the servers switch over, increase the IPX retry count parameter in the NET.CFG file on clients using IPX. For example:

ipx retry count 40

Logging in ARCnet or Token Ring Clients Immediately After a Halt

Some clients on token-passing networks may have problems logging in after a server has been halted. Wait for the client to time out; then try to log in again.

Responding to LAN Driver Loopback Error Detected Messages

This message indicates that two or more ARCnet boards have the same node address. Reconfigure each board to a unique address.

Turning Off the IPX Checksum Option for Token Ring

Token ring doesn't support the enabling of IPX checksums. If you are on a token ring network, change the value of the Enable IPX Checksums SET parameter in the MSEngine to 0 by typing the following:

SET Enable IPX Checksums = 0 <Enter>

Network Performance

This section discusses methods of optimizing network performance.

Handling Frequent Ups and Downs of the IPX Internet

This situation indicates a faulty network connection or a problem with communication between the servers.

Use SET to increase the IPX Internet Down Wait Time and the MSL Deadlock Detect Wait Time parameters by 0.5 second or more.

SET IPX Internet Down Wait Time = variable

SET MSL Deadlock Detect Wait Time = variable

The new value (variable) should be 0.5 second greater than the current value. If this solves the problem, put the SET commands in the IOSTART.NCF files. If it does not, increase the value again.
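The 0.5-second adjustment, together with the earlier rule that MSL Deadlock Detect Wait Time stay at least 1 second above IPX Internet Down Wait Time, can be sketched in Python as a function that produces the pair of SET lines to try next. The function name and sample values are hypothetical, values are in seconds, and the accepted value format should be checked against SET in Utilities Reference.

```python
# Hypothetical helper: raise both wait times by half a second (or more)
# and emit the SET lines to try. All values are in seconds.

def bumped_wait_settings(ipx_down_wait, deadlock_wait, step=0.5):
    """Return (ipx_down_wait, deadlock_wait, set_lines) after one increase."""
    if step < 0.5:
        raise ValueError("increase by 0.5 second or more")
    new_ipx = ipx_down_wait + step
    # Keep the deadlock wait at least 1 second above the IPX down wait.
    new_deadlock = max(deadlock_wait + step, new_ipx + 1.0)
    lines = [
        f"SET IPX Internet Down Wait Time = {new_ipx:g}",
        f"SET MSL Deadlock Detect Wait Time = {new_deadlock:g}",
    ]
    return new_ipx, new_deadlock, lines


if __name__ == "__main__":
    _, _, lines = bumped_wait_settings(ipx_down_wait=10, deadlock_wait=11)
    print("\n".join(lines))
```

If the new values fix the problem, the printed lines are what would go into each IOSTART.NCF file.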

Also, check the network cabling attached to each server and the network connections (including routers and bridges) between the servers.

Make sure the interrupt priority of the MSL board is higher than the interrupt priority of the network boards in each NetWare server.

In most cases, this means setting the MSL interrupt to a lower number. (See the MSL board manufacturer's documentation for more information on setting interrupts.)

Note: Interrupt priorities from high to low are 0, 1, 2 or 9, A, B, C, D, E, F, 3, 4, 5, 6, 7, and 8. Interrupt 2 is the highest priority you can assign to an MSL board because interrupts 0 and 1 are reserved.


Responding to IPX Network No Longer Returning Status Messages

This message indicates a problem with a network connection: either a faulty network board in one of the servers, a cabling problem in the network, or an incorrectly bound network protocol.

If this server is the primary server, and the system determines that the other server has more functional network boards, that server will become the primary.

To determine if a server's network board has failed, type the CONFIG command at each server's IOEngine prompt.

A not sending or not receiving message indicates a bad board or a faulty network connection.

If this message recurs frequently along with the message "IPX Network is now returning status check packets," increase the IPX Internet Down Wait Time SET parameter to give the system a little more time before it decides that a network board has failed. Use the following syntax:

SET IPX Internet Down Wait Time = variable

Handling Primary Network Board Failure

If a network board in the primary server stops transmitting or receiving packets, the secondary server assumes the primary role only if the following conditions are true:

• The secondary server's network boards are more functional than the primary's network boards.

• The servers are fully synchronized and their disks are completely mirrored.
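The two conditions can be read as a simple decision rule. The Python sketch below is a simplified model, not SFT III's actual logic: it assumes "more functional" means more network boards currently sending and receiving, and the function and parameter names are hypothetical.

```python
# Simplified model of when the secondary takes over after a primary
# network board failure. Board counts are boards currently sending and
# receiving (an assumption about what "more functional" means).

def secondary_takes_over(primary_working_boards, secondary_working_boards,
                         fully_synchronized, fully_mirrored):
    """True only if both conditions listed above hold."""
    secondary_is_better = secondary_working_boards > primary_working_boards
    return secondary_is_better and fully_synchronized and fully_mirrored


if __name__ == "__main__":
    # One of the primary's two boards failed; the secondary's two still work.
    print(secondary_takes_over(1, 2, fully_synchronized=True, fully_mirrored=True))
    # Same failure, but the disks are still remirroring: no switchover.
    print(secondary_takes_over(1, 2, fully_synchronized=True, fully_mirrored=False))
```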

Three SET parameters in the IOEngine can help detect and prevent downtime caused by network board failure: Check LAN Option, Check LAN Extra Wait Time, and Use Diagnostic Responder to Validate LAN Functionality.

SET Check LAN Option=2

This setting (2 is the default) forces a server switchover by restarting the primary server if a network board fails in the primary.

SET Check LAN Extra Wait Time=10

This setting adds 10 extra seconds to the time the system waits before forcing a switchover because of a bad network board in the primary server.

By default this setting is 0, but if you have a large network or heavy traffic on the network, you may want to increase the wait time to prevent a premature server switchover.

SET Use Diagnostic Responder to Validate LAN Functionality=ON

This setting broadcasts an IPX diagnostic request to verify that a network board is functional.

By default this setting is OFF because the diagnostic request adds traffic overhead and can hurt performance of large networks.

However, if you want to know whether a network board is bad or just slow, set this parameter to ON.

Troubleshooting

This section suggests solutions to various problems that can occur with NetWare 4 SFT III networks.

Both Servers Are Primary; No Secondary Console Display

ACTIVATE SERVER was executed from both IOEngines, and both servers have assumed the primary server role. Delete ACTIVATE SERVER from the IOSTART.NCF file. Restart one server.

Make sure the node addresses on the network boards are unique to each server.

Disks Won't Mirror After Servers Synchronize

Verify that the correct disk driver is loaded in each server's IOEngine. Also, put the load command for the disk driver in each server's IOSTART.NCF file.

Check the mirror status in INSTALL.NLM for orphaned partitions.

Error IPX internet may be too slow Message Appears

A busy IPX network or busy routers between SFT III servers can cause this message:

IPX internet may be too slow to notify secondary server if MSL fails... increase secondary take over delay amount.

If I'm alive packets take too long to travel between the servers over IPX, the secondary server may prematurely take over for the primary server.

To prevent this, use the SET command in the IOEngine to increase the Secondary Take Over Wait Time. Use the following syntax.

SET Secondary Take Over Wait Time = variable

Also check for problems with network connections that may be slowing IPX.

Error transferring IOEngine error log to MSEngine Message Appears

Volume SYS: may not have mounted. If the volume is mounted, check the file attributes of the IO$LOG.ERR file to verify the file isn't flagged as Read Only.

If the problem persists, there may not be room for the file on volume SYS:. Delete or rename the IO$LOG.ERR file on the boot partition.

Inactive device associated with mirror partition Message Appears

This message is caused by one of the following situations:

• The MSEngine was brought down, but only one server was brought back up.

• The disk driver on one server didn't load correctly.

• A hard disk or controller failed.

Correct any hardware problems associated with the failure and bring both servers back up.

IOEngine Network Number Prompt Appears After Executing IOSTART.NCF

Check both IOSTART.NCF files to make sure they assign unique internal network numbers to each IOEngine.

Check the spelling and syntax of the IPX internal network number command. There should not be an equal sign (=) in the command.

Make sure the IOSTART.NCF file is in the same directory as the MSERVER.EXE file on each server.
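The first two checks lend themselves to a script that reads both IOSTART.NCF files. This Python sketch assumes the command is written as in the IOSTART.NCF example earlier in this appendix ("ioengine ipx internal net" followed by the number) and that you pass the file contents in yourself; the function name and the SFT3_IO2 label are hypothetical.

```python
import re

# Matches "ioengine ipx internal net ..." and captures the rest of the line.
NET_LINE = re.compile(r"^\s*ioengine\s+ipx\s+internal\s+net\b(.*)$", re.IGNORECASE)


def check_internal_nets(iostart_files):
    """iostart_files maps a server label to its IOSTART.NCF text.

    Returns a list of problems: a missing command, an equal sign in the
    command, or an internal network number shared by two IOEngines.
    """
    problems = []
    owner_of = {}  # network number -> server label that uses it
    for server, text in iostart_files.items():
        number = None
        for line in text.splitlines():
            match = NET_LINE.match(line)
            if not match:
                continue
            rest = match.group(1).strip()
            if "=" in rest:
                problems.append(f"{server}: remove '=' from '{line.strip()}'")
                rest = rest.replace("=", "").strip()
            number = rest.upper()
        if not number:
            problems.append(f"{server}: no internal network number found")
        elif number in owner_of:
            problems.append(f"{server}: {number} is also used by {owner_of[number]}")
        else:
            owner_of[number] = server
    return problems


if __name__ == "__main__":
    files = {
        "SFT3_IO1": "ioengine name SFT3_IO1\nioengine ipx internal net 7654321\n",
        "SFT3_IO2": "ioengine name SFT3_IO2\nioengine ipx internal net=7654321\n",
    }
    for problem in check_internal_nets(files):
        print(problem)
```

In the sample, the second file trips both checks: it contains an equal sign and reuses the first IOEngine's number.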

Keyboard Is Slow or Frozen After Loading ARCnet on IRQ 2

Make sure there are no I/O port or memory address conflicts.

LOGIN Fails, Even If Correct Password Is Given

If checksumming is disabled at the workstation, it must also be disabled at the server. Change the value of the Enable IPX Checksums SET parameter in the MSEngine to 0, using the following syntax:

SET Enable IPX Checksums = 0

This may also indicate a problem with the user's Directory context.

LOGIN Fails for an NLM in the IOEngine

This happens only in the case of a backup NLM that logs in to the server using a different user name and would happen only on the secondary IOEngine. Type the following.

SET Reply To Get Nearest Server = ON <Enter>

MSEngine Name Prompt Appears Even Though MSAUTO.NCF Exists

Volume SYS: did not mount, or the command was typed incorrectly in the MSAUTO.NCF file.

There should not be an equal sign (=) in the MSEngine name command.

Type in the MSEngine name and IPX internal network number. Edit the MSAUTO.NCF file if necessary.

Make sure the disk drivers are loaded in both IOEngines. Mount volume SYS: from the command line or with INSTALL. If volume SYS: is unable to mount, run VREPAIR and try mounting it again.

MSL Drivers Are Loaded But Servers Do Not Synchronize

You might have installed the MSL boards incorrectly. Check the following on each server:

• Interrupt settings on the MSL boards (for conflicts with the network boards)

• MSL cables and board connections

• MSL drivers (both servers must have the same version of the driver)

MSL Isn't Activated

Verify that the same MSL driver is loaded on each server and that the MSL boards and cables are installed correctly.

MSL Times Out When a Device Driver is Loaded

Load the device driver before loading the MSL driver, or use the SET utility in the IOEngine to increase the MSL Error Wait Time value. Use the following syntax:

SET MSL Error Wait Time = variable

If this solves the problem, put the SET command shown above in the IOSTART.NCF file, above the LOAD command for the MSL driver.

Other Server Requested This Server to Halt Message Appears

The following sequence of events causes this message to appear:

1. The secondary server stops receiving I'm alive packets from the primary server over the IPX network.

2. The MSL between the two servers fails.

3. The secondary server begins to take over as primary (because of no I'm alive packets).

4. The secondary server then receives an I'm alive packet from the primary server.

5. Because two primary servers can't coexist, the secondary server halts.

Check for hardware problems on the IPX network and the MSL connection. If there are no hardware problems, use SET in the IOEngine to increase the Secondary Take Over Wait Time, using the following syntax:

SET Secondary Take Over Wait Time = <value>

Primary Server Doesn't Know IPX Route to Secondary Server

Verify that

- Both IOEngines are active
- The LAN drivers are loaded in both IOEngines
- The protocols are bound to the network boards in both servers

Use SET to increase the IPX Internet Down Wait Time and the MSL Deadlock Detect Wait Time parameters, using the following syntax:

SET IPX Internet Down Wait Time = <value>

SET MSL Deadlock Detect Wait Time = <value>

Check the network cabling and the network boards and routers between the SFT III servers to verify they are functioning properly.

Queue Overrun Abend Occurs

Decrease the number of seconds in the IPX Internet Down Wait Time and MSL Deadlock Detect Wait Time SET parameters in the IOEngine, using the following syntax:

SET IPX Internet Down Wait Time = <value>

SET MSL Deadlock Detect Wait Time = <value>

Secondary Doesn't Switch Over When Primary LAN Driver Is Unloaded

The secondary server is designed to take over for the primary server if the secondary server detects a network board failure in the primary.

However, an unloaded LAN driver is not considered a hardware failure, since it was explicitly requested by a user at the server console.

Therefore, if a LAN driver in the primary server is unloaded, both the primary and the secondary server will report they aren't receiving I'm alive packets, but the secondary server will not switch over.

When the LAN driver is reloaded, the primary server will continue to function as primary.

Warning: Any activity in progress when the LAN driver is unloaded from the primary server will be suspended, and client connections will time out unless the driver is promptly reloaded.


Server Failure Occurs

Because the MSEngines on both SFT III servers are mirrored and the IOEngines are not, you can assume the following about most server failures:

- If both servers fail or have an abend condition, the failure is probably related to software running in the MSEngine.
- If one server fails and the remaining server assumes the role of primary server, the failure is probably related to hardware, or to a driver or NLM loaded in the failed server's IOEngine.
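The two rules of thumb above can be written as a small triage helper. This is only a sketch of the heuristic as stated, not a diagnostic tool; the function name, parameters, and returned strings are hypothetical.

```python
def likely_failure_source(primary_failed, secondary_failed, survivor_is_primary=True):
    """Apply the SFT III failure rule of thumb.

    MSEngines are mirrored, so an MSEngine software fault tends to take down
    both servers. IOEngines are not mirrored, so a fault that takes down one
    server (while the other becomes primary) points at that server's
    hardware or at a driver or NLM in its IOEngine.
    """
    if primary_failed and secondary_failed:
        return "MSEngine software"
    if (primary_failed or secondary_failed) and survivor_is_primary:
        return "hardware, or a driver or NLM in the failed server's IOEngine"
    # Neither rule applies, e.g. one server failed but the other did not take over.
    return "undetermined"


print(likely_failure_source(True, True))  # prints: MSEngine software
```

A case where one server fails but the other does not take over as primary falls outside both rules, so the helper reports it as undetermined rather than guessing.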

Servers Restart for No Apparent Reason

You may have the wrong values for the recovery option SET parameters, or server test mode is causing constant switchovers. Check the IO$LOG.ERR and MSSTATUS.DMP files for the cause of the problem.

Use these SET parameter values in both IOEngines to halt the server (without restarting it) so you can see the error:

Test Mode=0
Secondary Server MSL Send Blocked Recovery Option=0
Primary Server MSL Send Blocked Recovery Option=0
MSEngine Abend And Processor Exception Recovery Option=0
IOEngine Abend And Processor Exception Recovery Option=0
Machine Check Recovery Option=0
Memory Parity Error Recovery Option=0
Secondary Server MSL Hardware Failure Recovery Option=0
Primary Server MSL Hardware Failure Recovery Option=0
MSEngine Outputs Different Recovery Option=1
Secondary Server MSL Consistency Error Recovery Option=0
Primary Server MSL Consistency Error Recovery Option=0
Secondary Server MSL Deadlock Recovery Option=0
Primary Server MSL Deadlock Recovery Option=0
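If it helps to have the diagnostic values as ready-to-type console commands, the sketch below formats them with the SET <name> = <value> syntax used elsewhere in this appendix. The dictionary simply restates the list; `set_commands` is a hypothetical helper, and you would still enter its output at each IOEngine console yourself.

```python
# Diagnostic values from the list: they halt the server without restarting it.
# Every entry is 0 except MSEngine Outputs Different Recovery Option, which is 1.
DIAGNOSTIC_VALUES = {
    "Test Mode": 0,
    "Secondary Server MSL Send Blocked Recovery Option": 0,
    "Primary Server MSL Send Blocked Recovery Option": 0,
    "MSEngine Abend And Processor Exception Recovery Option": 0,
    "IOEngine Abend And Processor Exception Recovery Option": 0,
    "Machine Check Recovery Option": 0,
    "Memory Parity Error Recovery Option": 0,
    "Secondary Server MSL Hardware Failure Recovery Option": 0,
    "Primary Server MSL Hardware Failure Recovery Option": 0,
    "MSEngine Outputs Different Recovery Option": 1,
    "Secondary Server MSL Consistency Error Recovery Option": 0,
    "Primary Server MSL Consistency Error Recovery Option": 0,
    "Secondary Server MSL Deadlock Recovery Option": 0,
    "Primary Server MSL Deadlock Recovery Option": 0,
}


def set_commands(values):
    """Format each parameter as a console command: SET <name> = <value>."""
    return [f"SET {name} = {value}" for name, value in values.items()]


if __name__ == "__main__":
    for line in set_commands(DIAGNOSTIC_VALUES):
        print(line)
```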

SFT III Error Log Files

When a failure occurs, SFT III updates three error log files in the SYS:SYSTEM directory:

- IO$LOG.ERR records the activity of both IOEngines.
- SYS$LOG.ERR records the MSEngine activity.
- MSSTATUS.DMP records status dumps of engine states, synchronization and communications states, IOEngine to MSEngine requests, and other information following a failure or server switchover.

Use these error log files to track the events that occurred prior to a failure or following a switchover.

Note: The IO$LOG.ERR file on the failed server is written to its boot partition until the servers come back up. Then, the IO$LOG.ERR file from the boot partition is appended to the IO$LOG.ERR file on volume SYS:.


Should This Machine Become the Primary Server? Message Appears

The message above appears on the secondary server's console, preceded by this message:

All communication channels with the primary server have failed. Since the IPX network communication channel failed before the mirrored server link failed, the secondary is unable to determine if the primary server is still active.

Verify that the primary server has failed, and then type Y. If the primary server is still active, type N.

Test Mode Is No Longer Working

You can set the Test Mode parameter from the command line. For example:

SET Test Mode = <value>

However, Test Mode resets to the default (no test) the first time the server is automatically downed and rebooted.

To keep the server in test mode, put the SET command shown above, with the appropriate value, in the IOSTART.NCF file of both SFT III servers.

If your drives are not mirrored, the primary server will not initiate test mode. In this case, use INSTALL to remirror the drives.

Unknown Command Message Appears

Use the <Alt>+<Esc> keys to toggle to the other engine's console; then execute the command again.

Use the LOAD command to execute modules (such as INSTALL and MONITOR).

Unknown SET Parameter Name Message Appears

Use the <Alt>+<Esc> keys to toggle to the other engine's console; then retype the SET command.

Check the spelling and syntax of the SET parameter and retype it correctly.

Volume SYS: Does Not Mount

Load VREPAIR. If VREPAIR does not load, halt the primary server by typing:

HALT <Enter>

Execute MSERVER with the -ns parameter. Then reload the disk driver and load VREPAIR.

Warning: When VREPAIR is loaded, Option 2 must be set to Write all directory and FAT entries out to disk and Option 3 must be set to Write changes immediately to disk.


Make sure you are using the latest disk driver.

Workstations Cannot Find or Attach to a Server

Make sure that

- Volume SYS: is mounted
- The appropriate network drivers are loaded on each SFT III server
- Novell® Directory Services™ (DS.NLM) is loaded
- The LAN protocols are bound to the network boards
- The workstation and server are using the same frame type

Workstations Time Out During Server Resynchronization

Change the IPX retry count in the NET.CFG file on the workstations to a higher value. For example:

ipx retry count 40

Turn on the SET parameter Notify All Users Of Mirrored Server Synchronization in the MSEngine to broadcast a message to all users when synchronization occurs. Use the following syntax:

SET Notify All Users Of Mirrored Server Synchronization = <group name>






Marvin Huffaker MCNE, CNE
Marvin Huffaker Consulting
 