This chapter describes how to maintain an Available Server or Production Server hardware configuration. It discusses the following topics:
Preparing to change the hardware configuration (Section 7.1)
Stopping available server environment (ASE) activity (Section 7.2)
Shutting down and starting up the cluster in a Production Server environment (Section 7.3)
Maintaining member systems (Section 7.4)
Adding and removing storage shelves (Section 7.5)
Maintaining disks in an ASE (Section 7.6)
Adding and removing shared buses (Section 7.7)
Disconnecting and connecting SCSI signal converters (Section 7.8)
Maintaining MEMORY CHANNEL interconnects (Section 7.9)
If you want to change your hardware configuration while maintaining available server environment (ASE) operation, make sure that any shared SCSI buses remain terminated. If you use trilink connectors and Y cables to connect devices to the shared SCSI buses, you can disconnect the devices without affecting the bus termination. In addition, if you connect an extra trilink connector or Y cable to a shared bus, you can attach a device to it and expand your configuration without affecting the bus termination. See Chapter 3 for information about maintaining bus termination.
If you are unable to maintain a terminated shared bus, you must shut down the cluster and then change the hardware configuration. Section 7.2 describes how to shut down the cluster.
For the TruCluster Production Server Software product, the MEMORY CHANNEL is the cluster interconnect, so using redundant MEMORY CHANNEL interconnects and MEMORY CHANNEL hubs allows you to change your configuration without shutting down the cluster.
Some maintenance tasks require you to use the asemgr utility. See asemgr(8) and the TruCluster Software Products Administration manual for information about the utility.
If you cannot isolate a device and maintain a terminated shared bus, you must stop all available server environment (ASE) activity before you can perform maintenance on that device.
Before you stop ASE activity, if you have not already done so, use the asemgr utility to obtain information about each of your ASE services. You should obtain information such as:
Service name
Placement policy
Storage configuration information:
Exports list
Mount table
AdvFS configuration
LSM configuration
To stop all ASE activity, follow these steps:
Use the asemgr utility to put each ASE service off line. This stops the services.
Invoke the /sbin/init.d/asemember stop command on all the member systems to stop the ASE daemons.
After you stop ASE activity, you can perform the desired maintenance.
To restart ASE activity, follow these steps:
Invoke the /sbin/init.d/asemember start command on all the member systems to restart the ASE daemons.
Use the asemgr utility to put the ASE services on line.
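The ordering in the two procedures above can be sketched as a small planning helper. This is only an illustration: it builds a list of commands and does not run anything. The asemgr -x and asemgr -s forms are the command-line syntax that this chapter shows for setting a single service off line and on line; the service names, host names, and function names are hypothetical, so check asemgr(8) before relying on them.

```python
from typing import List, Tuple

Step = Tuple[str, str]  # (host the command is entered on, command line)


def ase_stop_plan(services: List[str], members: List[str], admin_host: str) -> List[Step]:
    """Stop order: put every service off line first, then stop the daemons."""
    plan = [(admin_host, f"asemgr -x {svc}") for svc in services]
    plan += [(member, "/sbin/init.d/asemember stop") for member in members]
    return plan


def ase_restart_plan(services: List[str], members: List[str], admin_host: str) -> List[Step]:
    """Restart order is the reverse: daemons first, then the services."""
    plan = [(member, "/sbin/init.d/asemember start") for member in members]
    plan += [(admin_host, f"asemgr -s {svc}") for svc in services]
    return plan
```

Keeping the plan as data makes it easy to review the sequence before anyone types a command on a member system.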
To shut down all activity in the cluster, stop the cluster daemons and then stop all shared bus activity.
To stop the cluster daemons, enter the following command on each member system:
# /sbin/init.d/clumember stop
To start the cluster daemons, enter the following command on each member system:
# /sbin/init.d/clumember start
To stop and then start the cluster daemons, enter the following command on each member system:
# /sbin/init.d/clumember restart
To stop or restart ASE activity on a shared SCSI bus in the Production Server environment, follow the steps in Section 7.2.
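Because the clumember command must be entered on each member system, it can help to generate the per-member command list up front. The following is a minimal sketch under that assumption; the member names and the function name are hypothetical, and the helper only formats strings.

```python
from typing import Dict, List

# The three actions shown above for /sbin/init.d/clumember.
VALID_ACTIONS = ("stop", "start", "restart")


def clumember_commands(members: List[str], action: str) -> Dict[str, str]:
    """Map each member system to the clumember command to enter on it."""
    if action not in VALID_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return {member: f"/sbin/init.d/clumember {action}" for member in members}
```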
Occasionally, a member system will need maintenance. For example, you may need to disconnect a member system from the shared SCSI buses to install new hardware. You may want to replace a member system with a newer model or add a member system to your configuration.
Depending on how you set up the shared SCSI buses, you may be able to perform system maintenance without shutting down the available server environment (ASE). The following sections describe how to perform some common system maintenance tasks.
To shut down a member system, use the asemgr utility to delete the member system from the ASE. This causes any ASE services running on the member system to relocate to another member system. You can also manually relocate the services running on the member system, and then shut down the system in the usual way.
Note
You cannot delete a member if it is included in the list of members that are favored to run the service, according to the service's Automatic Service Placement (ASP) policy. See the TruCluster Software Products Administration manual for information about deleting members and ASP policies.
If the system is connected to a SCSI signal converter, you must first turn off the signal converter that is connected to the system and then turn off the system.
To turn on a system that is connected to a SCSI signal converter, you must turn on the system and allow it to complete its startup diagnostics before you turn on the signal converter. Then, invoke the asemgr utility on a member system to add the system to the ASE.
To add a member system to your configuration, you must install a SCSI bus adapter for each shared SCSI bus the system will be attached to. Then, you must connect the system to all the shared SCSI buses in its ASE (see Section 7.7). Depending on your hardware configuration, you may be able to add a system without shutting down the cluster.
If you have an extra trilink connector or Y cable already connected to all the shared SCSI buses, you can add a member system to your hardware configuration without shutting down. In this case, you can connect the member system to the shared buses without affecting the bus termination and cluster operation. Otherwise, you must shut down the cluster as described in Section 7.2 and Section 7.3.
Additionally, for a Production Server configuration, you may have to install one or two MEMORY CHANNEL adapters. You must shut down the cluster to add a member system in the following cases:
You are changing from virtual hub mode to standard mode and you must add a MEMORY CHANNEL hub to your configuration. You must shut down the cluster because you must change the jumpers in the MEMORY CHANNEL adapters of all current systems.
You have a single MEMORY CHANNEL interconnect that includes a hub, but there is no line card available for the new system. You must shut down the cluster because you must power down a hub in order to install the line card.
See Section 7.9 for more information on adding MEMORY CHANNEL interconnects.
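The conditions above that force a cluster shutdown when you add a member system can be collected into one check. This is a sketch of the decision only, with hypothetical field names; an empty result means none of the conditions listed in this section applies.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AddMemberPlan:
    # True if an extra trilink connector or Y cable is already on every shared SCSI bus.
    spare_connector_on_every_shared_bus: bool
    # Production Server only: adding a hub to move from virtual hub to standard mode.
    changing_virtual_hub_to_standard: bool = False
    # Production Server only: a single interconnect that includes a hub.
    single_interconnect_with_hub: bool = False
    free_line_card_for_new_system: bool = True


def shutdown_reasons(plan: AddMemberPlan) -> List[str]:
    """Return the reasons, if any, that the cluster must be shut down."""
    reasons = []
    if not plan.spare_connector_on_every_shared_bus:
        reasons.append("no spare trilink connector or Y cable on every shared SCSI bus")
    if plan.changing_virtual_hub_to_standard:
        reasons.append("adapter jumpers on all current systems must be changed")
    if plan.single_interconnect_with_hub and not plan.free_line_card_for_new_system:
        reasons.append("hub must be powered down to install a line card")
    return reasons
```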
When you remove a system from an ASE, you do not have to shut down all ASE activity if disconnecting the system from the shared bus does not cause the bus to be unterminated. You can delete the member system from the ASE, shut down and turn off the system, as described in Section 7.2 and Section 7.3, and then disconnect the system from the shared bus.
If you used trilink connectors or Y cables to connect the system to the shared SCSI buses, you can remove a system as follows:
If you will replace the system, use the asemgr utility to relocate any ASE services running on the member system. If you will not replace the system, use the asemgr utility to delete the system from the ASE.
Disconnect the system from the shared SCSI buses and the cluster interconnects.
Sometimes you must disconnect a member system to perform maintenance. If the system can be isolated from the shared bus without affecting the bus termination, you can perform the maintenance without affecting the availability of the ASE services.
If the member system cannot be isolated from the shared bus without affecting the bus termination, you must shut down all ASE activity to perform the maintenance, as described in Section 7.2. Your ASE services are unavailable while you perform the maintenance.
If you can isolate the member system from the shared bus, you can perform hardware maintenance on the system's CPU as follows:
Use the asemgr utility to relocate the services running on the member system.
Delete the member system from the ASE by using the asemgr utility.
Note
You cannot delete a member if it is included in the list of members that are favored to run the service, according to the service's Automatic Service Placement (ASP) policy. See the TruCluster Software Products Administration manual for information on ASP policies.
Shut down the system.
Disconnect the member from the shared bus. Make sure that the bus is still terminated so that it functions correctly.
Perform the CPU maintenance.
Connect the member to the shared bus.
Turn on the system.
Add the member system to the ASE with the asemgr utility.
To add a network interface to a member system in an existing ASE, follow these steps:
Delete the member system from the ASE.
Note
You cannot delete a member if it is included in the list of members that are favored to run the service, according to the service's Automatic Service Placement (ASP) policy. See the TruCluster Software Products Administration manual for information about deleting members and ASP policies.
Turn off the system.
Install the network interface.
Turn on and reboot the system.
Configure the new network interface.
Run the asemgr utility on an existing member system and add the system to the ASE.
Run the asemgr utility on the system to specify the new network interface.
To remove a network interface from a member system, follow these steps:
Run the asemgr utility on the member system to delete the network interface.
Delete the member system from the ASE.
Turn off the system.
Remove the network interface.
Turn on and reboot the system.
Run the asemgr utility on an existing member system and add the system to the ASE.
If you want to connect another storage shelf to a shared bus without shutting down the ASE, you must have an extra trilink connector or Y cable already connected to the shared SCSI bus. If your configuration meets this requirement, you can connect the storage shelf to the shared bus without affecting the bus termination and cluster operation. Otherwise, you must shut down the ASE as described in Section 7.2.
You can disconnect a storage shelf from a shared SCSI bus without shutting down the ASE, if you used a trilink connector or Y cable to connect the shelf to the bus.
In addition, if you disconnect a storage shelf from a shared SCSI bus (without affecting the bus termination) or remove a disk from a slot, any service that uses the disks is stopped, unless the disks are part of a mirrored Logical Storage Manager (LSM) volume or are contained in a RAID set.
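To judge the impact of disconnecting a shelf or pulling a disk, you can list which services would stop. The following sketch applies the rule above: a service stops if any disk it loses is neither part of a mirrored LSM volume nor contained in a RAID set. The data layout and names are hypothetical, and the sketch assumes that the redundant copy of a protected disk remains reachable, which you must verify for your own configuration.

```python
from typing import Dict, List, NamedTuple, Set


class Disk(NamedTuple):
    name: str
    lsm_mirrored: bool = False
    in_raid_set: bool = False


def services_stopped(services: Dict[str, List[Disk]], removed: Set[str]) -> List[str]:
    """Return the services that lose at least one unprotected disk."""
    stopped = []
    for service, disks in services.items():
        for disk in disks:
            protected = disk.lsm_mirrored or disk.in_raid_set
            if disk.name in removed and not protected:
                stopped.append(service)
                break  # one unprotected disk is enough to stop the service
    return stopped
```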
If you want to connect or disconnect a storage shelf with a single-ended SCSI interface, see Section 7.8 for information about connecting and disconnecting SCSI signal converters.
Most basic system management tasks on the shared disks in the available server environment (ASE) are the same as in a noncluster environment. However, you must be careful when performing maintenance on any disk on a shared SCSI bus because of the constant activity on the bus. To perform some types of maintenance, such as upgrading disk firmware, you must either isolate the device from the shared bus or shut down the cluster.
The following sections describe how to maintain the disks in the cluster.
If you want to set a disk that is used in a service off line, you must ensure that a running service is not using the disk, unless the disk is part of a Logical Storage Manager (LSM) mirrored volume or a mirrored RAID device.
If a disk is being used by a service, you can temporarily stop the service by using the asemgr utility's interactive facility or command-line interface to set the service off line. For example, you can use the following command syntax:
asemgr -x [service]
After you set the service off line, use the scu utility to set the disk off line. Setting a disk off line spins down the disk and allows you to remove it from the storage shelf. For example, to set the /dev/rz28c disk off line, enter the following command:
# scu -f /dev/rrz28c stop
After you perform the maintenance on a disk, you can set the disk on line. For example:
# scu -f /dev/rrz28c start
After you set the disk on line, you can use the asemgr utility's interactive facility or command-line interface to set the service that uses the disk on line. For example, you can use the following command syntax:
asemgr -s [service]
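The commands in this section bracket the maintenance work in a fixed order: service off line, disk off line, maintenance, disk on line, service on line. As a hedged illustration, the helper below formats that sequence from a service name and a raw device file. It uses only the asemgr and scu forms shown above; the function name and the example service name are hypothetical, and nothing is executed.

```python
from typing import List, Tuple


def disk_maintenance_commands(service: str, raw_device: str) -> Tuple[List[str], List[str]]:
    """Return (commands before maintenance, commands after maintenance)."""
    before = [
        f"asemgr -x {service}",        # set the service off line
        f"scu -f {raw_device} stop",   # spin down the disk
    ]
    after = [
        f"scu -f {raw_device} start",  # set the disk on line again
        f"asemgr -s {service}",        # set the service on line
    ]
    return before, after
```

For the disk used in the examples above, the call would be disk_maintenance_commands("nfs_svc", "/dev/rrz28c"), where nfs_svc stands in for your own service name.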
When you add a disk to your hardware configuration, you install it in the storage shelf. The disk must have a unique SCSI ID. In addition, you may have to update the system configuration files to ensure that the systems recognize the new disk. See Section 4.3.1.2 for information about recognizing shared disks in the cluster.
When you remove a disk from a storage shelf, you must ensure that a running service is not using the disk, unless the disk is part of an LSM mirrored logical volume or a mirrored RAID device.
If a disk that you want to remove is being used by a service and the disk will be replaced, you can temporarily stop the service by using the asemgr utility to put the service off line. You can then replace the disk and use the asemgr utility to put the service that uses the disk on line. You may have to back up the disk before you remove it and then restore the information to the new disk.
If a disk that you want to remove is being used by a service and the disk will not be replaced, use the asemgr utility to modify the service and remove the disk from the service. You can then remove the disk from the storage shelf.
To physically remove a disk from a storage shelf, partially pull out the disk from its slot (about 3 to 5 centimeters), wait for the disk to spin down, then completely remove the disk from the slot.
Caution
If you remove the disk from the storage shelf without waiting enough time to allow the disk to spin down, the torque induced by the gyroscopic effect may cause you to drop the disk.
Disks that are local to the member (that is, internal disks, not shared disks) are not affected by Available Server and can be backed up and restored with the usual methods. Disks that are on the ASE shared bus need special consideration. You do not have to shut down your system to single-user mode to perform safe backups.
There are three ways to back up a disk used in an ASE:
Use the asemgr utility to relocate the service to a specific member so the service will not move. You must back up or restore the disks from this member.
Use the asemgr utility to put the service that uses the disks off line, stopping the service. Back up the disks from any member system. Advanced File System (AdvFS) or LSM disks must be configured on the system from which you are performing the backup.
Use POLYCENTER NetWorker Save and Restore to back up the disks in the ASE services. See the TruCluster Software Products Administration manual for information about ASE services. See the NetWorker Version 3.2 documentation for information about using NetWorker to back up an ASE service's storage.
For UNIX file systems, back up the disk using the dump command and the raw device file /dev/rrznn. Use the restore command to restore a disk.
For AdvFS filesets, from the member that is running the service, you can use the clonefset command to clone a fileset, and then use the vdump and vrestore commands to back up and restore the cloned fileset.
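The choice of backup commands depends only on the file system type, which the lookup below makes explicit. It is a sketch that names the commands mentioned above without any options, because the correct options depend on your system and are documented in the reference pages. The "ufs" and "advfs" keys and the function name are hypothetical.

```python
from typing import Dict, List


def backup_tools(fs_type: str) -> Dict[str, List[str]]:
    """Return the backup and restore commands this section names for a file system type."""
    if fs_type == "ufs":
        # UNIX file systems: dump reads the raw device file (/dev/rrznn).
        return {"backup": ["dump"], "restore": ["restore"]}
    if fs_type == "advfs":
        # AdvFS: clone the fileset first, then back up the clone.
        return {"backup": ["clonefset", "vdump"], "restore": ["vrestore"]}
    raise ValueError(f"no backup method listed for file system type {fs_type!r}")
```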
The way failed disks are handled in an ASE depends on whether you are using LSM or RAID. If a failure occurs in a disk that is not part of an LSM or RAID mirrored volume, the service stops. After the disk has been replaced and any data restored, you can restart the service.
If a disk that is used in an LSM volume fails, see the DIGITAL UNIX Logical Storage Manager manual for information about replacing failed LSM disks.
If a failure occurs in a disk that is part of an LSM or RAID mirrored volume, you can replace the disk while the service is running. If a disk that is mirrored with RAID fails, see the RAID documentation for information about how to handle this situation.
After a failed or previously unavailable part of an LSM mirrored volume becomes available again, you can reincorporate the device into the service by resynchronizing the mirrored volume outside of the cluster and then rereserving the devices. You rereserve devices by using the asemgr utility and choosing the Advanced Utilities menu item. This method will not interrupt the service.
See the TruCluster Software Products Administration manual for more information on handling disk failures with AdvFS and LSM.
If you want to add a shared SCSI bus to your hardware configuration, you must shut down the cluster, as described in Section 7.2, and prepare the systems and storage shelves for the new shared bus connection.
You can remove a shared SCSI bus without shutting down the cluster if you used trilink connectors or Y cables to connect the member systems and storage shelves to the shared SCSI bus. If your configuration meets this requirement, you can disconnect all the devices from the bus.
If you are using a storage shelf with a single-ended SCSI interface in your hardware configuration, it must be connected to a SCSI signal converter.
If you want to disconnect a SCSI signal converter (and the single-ended storage shelf) from a shared bus, you must turn off the SCSI signal converter before disconnecting the cables. To reconnect it to the shared bus, connect the cables before turning on the SCSI signal converter.
Use the power switch to turn off a standalone SCSI signal converter (DWZZA-AA or DWZZB-AA). To turn off a StorageWorks building block (SBB) SCSI signal converter (DWZZA-VA or DWZZB-VW), pull it from its disk slot.
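How you turn off a signal converter depends on the model, and the cable order differs between disconnecting and reconnecting. The sketch below records both rules as data; the model numbers come from the text above, while the function names and step wording are illustrative.

```python
from typing import List

# How to turn off each signal converter model named in this section.
POWER_OFF_METHOD = {
    "DWZZA-AA": "use the power switch",        # standalone
    "DWZZB-AA": "use the power switch",        # standalone
    "DWZZA-VA": "pull it from its disk slot",  # SBB
    "DWZZB-VW": "pull it from its disk slot",  # SBB
}


def disconnect_steps(model: str) -> List[str]:
    """Power goes off before the cables come off."""
    return [f"turn off {model}: {POWER_OFF_METHOD[model]}", "disconnect the cables"]


def reconnect_steps(model: str) -> List[str]:
    """Cables go on before the power comes on."""
    return ["connect the cables", f"turn on {model}"]
```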
The following sections contain information about maintaining MEMORY CHANNEL interconnects. See the MEMORY CHANNEL User's Guide for detailed information about maintaining the MEMORY CHANNEL hardware.
If you are adding a new system to your configuration, see Section 7.4.2 for information about connecting the system to the MEMORY CHANNEL interconnects.
Some MEMORY CHANNEL interconnect maintenance tasks require you to determine which interconnect is the primary (active) interconnect and which interconnect is the secondary (inactive) interconnect. Examine the startup message for the last boot and note which peripheral component interconnect (PCI) slot contains the adapter for the primary interconnect.
The following is an example of the startup messages:
mchan0: Module revision = 11
mchan0: jumpered as HUB configuration
mchan0 at pci0 slot 7
mchan1: Module revision = 11
mchan1: jumpered as HUB configuration
mchan1 at pci0 slot 8
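If you have saved the startup messages to a file, a short script can pull out the PCI slot and jumper setting for each adapter. The following sketch assumes that the messages have exactly the form shown in the example above; the function names are hypothetical. It reports slots only and does not decide which interconnect is the primary one.

```python
import re
from typing import Dict, Tuple

SLOT_RE = re.compile(r"(mchan\d+) at (pci\d+) slot (\d+)")
JUMPER_RE = re.compile(r"(mchan\d+): jumpered as (\w+) configuration")


def adapter_slots(messages: str) -> Dict[str, Tuple[str, int]]:
    """Map each adapter name to its (PCI bus, slot number)."""
    return {m.group(1): (m.group(2), int(m.group(3))) for m in SLOT_RE.finditer(messages)}


def adapter_jumpers(messages: str) -> Dict[str, str]:
    """Map each adapter name to the jumper configuration it reports."""
    return {m.group(1): m.group(2) for m in JUMPER_RE.finditer(messages)}
```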
If you want to change from a single MEMORY CHANNEL interconnect to redundant MEMORY CHANNEL interconnects without shutting down the cluster, follow these steps to add an interconnect:
Use the asemgr utility to relocate the available server environment (ASE) services running on a member system. If you are using virtual hub mode (you are not using a MEMORY CHANNEL hub), do this on the VH0 system.
Shut down the system.
Install another MEMORY CHANNEL adapter in the system, setting the jumpers to match the mode of the existing interconnect: standard mode (if you are using a hub) or virtual hub mode.
Connect a MEMORY CHANNEL link cable to the newly installed MEMORY CHANNEL adapter.
If you are using a MEMORY CHANNEL hub, connect the link cable to the line card that occupies the same hub slot position as the line card to which the first adapter is connected. See Chapter 5 for information about connecting adapters to line cards in hubs.
Turn on the system.
Repeat these steps for each system. If you are using a MEMORY CHANNEL hub, turn on the hub after the last system is connected. If you are not using a MEMORY CHANNEL hub, connect the adapter in the first system to the adapter in the second system with the link cable.
To change from redundant MEMORY CHANNEL interconnects to a single MEMORY CHANNEL interconnect, you must remove an interconnect. To do this, follow these steps:
1. Turn off the hub on the interconnect that you want to remove.
2. Use the asemgr utility to relocate the ASE services running on a member system. If you are using virtual hub mode (you are not using a MEMORY CHANNEL hub), do this on the VH0 system.
3. Shut down the system.
4. Remove the MEMORY CHANNEL adapter that is connected to the interconnect you want to remove.
5. Reboot the system.
6. Perform steps 2 through 5 on all the systems.
If you want to change from virtual hub mode, which does not require a MEMORY CHANNEL hub, to standard mode, you must add a hub to your configuration. To do this, you must shut down the cluster because you must change the jumpers on the MEMORY CHANNEL adapters.
See the MEMORY CHANNEL User's Guide for information about adapter jumpers.
If you need to replace a MEMORY CHANNEL hub and you have only one MEMORY CHANNEL interconnect, you must shut down the cluster as described in Section 7.2.
If you have redundant MEMORY CHANNEL interconnects, you can turn off the hub, replace the hub, and then reconnect the cables as described in Chapter 5. Then, you must reboot the member systems, one at a time.
If a system is connected to a line card that fails and you have an extra line card available in each MEMORY CHANNEL hub, you can connect the system's MEMORY CHANNEL adapter to the extra line card without shutting down the cluster. However, you must reboot the system after you connect it to the line card.
If you have redundant interconnects, make sure that you connect the system to line cards that are in the same slot position in the hubs.
If you have a single MEMORY CHANNEL interconnect and you disconnect a MEMORY CHANNEL link cable, the member system that was connected to the link cable will crash. You must then reconnect the link cable and reboot the system.
If you have redundant interconnects and you disconnect a MEMORY CHANNEL link cable that is part of the secondary interconnect (the inactive interconnect), you can reconnect the cable.
However, if you have redundant interconnects and you disconnect a MEMORY CHANNEL link cable that is part of the primary interconnect (the active interconnect), you must reconnect the cable and then reboot the member system.
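The three link cable cases above can be summarized in one function. This is a sketch of the stated rules only; the function name and the wording of the returned steps are illustrative. For the secondary interconnect the text says only that you can reconnect the cable, so the sketch lists no reboot for that case.

```python
from typing import List


def link_cable_recovery(redundant: bool, on_primary: bool = True) -> List[str]:
    """Return the recovery steps after a MEMORY CHANNEL link cable is disconnected."""
    if not redundant:
        # Single interconnect: the member attached to the cable crashes.
        return ["member system crashes", "reconnect the link cable", "reboot the member system"]
    if on_primary:
        # Redundant interconnects, cable on the active interconnect.
        return ["reconnect the link cable", "reboot the member system"]
    # Redundant interconnects, cable on the inactive interconnect.
    return ["reconnect the link cable"]
```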