7    Maintaining the Hardware Configuration

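This chapter describes how to maintain an Available Server or Production Server hardware configuration. It covers preparing for a configuration change, stopping available server environment (ASE) activity, shutting down and starting up a Production Server cluster, and maintaining member systems, shared storage shelves, disks, shared buses, SCSI signal converters, and MEMORY CHANNEL interconnects.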


7.1    Preparing to Change the Hardware Configuration

If you want to change your hardware configuration while maintaining available server environment (ASE) operation, make sure that any shared SCSI buses remain terminated. If you use trilink connectors and Y cables to connect devices to the shared SCSI buses, you can disconnect the devices without affecting the bus termination. In addition, if you connect an extra trilink connector or Y cable to a shared bus, you can attach a device to it and expand your configuration without affecting the bus termination. See Chapter 3 for information about maintaining bus termination.

If you are unable to maintain a terminated shared bus, you must shut down the cluster and then change the hardware configuration. Section 7.2 describes how to shut down the cluster.

For the TruCluster Production Server Software product, the MEMORY CHANNEL is the cluster interconnect, so using redundant MEMORY CHANNEL interconnects and MEMORY CHANNEL hubs allows you to change your configuration without shutting down the cluster.

Some maintenance tasks require you to use the asemgr utility. See asemgr(8) and the TruCluster Software Products Administration manual for information about the utility.


7.2    Stopping ASE Activity

If you cannot isolate a device and maintain a terminated shared bus, you must stop all available server environment (ASE) activity before you can perform maintenance on that device.

Before you stop ASE activity, if you have not already done so, use the asemgr utility to obtain information about each of your ASE services. You should obtain information such as:

To stop all ASE activity, follow these steps:

  1. Use the asemgr utility to put each ASE service off line. This stops the services.

  2. Invoke the /sbin/init.d/asemember stop command on all the member systems to stop the ASE daemons.

After you stop ASE activity, you can perform the desired maintenance.

To restart ASE activity, follow these steps:

  1. Invoke the /sbin/init.d/asemember start command on all the member systems to restart the ASE daemons.

  2. Use the asemgr utility to put the ASE services on line.
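
The stop and restart sequences above can be sketched as a small shell helper that prints the commands in the required order instead of running them. This is only an illustration: the function names, service names, and member names are hypothetical, the asemgr -x and asemgr -s forms are the ones shown in Section 7.6.1, and the asemember commands must be entered on each member system itself (how you reach each member is site-specific and left out).

```shell
#!/bin/sh
# Print (do not run) the commands for stopping all ASE activity.
# $1: space-separated ASE service names (hypothetical examples)
# $2: space-separated member system names (hypothetical examples)
plan_stop_ase() {
    for svc in $1; do
        echo "asemgr -x $svc"                         # put each service off line
    done
    for member in $2; do
        echo "[$member] /sbin/init.d/asemember stop"  # then stop the daemons
    done
}

# Print the commands for restarting ASE activity:
# daemons first on every member, then the services.
plan_start_ase() {
    for member in $2; do
        echo "[$member] /sbin/init.d/asemember start"
    done
    for svc in $1; do
        echo "asemgr -s $svc"
    done
}

plan_stop_ase "nfs_svc db_svc" "member1 member2"
```

Printing the plan first makes it easy to confirm that every service is taken off line before any daemon is stopped, and that the restart runs in the reverse order.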


7.3    Shutting Down and Starting Up a Production Server Cluster

To shut down all activity in the cluster, stop the cluster daemons and then stop all shared bus activity.

To stop the cluster daemons, enter the following command on each member system:

# /sbin/init.d/clumember stop 

To start the cluster daemons, enter the following command on each member system:

# /sbin/init.d/clumember start 

To stop and then start the cluster daemons, enter the following command on each member system:

# /sbin/init.d/clumember restart 

To stop or restart ASE activity on a shared SCSI bus in the Production Server environment, follow the steps in Section 7.2.


7.4    Maintaining Member Systems

Occasionally, a member system will need maintenance. For example, you may need to disconnect a member system from the shared SCSI buses to install new hardware. You may want to replace a member system with a newer model or add a member system to your configuration.

Depending on how you set up the shared SCSI buses, you may be able to perform system maintenance without shutting down the available server environment (ASE). The following sections describe how to perform some common system maintenance tasks.


7.4.1    Shutting Down a Member System

To shut down a member system, use the asemgr utility to delete the member system from the ASE. This causes any ASE services running on the member system to relocate to another member system. You can also manually relocate the services running on the member system, and then shut down the system in the usual way.

Note

You cannot delete a member if it is included in the list of members that are favored to run the service, according to the service's Automatic Service Placement (ASP) policy. See the TruCluster Software Products Administration manual for information about deleting members and ASP policies.

If the system is connected to a SCSI signal converter, you must first turn off the signal converter that is connected to the system and then turn off the system.

To turn on a system that is connected to a SCSI signal converter, you must turn on the system and allow it to complete its startup diagnostics before you turn on the signal converter. Then, invoke the asemgr utility on a member system to add the system to the ASE.


7.4.2    Adding a Member System to the Configuration

To add a member system to your configuration, you must install a SCSI bus adapter for each shared SCSI bus the system will be attached to. Then, you must connect the system to all the shared SCSI buses in its ASE (see Section 7.7). Depending on your hardware configuration, you may be able to add a system without shutting down the cluster.

If you have an extra trilink connector or Y cable already connected to all the shared SCSI buses, you can add a member system to your hardware configuration without shutting down. In this case, you can connect the member system to the shared buses without affecting the bus termination and cluster operation. Otherwise, you must shut down the cluster as described in Section 7.2 and Section 7.3.

Additionally, for a Production Server configuration, you may have to install one or two MEMORY CHANNEL adapters. You must shut down the cluster to add a member system in the following cases:

See Section 7.9 for more information on adding MEMORY CHANNEL interconnects.


7.4.3    Removing a Member System from the Configuration

When you remove a system from an ASE, you do not have to shut down all ASE activity if disconnecting the system from the shared bus does not cause the bus to be unterminated. You can delete the member system from the ASE, shut down and turn off the system, as described in Section 7.2 and Section 7.3, and then disconnect the system from the shared bus.

If you used trilink connectors or Y cables to connect the system to the shared SCSI buses, you can remove a system as follows:

  1. If you will replace the system, use the asemgr utility to relocate any ASE services running on the member system. If you will not replace the system, use the asemgr utility to delete the system from the ASE.

  2. Disconnect the system from the shared SCSI buses and the cluster interconnects.


7.4.4    Performing CPU Maintenance

Sometimes you must disconnect a member system to perform maintenance. If the system can be isolated from the shared bus without affecting the bus termination, you can perform the maintenance without affecting the availability of the ASE services.

If the member system cannot be isolated from the shared bus without affecting the bus termination, you must shut down all ASE activity to perform the maintenance, as described in Section 7.2. Your ASE services are unavailable while you perform the maintenance.

If you can isolate the member system from the shared bus, you can perform hardware maintenance on the system's CPU as follows:

  1. Use the asemgr utility to relocate the services running on the member system.

  2. Delete the member system from the ASE by using the asemgr utility.

    Note

    You cannot delete a member if it is included in the list of members that are favored to run the service, according to the service's Automatic Service Placement (ASP) policy. See the TruCluster Software Products Administration manual for information on ASP policies.

  3. Shut down the system.

  4. Disconnect the member from the shared bus. Make sure that the bus is still terminated so that it functions correctly.

  5. Perform the CPU maintenance.

  6. Connect the member to the shared bus.

  7. Turn on the system.

  8. Add the member system to the ASE with the asemgr utility.


7.4.5    Adding and Removing Network Interfaces

To add a network interface to a member system in an existing ASE, follow these steps:

  1. Delete the member system from the ASE.

    Note

    You cannot delete a member if it is included in the list of members that are favored to run the service, according to the service's Automatic Service Placement (ASP) policy. See the TruCluster Software Products Administration manual for information about deleting members and ASP policies.

  2. Turn off the system.

  3. Install the network interface.

  4. Turn on and reboot the system.

  5. Configure the new network interface.

  6. Run the asemgr utility on an existing member system and add the system to the ASE.

  7. Run the asemgr utility on the system to specify the new network interface.

To remove a network interface from a member system, follow these steps:

  1. Run the asemgr utility on the member system to delete the network interface.

  2. Delete the member system from the ASE.

  3. Turn off the system.

  4. Remove the network interface.

  5. Turn on and reboot the system.

  6. Run the asemgr utility on an existing member system and add the system to the ASE.


7.5    Adding and Removing Shared Storage Shelves

If you want to connect another storage shelf to a shared bus without shutting down the ASE, you must have an extra trilink connector or Y cable already connected to the shared SCSI bus. If your configuration meets this requirement, you can connect the storage shelf to the shared bus without affecting the bus termination and cluster operation. Otherwise, you must shut down the ASE as described in Section 7.2.

You can disconnect a storage shelf from a shared SCSI bus without shutting down the ASE, if you used a trilink connector or Y cable to connect the shelf to the bus.

In addition, if you disconnect a storage shelf from a shared SCSI bus (without affecting the bus termination) or remove a disk from a slot, any service that uses the disks is stopped, unless the disks are part of a mirrored Logical Storage Manager (LSM) volume or are contained in a RAID set.

If you want to connect or disconnect a storage shelf with a single-ended SCSI interface, see Section 7.8 for information about connecting and disconnecting SCSI signal converters.


7.6    Maintaining Disks in an ASE

Most basic system management tasks on the shared disks in the available server environment (ASE) are the same as in a noncluster environment. However, you must be careful when performing maintenance on any disk on a shared SCSI bus because of the constant activity on the bus. To perform some types of maintenance, such as upgrading disk firmware, you must either isolate the device from the shared bus or shut down the cluster.

The following sections describe how to maintain the disks in the cluster.


7.6.1    Setting a Disk On Line and Off Line

If you want to set a disk that is used in a service off line, you must ensure that a running service is not using the disk, unless the disk is part of a Logical Storage Manager (LSM) mirrored volume or a mirrored RAID device.

If a disk is being used by a service, you can temporarily stop the service by using the asemgr utility's interactive facility or command-line interface to set the service off line. For example, you can use the following command syntax:

asemgr -x [service]

After you set the service off line, use the scu utility to set the disk off line. Setting a disk off line spins down the disk and allows you to remove it from the storage shelf. For example, to set the /dev/rz28c disk off line, enter the following command:

# scu -f /dev/rrz28c stop

After you perform the maintenance on a disk, you can set the disk on line. For example:

# scu -f /dev/rrz28c start

After you set the disk on line, you can use the asemgr utility's interactive facility or command-line interface to set the service that uses the disk on line. For example, you can use the following command syntax:

asemgr -s [service]
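
Taken together, the commands in this section form a fixed sequence for a disk that is not mirrored. The following sketch prints that sequence for one service and one raw device file; it does not run anything. The function name and the service name are hypothetical, and the commands are the ones shown above.

```shell
#!/bin/sh
# Print the command sequence for taking a disk off line for maintenance
# and bringing it back on line. Nothing is executed.
# $1: ASE service that uses the disk (hypothetical name)
# $2: raw device file of the disk, for example /dev/rrz28c
plan_disk_maintenance() {
    svc=$1
    rawdev=$2
    echo "asemgr -x $svc"        # set the service off line
    echo "scu -f $rawdev stop"   # spin down the disk
    echo "# ... perform the disk maintenance ..."
    echo "scu -f $rawdev start"  # spin up the disk
    echo "asemgr -s $svc"        # set the service on line
}

plan_disk_maintenance nfs_svc /dev/rrz28c
```

If the disk is part of an LSM mirrored volume or a mirrored RAID device, the two asemgr steps are not needed, because the service can keep running.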


7.6.2    Adding and Removing Disks

When you add a disk to your hardware configuration, you install it in the storage shelf. The disk must have a unique SCSI ID. In addition, you may have to update the system configuration files to ensure that the systems recognize the new disk. See Section 4.3.1.2 for information about recognizing shared disks in the cluster.

When you remove a disk from a storage shelf, you must ensure that a running service is not using the disk, unless the disk is part of an LSM mirrored logical volume or a mirrored RAID device.

If a disk that you want to remove is being used by a service and the disk will be replaced, you can temporarily stop the service by using the asemgr utility to put the service off line. You can then replace the disk and use the asemgr utility to put the service that uses the disk on line. You may have to back up the disk before you remove it and then restore the information to the new disk.

If a disk that you want to remove is being used by a service and the disk will not be replaced, use the asemgr utility to modify the service and remove the disk from the service. You can then remove the disk from the storage shelf.

To physically remove a disk from a storage shelf, partially pull out the disk from its slot (about 3 to 5 centimeters), wait for the disk to spin down, then completely remove the disk from the slot.

Caution

If you remove the disk from the storage shelf without waiting enough time to allow the disk to spin down, the torque induced by the gyroscopic effect may cause you to drop the disk.


7.6.3    Backing Up and Restoring Disks

Disks that are local to the member (that is, internal disks, not shared disks) are not affected by Available Server and can be backed up and restored with the usual methods. Disks that are on the ASE shared bus need special consideration. You do not have to shut down your system to single-user mode to perform safe backups.

There are three ways to back up a disk used in an ASE:

For UNIX file systems, back up the disk using the dump command and the raw device file /dev/rrznn. Use the restore command to restore a disk. For AdvFS filesets, from the member that is running the service, you can use the clonefset command to clone a fileset, and then use the vdump and vrestore commands to back up and restore the cloned fileset.
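
As a rough illustration of the two approaches described above, the following sketch prints candidate command lines without running them. Everything beyond the command names is an assumption to check against the reference pages on your system: the option letters, the tape device /dev/rmt0h, the argument order for clonefset, the mount step for the clone, and the domain, fileset, and mount point names.

```shell
#!/bin/sh
# Print (do not run) backup commands for a disk on the shared bus.
# All option letters, the tape device, and the mount point are assumptions.
TAPE=${TAPE:-/dev/rmt0h}   # assumed tape device

# UFS: level 0 dump of the raw device file, for example /dev/rrz28c
plan_ufs_backup() {
    echo "dump -0uf $TAPE $1"
}

# AdvFS: clone the fileset, mount the clone, and back up the clone.
# Run these on the member that is running the service.
# $1: domain  $2: fileset  $3: clone name  $4: mount point for the clone
plan_advfs_backup() {
    echo "clonefset $1 $2 $3"
    echo "mount -t advfs $1#$3 $4"
    echo "vdump -0 -f $TAPE $4"
}

plan_ufs_backup /dev/rrz28c
plan_advfs_backup users_dmn users users_clone /clone_mnt
```

Backing up a clone rather than the live fileset is what allows the service to keep running during the backup.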


7.6.4    Handling Disk Failures

The way failed disks are handled in an ASE depends on whether you are using LSM or RAID. If a failure occurs in a disk that is not part of an LSM or RAID mirrored volume, the service stops. After the disk has been replaced and any data restored, you can restart the service.

If a disk that is used in an LSM volume fails, see the DIGITAL UNIX Logical Storage Manager manual for information about replacing failed LSM disks.

If a failure occurs in a disk that is part of an LSM or RAID mirrored volume, you can replace the disk while the service is running. If a disk that is mirrored with RAID fails, see the RAID documentation for information about how to handle this situation.

After a failed or previously unavailable part of an LSM mirrored volume becomes available again, you can reincorporate the device into the service by resynchronizing the mirrored volume outside of the cluster and then rereserving the devices. You rereserve devices by using the asemgr utility and choosing the Advanced Utilities menu item. This method will not interrupt the service.

See the TruCluster Software Products Administration manual for more information on handling disk failures with AdvFS and LSM.


7.7    Adding and Removing Shared Buses

If you want to add a shared SCSI bus to your hardware configuration, you must shut down the cluster, as described in Section 7.2, and prepare the systems and storage shelves for the new shared bus connection.

You can remove a shared SCSI bus without shutting down the cluster if you used trilink connectors or Y cables to connect the member systems and storage shelves to the shared SCSI bus. If your configuration meets this requirement, you can disconnect all the devices from the bus.


7.8    Disconnecting and Connecting SCSI Signal Converters

If you are using a storage shelf with a single-ended SCSI interface in your hardware configuration, it must be connected to a SCSI signal converter.

If you want to disconnect a SCSI signal converter (and the single-ended storage shelf) from a shared bus, you must turn off the SCSI signal converter before disconnecting the cables. To reconnect it to the shared bus, connect the cables before turning on the SCSI signal converter.

Use the power switch to turn off a standalone SCSI signal converter (DWZZA-AA or DWZZB-AA). To turn off a StorageWorks building block (SBB) SCSI signal converter (DWZZA-VA or DWZZB-VW), pull it from its disk slot.


7.9    Maintaining MEMORY CHANNEL Interconnects in a Production Server Environment

The following sections contain information about maintaining MEMORY CHANNEL interconnects. See the MEMORY CHANNEL User's Guide for detailed information about maintaining the MEMORY CHANNEL hardware.

If you are adding a new system to your configuration, see Section 7.4.2 for information about connecting the system to the MEMORY CHANNEL interconnects.


7.9.1    Determining the Primary MEMORY CHANNEL Interconnect

Some MEMORY CHANNEL interconnect maintenance tasks require you to determine which interconnect is the primary (active) interconnect and which interconnect is the secondary (inactive) interconnect. Examine the startup message for the last boot and note which peripheral component interconnect (PCI) slot contains the adapter for the primary interconnect.

The following is an example of the startup messages:

mchan0: Module revision = 11
mchan0: jumpered as HUB configuration
mchan0 at pci0 slot 7
mchan1: Module revision = 11
mchan1: jumpered as HUB configuration
mchan1 at pci0 slot 8
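
If the startup messages are saved to a file or piped from a log, a short filter can pull out the adapter-to-slot lines. The sketch below assumes the message format shown in the example above; the function name is hypothetical. It only lists each adapter with its PCI slot and does not decide which interconnect is the primary one.

```shell
#!/bin/sh
# List each MEMORY CHANNEL adapter and its PCI slot from startup messages
# read on standard input. Matches lines such as "mchan0 at pci0 slot 7".
mchan_slots() {
    awk '$1 ~ /^mchan[0-9]+$/ && $2 == "at" && $4 == "slot" {
        print $1, $3, "slot", $5
    }'
}

# Example run against the sample messages from this section.
mchan_slots <<'EOF'
mchan0: Module revision = 11
mchan0: jumpered as HUB configuration
mchan0 at pci0 slot 7
mchan1: Module revision = 11
mchan1: jumpered as HUB configuration
mchan1 at pci0 slot 8
EOF
```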


7.9.2    Adding or Removing MEMORY CHANNEL Interconnects

If you want to change from a single MEMORY CHANNEL interconnect to redundant MEMORY CHANNEL interconnects without shutting down the cluster, follow these steps to add an interconnect:

  1. Use the asemgr utility to relocate the available server environment (ASE) services running on a member system. If you are using virtual hub mode (you are not using a MEMORY CHANNEL hub), do this on the VH0 system.

  2. Shut down the system.

  3. Install another MEMORY CHANNEL adapter in the system, setting the jumpers for the mode of the existing interconnect, in either standard mode (if you are using a hub) or virtual hub mode.

  4. Connect a MEMORY CHANNEL link cable to the newly installed MEMORY CHANNEL adapter.

  5. If you are using a MEMORY CHANNEL hub, connect the link cable to the line card that occupies the same hub slot position as the line card to which the first adapter is connected. See Chapter 5 for information about connecting adapters to line cards in hubs.

  6. Turn on the system.

  7. Repeat these steps for each system. If you are using a MEMORY CHANNEL hub, turn on the hub after the last system is connected. If you are not using a MEMORY CHANNEL hub, connect the adapter in the first system to the adapter in the second system with the link cable.

To change from redundant MEMORY CHANNEL interconnects to a single MEMORY CHANNEL interconnect, you must remove an interconnect. To do this, follow these steps:

  1. Turn off the hub on the interconnect that you want to remove.

  2. Use the asemgr utility to relocate the ASE services running on a member system. If you are using virtual hub mode (you are not using a MEMORY CHANNEL hub), do this on the VH0 system.

  3. Shut down the system.

  4. Deinstall the MEMORY CHANNEL adapter that is connected to the interconnect you want to remove.

  5. Reboot the system.

  6. Perform steps 2 through 5 on all the systems.


7.9.3    Adding a MEMORY CHANNEL Hub

If you want to change from virtual hub mode, which does not require a MEMORY CHANNEL hub, to standard mode, you must add a hub to your configuration. To do this, you must shut down the cluster because you must change the jumpers on the MEMORY CHANNEL adapters.

See the MEMORY CHANNEL User's Guide for information about adapter jumpers.


7.9.4    Replacing a MEMORY CHANNEL Hub

If you need to replace a MEMORY CHANNEL hub and you have only one MEMORY CHANNEL interconnect, you must shut down the cluster as described in Section 7.2.

If you have redundant MEMORY CHANNEL interconnects, you can turn off the hub, replace the hub, and then reconnect the cables as described in Chapter 5. Then, you must reboot the member systems, one at a time.


7.9.5    Connecting a MEMORY CHANNEL Adapter to a Line Card

If a system is connected to a line card that fails and you have an extra line card available in each MEMORY CHANNEL hub, you can connect a MEMORY CHANNEL adapter to a line card without shutting down the cluster. However, you must reboot the system after you connect the system to the line card.

If you have redundant interconnects, make sure that you connect the system to line cards that are in the same slot position in the hubs.


7.9.6    Disconnecting and Connecting a Link Cable

If you have a single MEMORY CHANNEL interconnect and you disconnect a MEMORY CHANNEL link cable, the member system that was connected to the link cable will crash. You must then reconnect the link cable and reboot the system.

If you have redundant interconnects and you disconnect a MEMORY CHANNEL link cable that is part of the secondary interconnect (the inactive interconnect), you can reconnect the cable without rebooting the member system.

However, if you have redundant interconnects and you disconnect a MEMORY CHANNEL link cable that is part of the primary interconnect (the active interconnect), you must reconnect the cable and then reboot the member system.
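
The three cases in this section reduce to a small lookup, sketched here as a shell function. The function name and the argument keywords are hypothetical; the outcomes are the ones stated above.

```shell
#!/bin/sh
# Report what to do after a MEMORY CHANNEL link cable is disconnected.
# $1: "single" or "redundant"   (interconnect configuration)
# $2: "primary" or "secondary"  (only used when $1 is "redundant")
link_cable_action() {
    case "$1:$2" in
        single:*)            echo "system crashes; reconnect cable, then reboot" ;;
        redundant:secondary) echo "reconnect cable" ;;
        redundant:primary)   echo "reconnect cable, then reboot" ;;
        *)                   echo "unknown configuration" >&2; return 1 ;;
    esac
}

link_cable_action redundant secondary
```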

