Watchdog timeout messages displayed for multiple blade servers

Watchdog timeout messages are displayed in the advanced management module event log. Use this procedure if you are seeing these messages for multiple blade servers in a BladeCenter S chassis.

Problem

The advanced management module event log displays watchdog timeout messages for multiple blade servers in a BladeCenter S chassis.
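
To confirm that the messages really do involve more than one blade server, you can tally them from a saved copy of the event log. The following is a minimal sketch only. The `Blade_NN` source field and the "watchdog timeout" wording are assumptions made for illustration, and the function names are hypothetical; adjust the patterns to match the text in your own log export.

```python
import re
from collections import Counter

# Hypothetical patterns: adjust them to the wording of your exported event log.
BLADE_PATTERN = re.compile(r"Blade_(\d+)", re.IGNORECASE)
WATCHDOG_PATTERN = re.compile(r"watchdog\s+timeout", re.IGNORECASE)


def count_watchdog_timeouts(log_lines):
    """Return a Counter mapping blade bay number to watchdog timeout entries."""
    counts = Counter()
    for line in log_lines:
        if not WATCHDOG_PATTERN.search(line):
            continue  # not a watchdog timeout entry
        match = BLADE_PATTERN.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def affects_multiple_blades(counts):
    """True when watchdog timeouts were logged for more than one blade server."""
    return len(counts) > 1


if __name__ == "__main__":
    # Made-up sample lines, not taken from a real event log.
    sample = [
        "E Blade_01 Watchdog timeout",
        "I Blade_02 Power on",
        "E Blade_03 Watchdog timeout",
    ]
    per_blade = count_watchdog_timeouts(sample)
    print(dict(per_blade), affects_multiple_blades(per_blade))
```

If only one blade server appears in the tally, this procedure for multiple blade servers probably does not apply.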

Investigation

Perform these steps to resolve the problem:
  1. Find firmware updates for the advanced management module. Look in the firmware change history for information related to watchdog timeout errors and update the firmware if necessary.

    You can find the firmware by going to Software and device drivers - IBM BladeCenter and selecting BladeCenter S.

  2. Search the IBM support page to find firmware updates for the advanced management module. Look in the firmware change history for information related to watchdog timeout errors and update the firmware if necessary.
  3. Verify that the service processor (Integrated Systems Management Processor and Baseboard Management Controller) code levels are up to date or are at least not missing a critical fix.
  4. Verify the operation of the blade servers. If they are responsive, the watchdog timeout messages might indicate a false error condition. In that case, work through these steps:
    1. Verify that the IBM Automatic Server Restart (ASR) driver is installed on the blade server.
    2. Update the firmware for the service processor on the blade server.
    3. Update the firmware for the advanced management module.
    4. Replace the advanced management module.
  5. If all the blade servers are nonresponsive and are running the same operating system level and similar applications, restart several of the blade servers and review the operating system logs for each one.
    • Determine whether the blade servers are nonresponsive because of a common software driver or module problem.
    • Verify that the disk and communications drivers are up to date.
  6. Although it is rare, noise on the RS-485 communication channel to the blade servers can tie up the service processors. Check the event log for a large number of service processor communication errors across all of the blade servers. If you find them, see "Service processor communication (SP COMM) errors displayed for a blade server" for additional troubleshooting procedures.
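
The event log check in the last step can be approximated with a short script run against a saved copy of the log. This is a sketch under stated assumptions: the `Blade_NN` field, the "SP COMM" wording, the function names, and the threshold of five entries per blade server are all hypothetical choices for illustration, not values from IBM documentation.

```python
import re
from collections import Counter

# Hypothetical patterns; the wording in a real event log may differ.
BLADE_FIELD = re.compile(r"Blade_(\d+)", re.IGNORECASE)
SP_COMM_ERROR = re.compile(
    r"SP\s*COMM|service processor communication", re.IGNORECASE
)


def sp_comm_errors_per_blade(log_lines):
    """Return a Counter mapping blade bay number to SP communication errors."""
    counts = Counter()
    for line in log_lines:
        if SP_COMM_ERROR.search(line):
            match = BLADE_FIELD.search(line)
            if match:
                counts[int(match.group(1))] += 1
    return counts


def all_blades_noisy(counts, installed_bays, threshold=5):
    """True if every installed blade has at least `threshold` errors.

    The threshold is an arbitrary illustrative value; choose one that
    reflects what "many" means for the period your log covers.
    """
    bays = list(installed_bays)
    return bool(bays) and all(counts[bay] >= threshold for bay in bays)


if __name__ == "__main__":
    # Made-up sample: two installed blade servers, six errors each.
    sample = ["E Blade_01 SP COMM error", "E Blade_02 SP COMM error"] * 6
    per_blade = sp_comm_errors_per_blade(sample)
    print(all_blades_noisy(per_blade, installed_bays=[1, 2]))
```

A result of `True` only suggests that the errors are spread across every installed blade server, which is the pattern the last step describes. It does not by itself confirm noise on the RS-485 channel.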