Watchdog timeout messages are displayed in the advanced management module event
log. Use this procedure if you are seeing these messages for multiple blade
servers in a BladeCenter S chassis.
Problem
The advanced management module event
log displays watchdog timeout messages for multiple blade servers in a BladeCenter S chassis.
Investigation
Perform these steps to resolve the
problem:
- Search the IBM support page to find firmware updates for the advanced management module.
Look in the firmware change history for information related to watchdog timeout
errors and update the firmware if necessary.
You can find the firmware by
going to Software and device drivers - IBM BladeCenter and selecting BladeCenter
S.
- Verify that the service processor (Integrated Systems
Management Processor and Baseboard Management Controller) code levels are
up to date or are at least not missing a critical fix.
- Verify the operation of the blade servers. If they are responsive, the
problem may be a false error condition. In that case:
  - Verify that the IBM Automatic Server Restart (ASR) driver is installed
    on the blade server.
  - Update the firmware for the service processor on the blade server.
  - Update the firmware for the advanced management module.
  - Replace the advanced management module.
- If all the blade servers are nonresponsive and are running the same level
of operating system as well as similar applications, restart several of the
blade servers and access the operating system logs for each blade
server.
  - Determine if the blade servers are nonresponsive because of a common software
    driver or module problem.
  - Verify that the disk and communications drivers are up to date.
- Although rare, enough noise on the RS-485 communication channel to the
blade servers can tie up the service processors.
Check the event log to see whether many service processor communication
errors are occurring for all of the blade servers. If so, see "Service processor communication (SP COMM) errors displayed for a blade server" for
additional troubleshooting procedures.
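
The event log check in the last step can be sketched in Python. This is only a minimal sketch, assuming the event log has already been exported and parsed into (bay number, message text) pairs. The record layout, the message text, the threshold, and the function names are all hypothetical and do not reflect the actual advanced management module log format.

```python
from collections import Counter


def blades_with_many_errors(records, keyword, threshold):
    """Return the sorted bay numbers that have at least `threshold`
    messages containing `keyword` (case-insensitive).

    `records` is an iterable of (bay, message) pairs. This layout is an
    assumption made for illustration only.
    """
    counts = Counter(
        bay for bay, message in records
        if keyword.lower() in message.lower()
    )
    return sorted(bay for bay, n in counts.items() if n >= threshold)


def affects_all_blades(records, keyword, threshold, installed_bays):
    """Return True if every installed bay reaches the error threshold."""
    noisy = set(blades_with_many_errors(records, keyword, threshold))
    return bool(installed_bays) and set(installed_bays) <= noisy


# Illustrative records only; the message text is made up.
example_records = [
    (1, "SP comm error"),
    (1, "SP comm error"),
    (2, "SP comm error"),
    (2, "sp COMM error"),
    (3, "watchdog timeout"),
]
```

If `affects_all_blades` returns True for the communication error messages, the pattern matches the chassis-wide condition described in the last step. If only a few bays are returned by `blades_with_many_errors`, the problem is more likely specific to those blade servers.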