Watchdog timeout messages

Watchdog timeout messages are displayed in the advanced management module event log. They can occur any time a hardware or software problem causes a CPU on the blade server to become nonresponsive.

Each blade server has a service processor (called a Baseboard Management Controller or BMC) that operates independently of the CPUs on the blade server. It runs whenever the blade server is installed in the BladeCenter S system and connected to active power supplies; the blade server does not need to be powered on.

The service processor communicates with the advanced management module to provide vital product data (VPD) and health status about the blade server. In addition, the service processor is used to perform tasks such as powering on, powering off, and restarting the blade server when you remotely manage the blade server through the advanced management module.

The service processor uses timers, called watchdog timers, to detect when the blade server becomes nonresponsive:
  • The BIOS or POST watchdog timer triggers a watchdog event if the blade server becomes nonresponsive during the POST process.
  • The OS watchdog timer triggers a watchdog event if the blade server becomes nonresponsive while the operating system is starting.
    Note: The Automatic Server Restart (ASR) driver must be installed on the blade server. This driver communicates with the blade service processor and keeps the watchdog timer from counting down to 0 as long as the system processor is still running. You can find this driver by going to Software and device drivers - IBM® BladeCenter® and selecting the blade server that you have installed. It is typically listed under Advanced Systems Management.

    The OS watchdog timer may be enabled or disabled by default, depending on the type of blade server. You can enable or disable it using the advanced settings of the BIOS configuration utility for the blade server.
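
The countdown behavior described in the note can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the service processor's actual firmware logic: the class WatchdogTimer, the function simulate, and the tick-based timing are all hypothetical names and simplifications introduced for illustration. The model assumes the ASR driver resets the timer once per tick while the system processor is running, and that a watchdog event fires when the countdown reaches 0.

```python
class WatchdogTimer:
    """Toy model of a countdown watchdog (hypothetical, for illustration only)."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks
        self.remaining = timeout_ticks
        self.expired = False

    def kick(self):
        """Reset the countdown; models the driver reporting that the CPU is still running."""
        if not self.expired:
            self.remaining = self.timeout_ticks

    def tick(self):
        """Advance one time unit; return True if the watchdog event fires on this tick."""
        if self.expired:
            return False
        self.remaining -= 1
        if self.remaining <= 0:
            self.expired = True
            return True
        return False


def simulate(timeout_ticks, cpu_alive_ticks, total_ticks):
    """Kick the timer on every tick while the CPU is responsive.

    Returns the tick on which the watchdog event fired, or None if it never fired.
    """
    wd = WatchdogTimer(timeout_ticks)
    for t in range(1, total_ticks + 1):
        if t <= cpu_alive_ticks:
            wd.kick()  # assumed: the driver resets the timer once per tick
        if wd.tick():
            return t
    return None


if __name__ == "__main__":
    print(simulate(3, 5, 20))   # CPU stops responding after tick 5
    print(simulate(3, 20, 20))  # CPU stays responsive for the whole run
```

In this model, a processor that stays responsive never lets the countdown reach 0, so no watchdog event is reported. Once the resets stop, the event fires after the remaining countdown elapses.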