Diagnostics

Use the available diagnostic tools to help solve any problems that might occur in the blade server.

The first and most crucial component of a solid serviceability strategy is the ability to accurately and effectively detect errors when they occur. While not all errors are a threat to system availability, those that go undetected are dangerous because the system does not have the opportunity to evaluate and act if necessary. POWER7® processor-based systems are specifically designed with error-detection mechanisms that extend from processor cores and memory to power supplies and hard drives.

POWER7 processor-based systems contain specialized circuitry for detecting erroneous hardware operations. Error-checking hardware ranges from parity error detection coupled with processor instruction retry and bus retry, to error-correcting code (ECC) protection on caches and system buses.
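
As a rough software analogy for parity error detection coupled with retry, the following Python sketch checks an even-parity bit and repeats the read when the check fails. It does not describe the actual POWER7 circuitry; the names even_parity_bit, parity_ok, and read_with_retry, the read_fn callable, and the limit of three attempts are all assumptions made for illustration.

```python
def even_parity_bit(word: int) -> int:
    """Return the bit that makes the total number of 1 bits even."""
    return bin(word).count("1") % 2


def parity_ok(word: int, parity: int) -> bool:
    """True when the stored parity bit still matches the data word."""
    return even_parity_bit(word) == parity


def read_with_retry(read_fn, max_attempts: int = 3):
    """Call read_fn until it returns a word whose parity checks out.

    read_fn is a hypothetical callable that returns (word, parity_bit).
    Returns (word, attempts_used); raises RuntimeError if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        word, parity = read_fn()
        if parity_ok(word, parity):
            return word, attempt
    # A fault that survives every retry is treated as persistent.
    raise RuntimeError("parity error persisted after retries")
```

A single parity bit only detects an odd number of flipped bits and cannot say which bit is wrong, which is why it is paired with a retry here. ECC, by contrast, stores enough redundancy to correct certain errors without repeating the operation.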

IBM® hardware error checkers have these distinct attributes:
  • Continuous monitoring of system operations to detect potential calculation errors
  • Attempted isolation of physical faults based on runtime detection of each unique failure
  • Initiation of a wide variety of recovery mechanisms designed to correct a problem

POWER7 processor-based systems include extensive hardware and firmware recovery logic.

Machine check handling

Machine checks are handled by firmware. When a machine check occurs, the firmware analyzes the error to identify the failing device and creates an error log entry.

If the system degrades to the point that the service processor cannot reach standby state, the error cannot be analyzed. If the error occurs during hypervisor activities, the hypervisor initiates a system reboot.

In partitioned mode, an error that occurs during partition activity is reported to the operating system in the partition.
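
The handling described above can be summarized as a small decision function. The Python sketch below is one possible reading of these paragraphs, not firmware code: the Activity enum, the machine_check_actions function, the action strings, and the ordering of the steps are assumptions made for illustration.

```python
from enum import Enum


class Activity(Enum):
    """Where the system was running when the machine check occurred."""
    HYPERVISOR = "hypervisor"
    PARTITION = "partition"


def machine_check_actions(activity: Activity, standby_reachable: bool) -> list[str]:
    """Return the illustrative list of actions for one machine check."""
    if not standby_reachable:
        # Service processor cannot reach standby state: no analysis is possible.
        return ["no error analysis available"]

    # Firmware analyzes the error and records what it found.
    actions = ["identify failing device", "create error log entry"]

    if activity is Activity.HYPERVISOR:
        actions.append("hypervisor initiates system reboot")
    else:
        # Partitioned mode: the error is reported to the partition's operating system.
        actions.append("report error to partition operating system")
    return actions
```

For example, machine_check_actions(Activity.PARTITION, True) ends with the step that reports the error to the operating system in the partition.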

