
Dell EMC PowerEdge XE7100 Installation and Service Manual
Fault tolerant redundancy
Policy budgeting
Fault Tolerant Redundancy is a hybrid redundancy mode. Like Grid Redundancy, it uses the power capacity limit of a single Power Supply Unit (PSU) for Power Budget Checks, but it also enforces additional performance limiting after redundancy is lost. Previous-generation modular sleds still work with Fault Tolerant Redundancy enabled, but they treat it identically to Grid Redundancy.
When the maximum potential power needs of the installed chassis components exceed the capacity of a single Power Supply, the Chassis Management Controller (CMC) denies power-on to any further chassis components. The Power Budget Checks for Fault Tolerant Redundancy ensure that the Shared Infrastructure Chassis remains operational even if an AC Grid or Power Supply failure occurs under maximum potential workload conditions. Using the maximum potential power is a conservative target that ensures continued operation across the wide range of potential customer workloads for a given configuration.
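The Power Budget Check described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the CMC implementation: the Component type, the budget_check function, and all wattage figures are hypothetical and are not taken from the XE7100 specifications.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """A chassis component with its worst-case power draw (hypothetical figures)."""
    name: str
    max_potential_watts: int


def budget_check(powered_on, candidate, single_psu_capacity_watts):
    """Return True if the candidate may power on under the single-PSU budget.

    The check sums the maximum potential power of everything already
    powered on, adds the candidate, and compares the total against the
    capacity of ONE Power Supply (assumed rule, as described in the text).
    """
    committed = sum(c.max_potential_watts for c in powered_on)
    return committed + candidate.max_potential_watts <= single_psu_capacity_watts


# Illustrative values only: a 2400 W Power Supply and two running components.
running = [Component("sled-1", 1000), Component("fans", 300)]
print(budget_check(running, Component("sled-2", 1000), 2400))  # True  (2300 W fits)
print(budget_check(running, Component("sled-3", 1200), 2400))  # False (2500 W does not)
```

Because the check uses worst-case figures rather than measured draw, a component can be denied power-on even when the chassis is currently drawing far less than one Power Supply can deliver.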
Policy philosophy
Like Grid Redundancy, Fault Tolerant Redundancy is a conservative redundancy policy. It ensures that the Shared Infrastructure Chassis and all installed components remain operational, with no risk of shutdown, in the event of an AC Grid or Power Supply failure, even when all installed components are simultaneously running at their worst-case power consumption. What is new in Fault Tolerant Redundancy is a limit on peak performance that takes effect when redundancy is lost. By limiting peak power after redundancy is lost to a level that fits within the surviving Power Supply, Fault Tolerant Redundancy maintains the same conservative standard of redundancy as traditional Grid Redundancy.
Policy control
As with all Redundancy Policies, while both Power Supplies remain healthy, the load is shared evenly between them and the capacity of both Power Supplies is available for use. In the event of an AC Grid or Power Supply failure, Power Controls engage rapidly to restrict the power consumption of the chassis to what a single Power Supply can support. In addition to the controls used with all Redundancy Policies, Fault Tolerant Redundancy implements additional performance-limiting functionality that restricts peak power after redundancy is lost.
For a fully loaded chassis running at maximum potential power, this can result in some observed performance reduction as the chassis Power Control limits are enforced. In practice, customer workloads often run below the maximum potential power, so the performance reduction during an AC Grid or Power Supply failure is often minor or even unnoticeable.
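The effect of the Power Controls can be illustrated with a small Python sketch. It assumes a two-PSU chassis and models the limit as a simple clamp. The function names and wattage values are hypothetical, and the real controls are implemented in chassis firmware and hardware, not in code like this.

```python
def chassis_power_limit_watts(healthy_psus, single_psu_capacity_watts):
    """Power available to the chassis (simplified model).

    With both Power Supplies healthy, the capacity of both is available.
    After an AC Grid or Power Supply failure, consumption is restricted
    to what the single surviving Power Supply can support.
    """
    if healthy_psus not in (1, 2):
        raise ValueError("this sketch models a two-PSU chassis with 1 or 2 healthy PSUs")
    return healthy_psus * single_psu_capacity_watts


def per_psu_load_watts(chassis_draw_watts, healthy_psus):
    """Load is shared evenly between the healthy Power Supplies."""
    return chassis_draw_watts / healthy_psus


def limited_draw_watts(requested_watts, limit_watts):
    """Clamp the requested draw to the enforced limit."""
    return min(requested_watts, limit_watts)


# Illustrative values only: 2400 W Power Supplies.
limit_after_fault = chassis_power_limit_watts(1, 2400)

# A workload peaking above one PSU's capacity is limited after the fault...
print(limited_draw_watts(2600, limit_after_fault))  # 2400
# ...while a typical workload below that capacity is unaffected.
print(limited_draw_watts(1500, limit_after_fault))  # 1500
```

The second case corresponds to the common situation in which the workload is below maximum potential power and the limit causes no visible performance reduction.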
Power on behavior after fault
In the event of an AC Grid or Power Supply failure, new chassis components are still allowed to power on as long as the maximum potential power of the newly installed chassis components does not exceed the capacity of a single Power Supply when evaluated by the chassis Power Budget Checks. This means that, although customers will see a chassis "Critical" state due to the loss of redundancy, they will observe no difference in which chassis components are allowed to power on before and after a redundancy fault. This is because, in both cases, the chassis Power Budget Checks use the capacity of only a single Power Supply. This is a key difference from the other Shared Infrastructure Chassis Redundancy Policies.
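The point that the power-on budget does not change after a fault can be shown with a short Python sketch. The function names, the "Healthy" status label, and the 2400 W figure are hypothetical; only the "Critical" state comes from the description above.

```python
def power_on_budget_watts(healthy_psus, single_psu_capacity_watts):
    """Budget used by the Power Budget Checks under Fault Tolerant Redundancy.

    healthy_psus is accepted but deliberately ignored: the budget is the
    capacity of a single Power Supply both before and after a fault.
    """
    return single_psu_capacity_watts


def chassis_status(healthy_psus, installed_psus=2):
    """Report 'Critical' once redundancy is lost ('Healthy' is a placeholder label)."""
    return "Critical" if healthy_psus < installed_psus else "Healthy"


# Before and after a redundancy fault (illustrative 2400 W Power Supplies).
for healthy in (2, 1):
    print(chassis_status(healthy), power_on_budget_watts(healthy, 2400))
```

The status changes between the two iterations, but the budget printed next to it stays the same, which is why the set of components allowed to power on does not change.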
Logging behavior
As with all Redundancy Policies, a log message is generated when a Power Supply Unit fails. For the Fault Tolerant Redundancy policy, a log message is also recorded to note a "Loss of Redundancy". This message indicates that the system is continuing to operate in a Non-Redundant state and that action is necessary either to restore power to a failed AC Grid or to replace a failed Power Supply Unit. Details in the log messages make it possible to distinguish between these two cases. Finally, if power-on of a chassis component is denied because of a Power Budget Check, the denial is logged in the CMC logs and, for compute sleds, in the iDRAC logs.
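The logging rules above can be summarized in a Python sketch. The message strings, the Cause enumeration, and the function names are invented for illustration and do not reproduce actual CMC or iDRAC log entries or message IDs.

```python
from enum import Enum


class Cause(Enum):
    """The two cases that the log details make it possible to tell apart."""
    AC_GRID_FAILURE = "AC Grid failure"
    PSU_FAILURE = "Power Supply Unit failure"


# Corrective action that each cause calls for, per the description above.
REMEDY = {
    Cause.AC_GRID_FAILURE: "restore power to the failed AC Grid",
    Cause.PSU_FAILURE: "replace the failed Power Supply Unit",
}


def redundancy_loss_entries(cause, psu_slot):
    """Return the two (hypothetical) log entries recorded on a fault."""
    return [
        f"Power Supply {psu_slot} failed ({cause.value})",
        f"Loss of Redundancy: operating in a Non-Redundant state; {REMEDY[cause]}",
    ]


def denial_log_targets(is_compute_sled):
    """A power-on denial goes to the CMC logs, plus the iDRAC logs for compute sleds."""
    return ["CMC", "iDRAC"] if is_compute_sled else ["CMC"]


for entry in redundancy_loss_entries(Cause.PSU_FAILURE, psu_slot=2):
    print(entry)
print(denial_log_targets(is_compute_sled=True))
```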