PowerStore: Pre-upgrade health check warnings about high CPU utilization or host load
Summary: Running a pre-upgrade health check (PUHC) may return warnings that CPU utilization or host I/O load is too high or higher than recommended.
Symptoms
Running a pre-upgrade health check (PUHC) from Settings > Upgrades > Health Check before a non-disruptive upgrade (NDU) shows a warning such as:
- CPU utilization level too high. Reduce system load prior to NDU. (cpu_usage_level_exceeded)
- Node X is experiencing a high average CPU utilization level of xx% for its DataPath service, which is considered too high. Tasks run during upgrade can take more time than normal due to high CPU utilization. This can lead to errors for exceeding time limits per task during the NDU. Reduce system load prior to NDU. (cpu_usage_level_exceeded_2)
- Host IO load on Node X is higher than recommended for running an upgrade. By continuing with NDU, there is a risk that IO may take a longer time to complete or possibly timeout. It is highly recommended to reduce the host IO load before proceeding with the NDU, possibly during known periods of reduced host IO load. (mapper_host_load_failed_x)
Cause
Some of these warnings may be false positives caused by timing, transient workload increases, or other issues. Verify each warning carefully before proceeding, to avoid problems during the upgrade.
Resolution
Always perform PowerStoreOS upgrades during quieter periods, outside of peak hours. Before initiating an upgrade, see "How to Prepare for a PowerStore Non-Disruptive Upgrade" for planning considerations and precautions.
To confirm whether the CPU warning is a false positive or a real warning, go to Hardware > Appliances, select the appliance that is flagged for CPU issues, open the Performance tab, and check the CPU utilization details.
The general guidance is:
- If the node CPU utilization is around or below 50%, the upgrade may continue.
- If the node CPU utilization is over 50%, the upgrade should be deferred to a period when the system is less busy.
- If the node CPU is consistently around or over 70%, or if host I/O load is higher than recommended, the upgrade should be postponed until the CPU or host load can be reduced. Continuing the upgrade in such conditions may result in upgrade issues, increased latency for hosts, difficulty accessing storage volumes, timeouts, and so on.
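The guidance above can be sketched as a small decision helper. This is only an illustration and not part of any PowerStore tool or API: the function name, its inputs, and the returned labels are hypothetical. It assumes that CPU utilization percentages are read manually from the Performance tab, that "consistently" can be approximated by the average of those readings, and that "around" is treated as an exact boundary at 50% and 70%.

```python
from statistics import mean

# Thresholds taken from the guidance above (percent CPU utilization).
CONTINUE_MAX = 50.0   # at or below this, the upgrade may continue
POSTPONE_MIN = 70.0   # at or above this, postpone until load is reduced


def upgrade_guidance(cpu_samples, host_load_warning=False):
    """Map observed node CPU utilization to the guidance above.

    cpu_samples: CPU utilization percentages noted by hand from the
        Performance tab (hypothetical input, not fetched from the system).
    host_load_warning: True if the PUHC reported a host IO load warning.
    Returns "continue", "defer", or "postpone" (hypothetical labels).
    """
    if not cpu_samples:
        raise ValueError("at least one CPU sample is required")
    if any(s < 0 or s > 100 for s in cpu_samples):
        raise ValueError("CPU samples must be percentages between 0 and 100")

    # Assumption: the average of the samples stands in for "consistently".
    avg = mean(cpu_samples)

    if host_load_warning or avg >= POSTPONE_MIN:
        return "postpone"   # reduce CPU or host load before upgrading
    if avg > CONTINUE_MAX:
        return "defer"      # wait for a less busy period
    return "continue"


if __name__ == "__main__":
    print(upgrade_guidance([42, 48, 51]))                          # continue
    print(upgrade_guidance([55, 60, 65]))                          # defer
    print(upgrade_guidance([30, 35], host_load_warning=True))      # postpone
```

Averaging is a deliberate simplification: a short spike in an otherwise quiet period can still produce a PUHC warning, so the readings should cover a window long enough to represent the planned upgrade time.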