PowerStore: Landing Page for In Market (off-release) System Health Checks

Summary: Health checks are occasionally added after a PowerStoreOS release. These health checks are delivered through the thin package mechanism and identify various known issues in the PowerStore cluster. ...



Instructions

Background

Occasionally, new issues are found after a PowerStoreOS release that are not detected by the operating system's integrated health check and alert features. The Health Check thin package feature delivers new health checks to an installed PowerStoreOS.

The Health Check package contains health checks that run before a Non-Disruptive Upgrade (NDU). It also includes general system health checks that are run on demand from PowerStore Manager (Monitoring -> System Checks -> Run System Check).

The health check package must be uploaded to the PowerStore cluster and then installed.
 
  IMPORTANT

 For detailed instructions on installing and using the health check package, see one of these KB articles: .



Health Check package for 3.x (3.0.x through 3.6.x)

The tables below list the health checks in the posted 3.x Health Check thin package. The package is compatible with PowerStoreOS 3.0.x, 3.2.x, 3.5.x, and 3.6 (including 3.6.1). It is not compatible with 2.x.
The Health Check package contains validations used by both System Checks and the Pre-Upgrade Health Check (PUHC).
Note: When the PowerStore-health_check-3.6.0.0-225306 health check package is installed (through Upgrade) on a cluster running PowerStoreOS 3.6.1, the notification "ndu_upgrade_path_missing_hc_data" is raised. Ignore this message; the upgrade can be completed. The health checks in the PowerStore-health_check-3.6.0.0-225306 package are valid for PowerStoreOS 3.6.1.
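The compatibility rules in this article fit into a small lookup. The following Python sketch is hypothetical. The function name `select_health_check_package` and the version-parsing approach are assumptions and are not part of any Dell tool. The sketch compares only the major.minor PowerStoreOS release line against the releases this article lists for each posted package, including the 2.1.x package described later in the article. It returns None for any release that is not listed.

```python
from typing import Optional

# Package names as posted in Drivers & Downloads, per this article.
PKG_3X = "PowerStore-health_check-3.6.0.0-225306-retail.tgz.bin"
PKG_21X = "PowerStore-health_check-2.1.1.2-2069723"

# Hypothetical table: PowerStoreOS (major, minor) release lines each package supports.
COMPATIBILITY = {
    PKG_3X: {(3, 0), (3, 2), (3, 5), (3, 6)},
    PKG_21X: {(2, 1)},
}


def select_health_check_package(os_version: str) -> Optional[str]:
    """Return the compatible health check package name, or None if none is listed.

    Hypothetical helper: only the major.minor part of os_version is compared,
    so "3.6.1" and "3.6.0.0" both map to the (3, 6) release line.
    """
    parts = os_version.strip().split(".")
    if len(parts) < 2 or not (parts[0].isdigit() and parts[1].isdigit()):
        raise ValueError(f"unrecognized PowerStoreOS version: {os_version!r}")
    release = (int(parts[0]), int(parts[1]))
    for package, releases in COMPATIBILITY.items():
        if release in releases:
            return package
    # No posted package lists this release line (for example, 2.0.x).
    return None
```

For example, `select_health_check_package("3.6.1")` returns the 3.x package name. As noted above, the `ndu_upgrade_path_missing_hc_data` notification raised while installing that package on 3.6.1 can be ignored.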

Distribution: Posted in Drivers & Downloads: PowerStore-health_check-3.6.0.0-225306-retail.tgz.bin (Requires login to the Dell support site.) Download the package from this site unless automatic download is enabled. In that case, the package is uploaded to the cluster automatically (PowerStore Manager: Settings > Upgrades > Automatic download). Automatic download is disabled by default.

How to Install: After the package is uploaded, it must be installed (PowerStore Manager: Upgrades > Upgrade).

How to run: 
  • Health checks are run from the PowerStore Manager UI (Monitoring -> System Checks -> Run System Check). Alternatively, the installed health checks can be run with the service script svc_health_check.
  • PUHC checks are run from the Upgrades page of the PowerStore Manager UI. They run when either the "Health Check" button or the "Upgrade" button is clicked.
 

Pre-Upgrade Health Checks (PUHC)

Test name | Description | KB article
off_release_check_iscsi_rep_block_size_failed | Detects volumes with a 4096-byte block size that are being replicated over the iSCSI protocol. | 000221547 PowerStore: Pre-Upgrade Health Check (PUHC) detects volumes of 4096 bytes size being replicated over iSCSI protocol
efi_boot_check, off_release_efi_boot_check | Checks that the correct boot entry option is used. | 000222187 PowerStore: Pre-Upgrade Health Check detects that reboot was not using an incorrect boot entry option
off_release_rba_configuration_check | Determines if an RBA tier is configured. | 000218438 PowerStore: Pre-Upgrade Health Check to detect if RBA tier is enabled
iom_activation_check | Prevents NDU for an IOM/SLIC without activation. | 000216558 PowerStore: Health check has detected that a NVMe Expansion Enclosure (ENS24) was added but not recognized
silent_drive_failure_check | Detects the issue described in KB 000216381 (PowerStore: SSD Failed Without an Alert Being Displayed). | 000216659 PowerStore: Health check detects the missing SSD issue
off_release_check_proc_install_disk_firmware | Detects if an underlying firmware upgrade process is running. | 000218391 PowerStore: Pre-Upgrade Health Check detects an underlying firmware upgrade process is running
off_release_ssd_in_rg_check | Detects if any SSD is not in a DRE group. | 000218650 PowerStore: Pre-Upgrade Health Check detects that not all the SSDs are in a DRE group
SAS drives with firmware port locked | Detects a locked firmware port. | 000207951 PowerStore: Pre-Upgrade Health Check for locked firmware port in Samsung SAS drives
PS Redundancy | Detects non-redundant power supplies. | 000214821 PowerStore: Pre-Upgrade Health Check (PUHC) detects non-redundant power supply
replication_session_state | Detects if a replication session is in progress. | 000214505 PowerStore: Pre-upgrade Health Check (PUHC) detects replication is in a state that prevents NDU.
scheduled_vm_snapshot | Detects if any scheduled VM snapshots are in progress. | 000214504 PowerStore: Pre-Upgrade Health Check (PUHC) checks if all snapshot commands are all in completed state.
off_release_check_chap_authentication | Detects if the CHAP transit connection is properly configured. | 000214503 PowerStore: Pre-Upgrade Health Check (PUHC) detects if the CHAP transit connection is properly configured.
The maintenance window is configured. | Detects if a maintenance window has been configured. This check applies only to OS versions where the system does not automatically enable the maintenance window before NDU. | 000212508 PowerStore: Pre-Upgrade Health Check (PUHC) detects that the maintenance window has not been configured.
Detect a secondary IP issue. | Detects a secondary IP issue on an NVMe expansion enclosure. | 000215560 PowerStore: Pre-upgrade Health Check (PUHC) detects an issue with the secondary IP setting on an ENS24 NVMe expansion enclosure.
SDNAS snapshot limit | Detects if SDNAS snapshots have exceeded their limit. | 000206131 PowerStore: System Health Check detects that the SDNAS snapshot limit has been exceeded.
Duplicate FW entry check | Detects duplicate component firmware entries in a node's resume (registry). | 000203390 PowerStore: System Health Checks detects duplicate firmware entries.
Initiator connectivity check | Detects if any nonredundant initiators exist. | 000196194 PowerStore: System Health Checks Detected Nonredundant Initiators
Reboot flag set | Detects if the reboot flag is set. | 000205908 PowerStore: System Health Checks detects the reboot flag is set.
Recovery partition image check | Detects an incorrect filename in the recovery partition. | 000200075 PowerStore: System Health Checks for incorrect filename in recovery partition
  

System Checks

Test name | Description | KB article
indus_encryption_offset_check | Detects an invalid drive encryption band location on NVMe expansion enclosures. | 000220624 PowerStore: System Check detects NVMe expansion enclosure (ENS24) drive invalid encryption band location
dp_dedupe_destage_leak_check | Detects unnecessary destages that can cause excessive drive wear. | 000220203 PowerStore: System Check detects unnecessary de-stages issue
kr_link_boot_option_check | Detects if the KR link PXE boot option is not enabled on both nodes of 500T appliances. | 000220804 PowerStore: System Check detects if the KR link PXE boot option is not enabled on both nodes for PowerStore 500T appliances.
iom_activation_check | Prevents NDU for an IOM/SLIC without activation. | 000216558 PowerStore: Health check has detected that a NVMe Expansion Enclosure (ENS24) was added but not recognized
dp_resiliency_mode_check | Detects an ungraceful exit from resiliency mode (vdisk issue). | 000217840 PowerStore: System Checks detects that the appliance unnecessarily remains in resiliency mode
sdnas_capacity_alert_check | Detects if file system capacity alerts are disabled after an upgrade. | 000217839 PowerStore: Filesystem usage capacity alerts disabled after upgrade.
unfinished_ndu_check | Detects an unfinished upgrade. | 000213265 PowerStore: System health Check has detected remnants of failed NDU commits
silent_drive_failure_check | Detects the issue described in KB 000216381 (PowerStore: SSD Failed Without an Alert Being Displayed). | 000216659 PowerStore: Health check detects the missing SSD issue
target_port_group_id_check | Detects a target port group issue affecting mapping of an NVMeoF volume. | 000216953 PowerStore: System health check detects a target port group issue that may affect mapping of an NVMeoF volume
indus_drive_paths_check | Detects unstable paths to the ENS24 NVMe expansion enclosure. | 000212444 PowerStore: System Health Check detects unstable paths to the ENS24 NVMe expansion enclosure.
dimm_sn_check | Detects inconsistencies in DIMM serial numbers. | 00207658 PowerStore: System Health Checks detects inconsistent DIMM Serial Numbers.
recovery_partition_image_check | Detects an incorrect filename in the recovery partition. | 000200075 PowerStore: System Health Checks for incorrect filename in recovery partition
duplicate_fw_entry_check | Detects duplicate component firmware entries in a node's resume (registry). | 000203390 PowerStore: System Health Checks detects duplicate firmware entries.
cpu_ierr_check | Checks for CPU internal errors (IERR). | 000196192 PowerStore: System Health Checks detects an issue in the CPU IERR Check.
InitiatorConnectivityCheck | Detects nonredundant initiators. | 000196194 PowerStore: System Health Checks Detected Nonredundant Initiators
icd_network_check | Detects missing connectivity to the top-of-rack (ToR) switch. | 000196193 PowerStore: System Health Checks detected an ICD Network Connectivity issue
dimm_correctable_error_check | Detects an excessive DIMM correctable error (CE) count (5k threshold). | 000199245 PowerStore: System Health Checks detects excessive DIMM Correctable Errors (CE) count.
active_system_alert_check | Detects active Major and Critical alerts. | 000192609 PowerStore: Active alerts were detected by health checks.
cyc_node_space_check | Detects insufficient space in the node's /cyc_node directory. | 000198173 PowerStore: System Health Checks detects lack of space in /cyc_node.
time_skew_check | Detects an unsupported large time skew. | 000196199 PowerStore: System Health Checks detects high time skew on Nodes and BMC.
db_tmpfiles_check | Detects database temporary files that are larger than expected. | 000196198 PowerStore: System Health Checks detects large database temporary files.
bbu_sensor_check | Detects failures in various BBU health checks. | 000196197 PowerStore: System Health Checks detects invalid battery status.
component_sn_check | Detects inconsistent serial numbers for a BBU or PSU. | 000196196 PowerStore: System Health Checks detects that component Serial Numbers are not consistent: fru_items_sn_check
fsck_leftover_check | Detects the fsck-generated file cyc-sys-mode-override.txt. | 000201738 PowerStore: System Health Checks detects recovery file.
component_stale_fw_check | Detects if the firmware is up to date and compatible with the Dell X.509 signature. | 000201500 PowerStore: System Health Checks detects if firmware upgrade is required.
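When a System Check fails, the next step is to read the KB article listed for that test in the table above. The following hypothetical Python sketch performs that lookup. The dictionary copies only a subset of the table rows, and the names `SYSTEM_CHECK_KB` and `kb_for_failed_checks` are illustrative. They are not part of PowerStore Manager or svc_health_check. The sketch does not show how the list of failed check names is obtained, because this article does not document the output format of either tool.

```python
from typing import Dict, List, Optional

# Subset of the System Checks table: check name -> KB article number.
SYSTEM_CHECK_KB: Dict[str, str] = {
    "active_system_alert_check": "000192609",
    "bbu_sensor_check": "000196197",
    "cpu_ierr_check": "000196192",
    "cyc_node_space_check": "000198173",
    "db_tmpfiles_check": "000196198",
    "dimm_correctable_error_check": "000199245",
    "icd_network_check": "000196193",
    "InitiatorConnectivityCheck": "000196194",
    "time_skew_check": "000196199",
}


def kb_for_failed_checks(failed: List[str]) -> Dict[str, Optional[str]]:
    """Map each failed check name to its KB article number.

    Names that are not in the (partial) table map to None.
    """
    return {name: SYSTEM_CHECK_KB.get(name) for name in failed}
```

Because check names are matched exactly, the mixed naming styles in the table (for example, `InitiatorConnectivityCheck` alongside `time_skew_check`) are kept verbatim as dictionary keys.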

 

 

Health Check package for 2.1.x

The table below lists the health checks in the posted Health Check thin package for 2.1.x. The package is compatible with PowerStoreOS 2.1.x. It is not compatible with 3.x.
This package contains System Checks for general health monitoring and pre-upgrade health checks. Check system health periodically and before performing maintenance operations. The pre-upgrade health checks must be run before an NDU.

Distribution: Posted in Drivers & Downloads: PowerStore-health_check-2.1.1.2-2069723. (Requires login to the Dell support site.) The package must be downloaded from this site.

How to Install: After the package is uploaded, it must be installed (PowerStore Manager: Upgrades > Upgrade).

How to run: 
  • System health checks are run from the PowerStore Manager UI (Monitoring -> System Checks -> Run System Check).
  • Pre-upgrade health checks are run from the PowerStore Manager UI (Monitoring -> System Checks -> Upgrade Extension).
  • Alternatively, the installed health checks can be run with the service script svc_health_check.
 
Test name | Description | KB article for failure
mtc_drive_counter_check | Detects an MTC NVRAM drive issue. | 000212587 PowerStore: System Health Check detects an MTC NVRAM drive issue.
drive_flags_check | Detects offline and failed drives, including those that do not raise an alert. | 000207485 PowerStore: System Health Check detects an offline or failed SSD.
bbu_sensor_check | Detects failures in various BBU health checks. | 000196197 PowerStore: System Health Checks detects invalid battery status.
kms_lockbox_file_check | Detects an issue with the D@RE lockbox. | 000196653 PowerStore: Health check detects an issue with the lockbox.
os_package_name_check | Detects an incorrect filename in the recovery partition. | 000200075 PowerStore: Health check detects an issue with the filename, recovery partition, or PowerStoreOS package version.
duplicate_fw_entry_check | Detects duplicate component firmware entries in a node's resume (registry). | 000203390 PowerStore: System Health Checks detects duplicate firmware entries.
fsck_leftover_check | Detects unexpected recovery files. | 000201738 PowerStore: System Health Checks detects recovery file.
recovery_partition_image_check | Detects an incorrect filename in the recovery partition. | 000200075 PowerStore: System Health Checks for incorrect filename in recovery partition
symmetric_icm_connection | Detects a missing ICM connection. | 000203115 PowerStore: Health Check package check for asymmetric ICM connections fails.
cpu_ierr_check | Checks for CPU internal errors (IERR). | 000196192 PowerStore: System Health Checks detects an issue in the CPU IERR Check.
InitiatorConnectivityCheck | Detects nonredundant initiators. | 000196194 PowerStore: System Health Checks Detected Nonredundant Initiators
icd_network_check | Detects missing connectivity to the top-of-rack (ToR) switch. | 000196193 PowerStore: System Health Checks detected an ICD Network Connectivity issue
symmd_fw_upgrade_flag_check | Detects a PSU in an invalid state. | 000199922 PowerStore: System Health Checks detects an incorrect PSU State.
dimm_correctable_error_check | Detects an excessive DIMM correctable error (CE) count (5k threshold). | 000199245 PowerStore: System Health Checks detects excessive DIMM Correctable Errors (CE) count.
active_system_alert_check | Detects active Major and Critical alerts. | 000192609 PowerStore: Active alerts were detected by health checks.
cyc_node_space_check | Detects insufficient space in the node's /cyc_node directory. | 000198173 PowerStore: System Health Checks detects lack of space in /cyc_node.
time_skew_check | Detects an unsupported large time skew. | 000196199 PowerStore: System Health Checks detects high time skew on Nodes and BMC.
db_tmpfiles_check | Detects database temporary files that are larger than expected. | 000196198 PowerStore: System Health Checks detects large database temporary files.
bbu_ipmi_i2c_check | Detects failures in various BBU health checks. | 000196197 PowerStore: System Health Checks detects invalid battery status.
fru_items_sn_check | Detects inconsistent serial numbers for a BBU or PSU. | 000196196 PowerStore: System Health Checks detects that component Serial Numbers are not consistent: fru_items_sn_check
 

 

Article Properties


Affected Product

PowerStore

Last Published Date

27 Mar 2024

Version

9

Article Type

How To