Article Number: 000188498

VPLEX: Storage volumes report 'Operational Status' and 'Health State' as unknown

Summary: A version mismatch between the management server and the directors causes all storage volumes to report their 'Operational Status' and 'Health State' as 'unknown'.


Article Content

Symptoms

When a VPLEX firmware upgrade fails and rolls back, a version mismatch can result in which the VPLEX VS6 MMCS-A (management server) is on the higher firmware version (6.2.x) while the directors remain on the original lower firmware version (for example, 6.1.x). This causes a cosmetic issue in the VPLEX user interface (UI): all storage volumes in the system report their 'Operational Status' and 'Health State' as 'unknown'. On closer inspection, however, the volumes are alive and the top-level virtual volumes are not degraded.

1] Confirm there is a code mismatch. This can be done using the VPlexcli “health-check” command and/or the VPlexcli “version -a” command.

The example below is from the top section of the health-check output:

Product Version: Version mismatch (or NDU) << a mismatch indicates that the management server and the directors are on different firmware versions

Product Type: Metro
WAN Connectivity Type: FC
Hardware Type: VPL <-- represents the VS6
Cluster Size: 4 engines <-- indicates a Quad Engine configuration (2 = Dual, 1 = Single)
Cluster TLA:  
   cluster-1: CKMXXXXXXXXXXX  
   cluster-2: CKMXXXXXXXXXXX
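
The mismatch check in step 1] can also be scripted against saved 'health-check' output. The following is a minimal Python sketch, assuming the output has been captured as text and that the 'Product Version:' line looks like the example above. The function names are hypothetical and are not part of VPlexcli.

```python
import re
from typing import Optional


def product_version_status(health_check_text: str) -> Optional[str]:
    """Return the text after 'Product Version:' in captured output, or None."""
    match = re.search(r"^Product Version:\s*(.+)$", health_check_text, re.MULTILINE)
    return match.group(1).strip() if match else None


def has_version_mismatch(health_check_text: str) -> bool:
    """True when the 'Product Version' line mentions a mismatch.

    Assumption: the word 'mismatch' appears on that line, as in the
    example output in this article.
    """
    status = product_version_status(health_check_text)
    return status is not None and "mismatch" in status.lower()


# Sample text copied from the example output in this article.
sample = """\
Product Version: Version mismatch (or NDU)
Product Type: Metro
WAN Connectivity Type: FC
"""

print(has_version_mismatch(sample))
```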


The storage-volume issue is most apparent in the VPlexcli command “storage-volume summary” output and is also visible from the Back-End (BE) Storage portion of the 'health-check' command output. The issue is not reported in the 'ndu pre-check', the 'connectivity validate-be', nor the 'cluster-status' command outputs.

Examples from the VPlexcli output are provided below.


Before the failed NDU attempt, no storage volumes report as 'unknown'. After the failed NDU, the storage-volume 'IO Status' is 'alive', but the 'Operational Status' and the 'Health State' are 'unknown' for all storage volumes.

VPlexcli:/> storage-volume summary
SUMMARY (cluster-1)
StorageVolume Name                       IO Status  Operational Status  Health State
---------------------------------------  ---------  ------------------  ------------
VCKM001530XXX1-00003                     alive      unknown             unknown  << observe
VCKM001530XXX2-00004                     alive      unknown             unknown  <<
VCKM001530XXX3-00006                     alive      unknown             unknown  <<
.
.

Storage-Volume Summary  (no tier)
----------------------  ---------------------

Health                  out-of-date         0
                        storage-volumes  4372  << total number of storage volumes in the system
                        unhealthy        4372  << equals the total: every storage volume in the system now reports unhealthy


Vendor                  DGC              1276
                        XtremIO          3096

Use                     claimed             1
                        meta-data           4
                        unclaimed           5
                        used             4362

Capacity                total           2.92P
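
To check a large system without reading every row, the per-volume table can be parsed from saved 'storage-volume summary' output. This is a minimal Python sketch, assuming the four-column layout shown above with no annotations or elided rows; the names `VolumeRow`, `parse_volume_rows` and `all_alive_but_unknown` are hypothetical.

```python
from typing import List, NamedTuple


class VolumeRow(NamedTuple):
    name: str
    io_status: str
    operational_status: str
    health_state: str


def parse_volume_rows(summary_text: str) -> List[VolumeRow]:
    """Collect the rows that follow each table header and its dashed rule."""
    rows: List[VolumeRow] = []
    in_table = False
    header_seen = False
    for line in summary_text.splitlines():
        stripped = line.strip()
        if in_table:
            fields = stripped.split()
            if len(fields) == 4:
                rows.append(VolumeRow(*fields))
                continue
            in_table = False  # blank or non-row line ends the table
        if header_seen and stripped.startswith("---"):
            in_table = True
        header_seen = "IO Status" in stripped and "Health State" in stripped
    return rows


def all_alive_but_unknown(rows: List[VolumeRow]) -> bool:
    """True when every volume is alive yet reports unknown status and health."""
    return bool(rows) and all(
        r.io_status == "alive"
        and r.operational_status == "unknown"
        and r.health_state == "unknown"
        for r in rows
    )


sample = """\
SUMMARY (cluster-1)
StorageVolume Name    IO Status  Operational Status  Health State
--------------------  ---------  ------------------  ------------
VCKM001530XXX1-00003  alive      unknown             unknown
VCKM001530XXX2-00004  alive      unknown             unknown
VCKM001530XXX3-00006  alive      unknown             unknown

Storage-Volume Summary  (no tier)
"""

rows = parse_volume_rows(sample)
print(len(rows), all_alive_but_unknown(rows))
```

A result of True matches the symptom described in this article; any other pattern is outside what this sketch is meant to recognize.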


In the VPLEX 'health-check' command output, scroll to the “BE Storage” section and check the “Unhealthy Storage Volumes” column. All volumes are reported as unhealthy.

BE Storage: <<
-----------
Cluster    Total     Unhealthy  Total Storage  No     Not visible  With         Total    
Name       Storage   Storage    Provisioned/   Dual   from         Unsupported  Extents/ 
           Volumes/  Volumes    Limit          Paths  All Dirs     # of Paths   Limit    
           Limit                                                                          
---------  --------  ---------  -------------  -----  -----------  -----------  -------- 
cluster-1  4372/12000 4372      2.92P/8PB      0      0            0            4362/24000
cluster-2  4372/12000 4372      2.92P/8PB      0      0            0            4362/24000

The 'FE Storage' section, located directly below the 'BE Storage' section of the 'health-check' command output discussed above, shows 0 (or very few) unhealthy virtual volumes. This is key to confirming that this is a cosmetic issue with the VPLEX user interface (UI).

[Note]: If there were a true issue with the storage volumes, we would expect the top-level virtual volumes to report as degraded in some way. In this cosmetic case, we also expect the VPlexcli command “virtual-volume summary” to show no (or very few) degraded virtual volumes. However, if degraded or unhealthy virtual volumes are reported, stop: there may be a separate issue that requires further investigation outside the scope of this KB.

FE Storage:
-----------
Cluster    Total       Unhealthy  Total Dist  Unhealthy  Local          With unsupported  
Name       Virtual     Virtual    Devs/       Dist       Top-Level      RAID1 mirror      
           Volumes/    Volumes    Limit       Devs       Devices/Limit  legs                 
           Limit                                                                          
---------  ----------  ---------  ----------  ---------  -------------  ----------------  
cluster-1  3794/12000  0          1024/12000  0          2770/12000     0                 
cluster-2  4356/12000  0          1024/12000  0          3333/12000     0
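
The comparison between the 'BE Storage' and 'FE Storage' sections can be sketched in Python as follows. This assumes saved 'health-check' text in which each section starts with a heading line ending in a colon and each data row begins with the cluster name followed by a total/limit column and the unhealthy count, as in the examples above. The function names are hypothetical.

```python
from typing import Dict


def unhealthy_counts(health_check_text: str, section: str) -> Dict[str, int]:
    """Map cluster name to the 'Unhealthy' column within the named section."""
    counts: Dict[str, int] = {}
    in_section = False
    for line in health_check_text.splitlines():
        stripped = line.strip()
        if stripped.endswith(":"):
            # Any heading line switches sections on or off.
            in_section = stripped == section + ":"
            continue
        fields = stripped.split()
        if in_section and len(fields) >= 3 and fields[0].startswith("cluster-"):
            counts[fields[0]] = int(fields[2])
    return counts


def backend_issue_looks_cosmetic(health_check_text: str) -> bool:
    """True when BE reports unhealthy volumes but FE reports none."""
    be = unhealthy_counts(health_check_text, "BE Storage")
    fe = unhealthy_counts(health_check_text, "FE Storage")
    return bool(be) and any(be.values()) and bool(fe) and not any(fe.values())


# Data rows copied from the example output in this article (headers omitted).
sample = """\
BE Storage:
-----------
cluster-1  4372/12000 4372      2.92P/8PB      0      0            0            4362/24000
cluster-2  4372/12000 4372      2.92P/8PB      0      0            0            4362/24000

FE Storage:
-----------
cluster-1  3794/12000  0          1024/12000  0          2770/12000     0
cluster-2  4356/12000  0          1024/12000  0          3333/12000     0
"""

print(unhealthy_counts(sample, "BE Storage"))
print(backend_issue_looks_cosmetic(sample))
```

The sketch treats any non-zero FE unhealthy count as a reason not to call the issue cosmetic, which is stricter than the "very few" allowance in the text.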

Cause

When upgrading the VPLEX firmware, the high-level procedure is:

1] Upgrade the VS6 management server (MMCS-A) to the new higher target version.
2] Upgrade the VPLEX directors to the new higher target version.

These are done as two separate steps, which creates a window in which the management server is running the new code (6.2.x in this case) while the directors are still at the original lower code version (for example, 6.1.x). The issue described above may occur during this window.

If a problem blocks the code upgrade on the directors, the MMCS and director code mismatch can persist for a prolonged time while the upgrade problem is investigated. During this time, all storage volumes report 'unknown'.

This issue is caused by the version mismatch between the management server and the directors. The underlying firmware command fails, which leads to the 'unknown' state for 'Health State'.
This is visible as the following event repeating in the client logs:

[https-jsse-nio-49881-exec-9] CommandResult: Command 'mdi get disk --tabular diskId ioStatus where visibility == "external" and ioStatus != "ok"' returned error code 1000000015 (invalid subcommand).

The command fails because it issues a subcommand that is present in 6.2 but not in 6.1.x. The director firmware therefore rejects it as invalid, which produces the 'unknown' state described above.
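
To confirm the repeated event in a collected log, the failing commands can be counted with a short script. This is a minimal Python sketch, assuming log lines in the format of the event shown above. The regular expression and function name are hypothetical, and the log lines are passed in by the caller because this article does not state the log file location.

```python
import re
from collections import Counter
from typing import Iterable

# Matches: Command '<command>' returned error code <code> (<reason>)
FAILED_COMMAND = re.compile(
    r"Command '(?P<command>.*?)' returned error code (?P<code>\d+) "
    r"\((?P<reason>[^)]*)\)"
)


def count_invalid_subcommands(log_lines: Iterable[str]) -> Counter:
    """Count, per command, the events rejected as 'invalid subcommand'."""
    counts: Counter = Counter()
    for line in log_lines:
        match = FAILED_COMMAND.search(line)
        if match and match.group("reason") == "invalid subcommand":
            counts[match.group("command")] += 1
    return counts


# Event text copied from this article, repeated to mimic the streaming.
event = (
    "[https-jsse-nio-49881-exec-9] CommandResult: Command 'mdi get disk "
    "--tabular diskId ioStatus where visibility == \"external\" and "
    "ioStatus != \"ok\"' returned error code 1000000015 (invalid subcommand)."
)

counts = count_invalid_subcommands([event, event, "unrelated log line"])
for command, hits in counts.items():
    print(hits, command)
```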

Resolution

Generally, it is okay to operate with the firmware mismatch for a period of time. However, remaining in this state for long is not recommended; resolve the issue as soon as possible.

Resolution:
This issue is resolved once the director firmware upgrade is completed, which brings the directors and the management server onto the same code version. This is the preferred resolution and should be applied.

As stated earlier in this article, this is a cosmetic issue and it is okay to proceed with the director firmware upgrade while this issue is present.

This issue, although cosmetic, can cause other related issues such as:

VPLEX upgrade delays: if the upgrade engineer (RCM) notices the issue between the management server upgrade and the director upgrade, they may assume there is a true storage-volume problem and stop the upgrade to investigate. If the upgrade engineer is unsure or has questions about this situation, contact VPLEX support.
AppSync: because AppSync queries the VPLEX storage-volume status, it can have operational issues if it receives the “unknown” response.

Workaround:
In cases where it is not possible to operate with the firmware mismatch, contact Dell EMC VPLEX support for further assistance and mention this article.
