
January 28th, 2014 01:00

How to quickly troubleshoot a hardware problem of VNXe


Introduction

This article provides quick troubleshooting tips and resolutions for hardware problems on VNXe series storage, to help users check the health of their VNXe devices promptly.

This will also help you understand the hardware differences of VNXe3100, VNXe3150 and VNXe3300.

Detailed Information

The VNXe series has three models: VNXe3100, VNXe3150, and VNXe3300. Their hardware specifications differ somewhat.

I. Differences in appearance between the VNXe3100, VNXe3150, and VNXe3300

VNXe3100:

Front view:

The front view of the VNXe3100 platform with a 2U DPE that holds twelve 3.5-inch disk drives.

Rear view:

The rear view of a VNXe3100 platform with a 2U DPE containing a Cache Protection Module and a single storage processor (SP A).

The rear view of a VNXe3100 platform with a 2U DPE containing two storage processors (SP B and SP A).

VNXe3150:

Front view:

The front view of the VNXe3150 platform with a 2U DPE that holds twelve 3.5-inch disk drives.

The front view of the VNXe3150 platform with a 2U DPE that holds twenty-five 2.5-inch disk drives.

Rear view:

The rear view of a VNXe3150 platform with a 2U DPE containing a Cache Protection Module and a single storage processor (SP A).


The rear view of a VNXe3150 platform with a DPE containing two storage processors (SP B and SP A).

VNXe3300:

Front view:

The front view of the VNXe3300 platform with a 3U DPE (DPE7) that holds fifteen 3.5-inch disk drives.


The front view of the VNXe3300 platform with a 3U DPE (DPE8) that holds twenty-five 2.5-inch disk drives.

Rear view:

The rear view of the VNXe3300 platform with a DPE containing two storage processors (SP B and SP A).

II. Check and resolve hardware problems

If your check reveals a failed component, resolve the problem promptly.

To restore your system to full operation, replace the faulted hardware component. For example, if a disk has faulted, replace it immediately.

Before replacing a hardware component, you need to identify the faulted part. Follow the steps below:

1.     Click System > System Health.

2.     The SP or DAE that contains the faulted part is marked with a health icon, or the top-level component (SP or DAE) is already expanded. If the SP or DAE is marked with one of the health icons below, expand the list by clicking the expand icon next to it. The faulted part is highlighted in the graphical display.

Health icon labels:

·         Warning

·         Major

·         Critical

3.     In the System Components list, select the faulted part to view a description of the part's properties.

4.     You need the following information to order a replacement part:

·         VNXe serial number - located in the System Info section

·         Product ID - located in the Component Description section

·         Serial Number (SN) - located in the Component Description section

Note: If the System Health page cannot determine the serial number (S/N) or part number (P/N) of a part, read them from the labels on the part itself. Once you have this information, you can order the replacement part.

Note: Before you order a replacement part, you can try power-cycling the entire VNXe system. A power cycle can resolve low-level problems with the Storage Processors (SPs), I/O connections, disk-array enclosures, system software, and other system components, and return the system to an operational state.

The power-cycle procedure involves placing the SPs in Service Mode.

All hosts will lose access to the system. To prevent data loss, ensure that all host operations that require the VNXe system have completed.

Follow the steps below (in exact order):

1.     Place both SPs in Service Mode. If the SP is already in Service Mode, you do not need to perform this action.

2.     Disconnect the power cables from the disk-processor enclosure (DPE) to power down the SPs.

3.     Disconnect the power cables from the power supplies on each disk-array enclosure (DAE) to power them down.

4.     Reconnect the power cables to the power supplies on each DAE to power them up.

5.     Reconnect the power cables to the DPE to power up the SPs.

6.     Reboot each SP to return them to Normal Mode.

Note: When both Storage Processors (SPs) are in Service Mode, always return SP A to Normal Mode first to avoid management software conflicts. Once SP A is operating normally, you can return SP B to Normal Mode.

If the problem persists, contact EMC support to replace the component.

III. Summary

The EMC VNXe series is a unified storage solution designed for IT generalists with limited storage expertise. It abstracts the implementation of advanced storage functionality through an application-driven approach to managing shared storage. The hardware troubleshooting steps above are useful in routine VNXe maintenance.

It is strongly recommended to save this information for reference.

Author: Leo Li

iEMC APJ

September 16th, 2014 08:00

Great document. But what if you want to troubleshoot the VNXe from the CLI? What commands would you use?

Thanks,

Amir


September 19th, 2014 05:00

The EMC VNXe series is an affordable unified storage platform with solution-focused software that is easy to manage, provision, and protect. In addition, VNXe Unisphere provides a graphical, application-oriented management model with a familiar web look and feel, so customers can easily manage and use VNXe storage through Unisphere. For that reason, you only need to check the hardware status in Unisphere. If you are interested in the VNXe CLI and service commands, refer to the VNXe Unisphere CLI User Guide and the VNXe Service Commands Technical Notes, which describe the commands in detail. In general, however, these two documents are mostly used by EMC employees and partner engineers.


September 19th, 2014 08:00

From a monitoring perspective, one significant problem with the VNXe is its lack of SNMP polling support. Checking the GUI is not a practical way to track the health and environmental status of the platform's components, yet that status is a fundamental part of what regular SNMP polling provides.

The VNXe platforms support SNMP traps, but only at a very generic, high level, as shown here:

MIB Details
TABLE 1:
The vnxe_alert.mib contains the following 8 traps:

EVENT | OID | DESCRIPTION
vnxeGenericTrapEmergency | .1.3.6.1.4.1.1139.18.1.18.2.0 | This trap is generated when the system is unusable.
vnxeGenericTrapAlert | .1.3.6.1.4.1.1139.18.1.18.2.1 | This trap is generated when action needs to be taken immediately.
vnxeGenericTrapCritical | .1.3.6.1.4.1.1139.18.1.18.2.2 | This trap is generated when the system is in critical condition.
vnxeGenericTrapError | .1.3.6.1.4.1.1139.18.1.18.2.3 | This trap is generated when there is an error in the system.
vnxeGenericTrapWarning | .1.3.6.1.4.1.1139.18.1.18.2.4 | This trap is generated when there is a warning condition in the system.
vnxeGenericTrapNotice | .1.3.6.1.4.1.1139.18.1.18.2.5 | This trap is generated when there is a normal but significant condition in the system.
vnxeGenericTrapInformational | .1.3.6.1.4.1.1139.18.1.18.2.6 | This trap is generated when there is an informational message.
vnxeGenericTrapDebug | .1.3.6.1.4.1.1139.18.1.18.2.7 | This trap is generated when there is a debug-level message.

TABLE 2:
The vnxe_alert.mib contains the following 5 trap variables:

VARIABLE | DESCRIPTION
::= { vnxeTrapVariable 1 } | "This is node/IP address of the system that causes the trap."
::= { vnxeTrapVariable 2 } | "This is the component that causes the trap."
::= { vnxeTrapVariable 3 } | "This is the symptom ID that causes the trap."
::= { vnxeTrapVariable 4 } | "This is the symptom description for SymptomID."
::= { vnxeTrapVariable 5 } | "This is the timestamp of the trap."
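Because these traps carry only a generic severity, a trap receiver has little more to work with than the trap OID plus the variables in TABLE 2. The minimal Python sketch below maps a trap OID from TABLE 1 to its name and decides whether it warrants action. It relies only on what TABLE 1 shows: the final OID arc runs from 0 (Emergency) to 7 (Debug), the same order as syslog severity levels. The function names and the default cut-off at Error are hypothetical, illustrative choices, not part of the MIB.

```python
# Trap names and OIDs as listed in TABLE 1 (vnxe_alert.mib). The last OID
# arc (0-7) follows the same order as syslog severity levels.
TRAP_PREFIX = "1.3.6.1.4.1.1139.18.1.18.2."
TRAP_NAMES = [
    "vnxeGenericTrapEmergency",      # .0
    "vnxeGenericTrapAlert",          # .1
    "vnxeGenericTrapCritical",       # .2
    "vnxeGenericTrapError",          # .3
    "vnxeGenericTrapWarning",        # .4
    "vnxeGenericTrapNotice",         # .5
    "vnxeGenericTrapInformational",  # .6
    "vnxeGenericTrapDebug",          # .7
]


def trap_name(oid: str):
    """Return the vnxe_alert.mib trap name for a trap OID, or None if it is not one."""
    oid = oid.lstrip(".")  # accept both '.1.3.6...' and '1.3.6...'
    if not oid.startswith(TRAP_PREFIX):
        return None
    suffix = oid[len(TRAP_PREFIX):]
    if not suffix.isdigit() or int(suffix) >= len(TRAP_NAMES):
        return None
    return TRAP_NAMES[int(suffix)]


def needs_action(oid: str, max_level: int = 3) -> bool:
    """True if the trap is at least as severe as max_level (3 = Error, a hypothetical default)."""
    name = trap_name(oid)
    return name is not None and TRAP_NAMES.index(name) <= max_level
```

A lower arc means a more severe trap, so raising max_level to 4 would also treat Warning traps as actionable.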

As a result, I'm developing a script that uses the CLI to gather the health and environmental values normally obtained through SNMP polling. It uses the following CLI commands:

1. ssh to "Management IP" of EMC VNXe Platform
2. Issue the following CLI command:

service@(none) spa:~> svc_diag -state=spinfo | less
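For a monitoring script, the same command can be run non-interactively over ssh. The minimal Python sketch below only builds and runs the ssh command lines. It assumes key-based ssh login as the service account shown in the prompt, and that the SP accepts a remote command on the ssh command line (ordinary OpenSSH behavior, not something this post confirms). The `| less` pager is dropped because it is only useful interactively. The "peer" entry follows the ssh peer hop described in the NOTE below and assumes that hop also accepts a remote command. The names spinfo_commands and run are hypothetical.

```python
import subprocess

# Assumptions (not from the original post): key-based ssh login as the
# "service" account shown in the prompt, and that the SP runs a command
# passed on the ssh command line instead of only offering a shell.
SERVICE_USER = "service"
# '| less' is dropped: a pager only makes sense in an interactive session.
SPINFO_CMD = "svc_diag -state=spinfo"


def spinfo_commands(mgmt_ip: str) -> dict:
    """Build argv lists that fetch spinfo from the SP behind mgmt_ip and from its peer."""
    target = f"{SERVICE_USER}@{mgmt_ip}"
    return {
        "local": ["ssh", target, SPINFO_CMD],
        # Hypothetical: assumes 'ssh peer' on the SP also accepts a remote command.
        "peer": ["ssh", target, f"ssh peer {SPINFO_CMD}"],
    }


def run(argv: list, timeout: int = 60) -> str:
    """Run one command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(argv, capture_output=True, text=True,
                            timeout=timeout, check=True)
    return result.stdout
```

For example, run(spinfo_commands("192.0.2.10")["local"]) would return the spinfo text for the SP behind that placeholder management address.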

The output of svc_diag -state=spinfo provides the health status and environmental values of the DPE and DAE components, and the status of the individual disks, as shown below:
COMMAND: svc_diag -state=spinfo | less
SAMPLE OUTPUT:

service@(none) spa:~> svc_diag -state=spinfo | less
This SP's system type is: EMCHW SENTRY DUAL
This SP's ID is: SPA
Displaying all FRU statuses:
        dpe:            OK
          temp:         18
          spa:          OK : (0x2d) O/S running
            dimm0:      OK
            dimm1:      OK
            dimm2:      OK
            ps:         OK 229
            bbu:        OK
            fan:        OK
            slic0:      OK POSEIDON
            slic1:      REMOVED UNKNOWN
            sas0:       CONNECTED
            sas1:       DISCONNECTED
            sasxp:      OK 0144
          spb:          OK : (0x2d) O/S running
            dimm0:      UNKNOWN
            dimm1:      UNKNOWN
            dimm2:      UNKNOWN
            ps:         OK 215
            bbu:        OK
            fan:        OK
            slic0:      OK POSEIDON
            slic1:      REMOVED UNKNOWN
            sas0:       UNKNOWN
            sas1:       UNKNOWN
            sasxp:      OK 0144
        dae_0_1:        OK
          temp:         18
          lcca:         OK 0144
          psa:          OK 40
          lccb:         OK 0144
          psb:          OK 29
Displaying backend status:
    disk   state         vendor    type    capacity      bsize   speed
    ----   -----         ------    ----    --------      -----   -----
   0_0_00    OK            SEAGATE    SAS     0x218ceece    520     6GB
   0_0_01    OK            SEAGATE    SAS     0x218ceece    520     6GB
   0_0_02    OK            SEAGATE    SAS     0x218ceece    520     6GB
   0_0_03    OK            SEAGATE    SAS     0x218ceece    520     6GB
   0_0_04    OK            SEAGATE    SAS     0x218ceece    520     6GB
   0_0_05    OK            SEAGATE    SAS     0x218ceece    520     6GB
   0_0_06    OK            SEAGATE    SAS     0x218ceece    520     6GB
   0_0_07    OK            SEAGATE    SAS     0x218ceece    520     6GB
   0_0_08    OK            SEAGATE    SAS     0x72a5d655    520     6GB
   0_0_09    OK            SEAGATE    SAS     0x72a5d655    520     6GB
   0_0_10    OK            SEAGATE    SAS     0x72a5d655    520     6GB
   0_0_11    OK            SEAGATE    SAS     0x72a5d655    520     6GB
   0_0_12    OK            SEAGATE    SAS     0x72a5d655    520     6GB
   0_0_13    OK            SEAGATE    SAS     0x72a5d655    520     6GB
   0_0_14    OK            SEAGATE    SAS     0x72a5d655    520     6GB
   0_0_15    REMOVED
   0_0_16    REMOVED
   0_0_17    REMOVED
   0_0_18    REMOVED
   0_0_19    REMOVED
   0_0_20    REMOVED
   0_0_21    REMOVED
   0_0_22    REMOVED
   0_0_23    REMOVED
   0_0_24    REMOVED
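A polling script can pick the disk table out of this output. The minimal Python sketch below assumes the output is read line by line and that each disk row starts with an ID such as 0_0_00 followed by its state, as in the sample. Because empty slots also report REMOVED in the sample, the sketch compares two polls and reports state changes rather than treating every non-OK state as a fault. The function names are hypothetical.

```python
import re

# A disk row starts with an ID such as 0_0_00, followed by the disk state.
DISK_ROW = re.compile(r"^\s*(\d+_\d+_\d+)\s+(\S+)")


def parse_backend_disks(text: str) -> dict:
    """Return {disk_id: state} from the 'Displaying backend status:' section."""
    disks = {}
    in_backend = False
    for line in text.splitlines():
        if line.strip().startswith("Displaying backend status:"):
            in_backend = True
            continue
        if in_backend:
            match = DISK_ROW.match(line)
            if match:  # the header and dashed separator lines do not match
                disks[match.group(1)] = match.group(2)
    return disks


def changed_disks(previous: dict, current: dict) -> dict:
    """Return {disk_id: (old_state, new_state)} for disks whose state changed between polls."""
    return {
        disk: (previous.get(disk), current.get(disk))
        for disk in sorted(set(previous) | set(current))
        if previous.get(disk) != current.get(disk)
    }
```

With this approach a slot that was always empty stays quiet, while a disk that goes from OK to any other state is reported.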

NOTE: When you ssh to the management IP and issue svc_diag -state=spinfo, you are connected to the primary SP (Storage Processor). In the example output, we are connected to SP A. The dimmX values (where X identifies the individual DIMM module) and the sas0 and sas1 values show as "UNKNOWN" for SP B. To view these values for the secondary SP, issue the "ssh peer" command and then, once connected to the secondary SP, issue "svc_diag -state=spinfo | less" again, as follows:

service@(none) spa:~> ssh peer
Last login: Thu Sep 18 22:42:03 2014 from peer
service@(none) spb:~> svc_diag -state=spinfo | less
======== Now executing spinfo state ========
This SP's system type is: EMCHW SENTRY DUAL
This SP's ID is: SPB
Displaying all FRU statuses:
        dpe:            OK
          temp:         18
          spa:          OK : (0x2d) O/S running
            dimm0:      UNKNOWN
            dimm1:      UNKNOWN
            dimm2:      UNKNOWN
            ps:         OK 226
            bbu:        OK
            fan:        OK
            slic0:      OK POSEIDON
            slic1:      REMOVED UNKNOWN
            sas0:       UNKNOWN
            sas1:       UNKNOWN
            sasxp:      OK 0144
          spb:          OK : (0x2d) O/S running
            dimm0:      OK
            dimm1:      OK
            dimm2:      OK
            ps:         OK 218
            bbu:        OK
            fan:        OK
            slic0:      OK POSEIDON
            slic1:      REMOVED UNKNOWN
            sas0:       CONNECTED
            sas1:       DISCONNECTED
            sasxp:      OK 0144
        dae_0_1:        OK
          temp:         18
          lcca:         OK 0144
          psa:          OK 40
          lccb:         OK 0144
          psb:          OK 29
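Since each SP reports UNKNOWN for the other SP's DIMM and SAS values, a script has to combine the two views to get a complete picture. Here is a minimal Python sketch, assuming each view has already been parsed into a flat dictionary keyed by a component path such as "dpe.spb.dimm0" (a hypothetical naming scheme). A value is taken from the peer's view only where the primary's view says UNKNOWN. Where both SPs report a known value (for example the ps readings, which differ slightly between the two outputs), the primary's value is kept. The function names are hypothetical.

```python
def _is_unknown(value) -> bool:
    """True when a status is missing or its first token is UNKNOWN."""
    return value is None or value.split()[:1] == ["UNKNOWN"]


def merge_sp_views(primary: dict, peer: dict) -> dict:
    """Combine two {path: value} views, one collected on each SP.

    A path keeps the primary SP's value unless that value is UNKNOWN and the
    peer reports something known (for example the other SP's DIMMs and SAS ports).
    """
    merged = dict(primary)
    for path, value in peer.items():
        if _is_unknown(merged.get(path)) and not _is_unknown(value):
            merged[path] = value
    return merged
```

Only the first token is checked, so an entry such as "REMOVED UNKNOWN" for an empty slic1 slot counts as a known state and is kept as reported.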

<<<<<<<<<<>>>>>>>>>>>


The following list contains the DPE and DAE components and environmental values reported by the command:
DPE Status and Environmental Values
1. DPE Status
2. DPE Temperature
3. SPA / SPB Status
4. SPA / SPB DIMM Status
5. SPA / SPB PSU Status
6. SPA / SPB Battery Backup Unit Status
7. SPA / SPB Fan Status
8. SPA / SPB slic0 Status - I/O modules with dual 10 Gbps northbound uplinks to the core
9. SPA / SPB sas0 Status - 6 Gbps port that provides the interconnect to the DAE

DAE Status and Environmental Values
1. DAE Status
2. DAE Temperature
3. DAE LCC A / LCC B Status - Link Control Cards; 6 Gbps ports that provide the interconnect to the DPE
4. DAE PSU Status
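The FRU section of the spinfo output is indented by nesting level, so a parser can rebuild each component's path from the indentation. The minimal Python sketch below is one way to do it, assuming the line layout shown in the sample output. It returns a flat dictionary keyed by paths such as "dpe.spa.dimm0", then lists entries whose status is neither OK nor CONNECTED; numeric entries such as temp are skipped as readings. Which of the remaining entries matter is a site-specific decision: in the sample, the unused sas1 port reports DISCONNECTED and the empty slic1 slot reports REMOVED. The function names are hypothetical.

```python
import re

# "name:   value" lines; the leading indentation encodes the nesting level.
FRU_LINE = re.compile(r"^(\s+)(\w+):\s*(.*)$")
HEALTHY = {"OK", "CONNECTED"}


def parse_fru_status(text: str) -> dict:
    """Return {path: value} for the 'Displaying all FRU statuses:' section."""
    statuses = {}
    stack = []  # (indent, name) of the enclosing components
    in_fru = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("Displaying all FRU statuses:"):
            in_fru = True
            continue
        if not in_fru:
            continue
        if stripped.startswith("Displaying"):  # next section (backend status) begins
            break
        match = FRU_LINE.match(line)
        if not match:
            continue
        indent, name = len(match.group(1)), match.group(2)
        value = match.group(3).strip()
        # Pop siblings and deeper entries so the stack holds only ancestors.
        while stack and stack[-1][0] >= indent:
            stack.pop()
        stack.append((indent, name))
        statuses[".".join(n for _, n in stack)] = value
    return statuses


def problems(statuses: dict) -> dict:
    """Entries whose first token is neither a healthy status nor a numeric reading."""
    flagged = {}
    for path, value in statuses.items():
        token = value.split()[0] if value else ""
        if token.isdigit() or token in HEALTHY:
            continue
        flagged[path] = value
    return flagged
```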


Step 2

1. ssh to "Management IP" of EMC VNXe Platform
2. Issue the following CLI command on the primary SP:
svc_storagecheck --sizes | less

service@(none) spb:~> svc_storagecheck --sizes | less
<<<<<<<<>>>>>>>>>>>>>>>>>>>>>
======================= Now running ./server_df ALL ... =======================
server_2 :
Filesystem          kbytes         used        avail capacity Mounted on
vol_2_1405717341 531776624    201771512    330005112   38%    /vol_2_1405717341
vol_1_1405710374 2167267416    144432720   2022834696    7%    /vol_1_1405710374
NFS00_15K_SPA   1354475408   1145093152    209382256   85%    /NFS00_15K_SPA
NFS01_7K_SPA    2114715632   1842266968    272448664   87%    /NFS01_7K_SPA
root_fs_common       15368         5272        10096   34%    /.etc_common
root_fs_2           129056         8504       120552    7%    /
server_3 :
Filesystem          kbytes         used        avail capacity Mounted on
NFS02_7K_SPB    2114715632    351572240   1763143392   17%    /NFS02_7K_SPB
NFS00_7K_SPB    2114715632   1064917728   1049797904   50%    /NFS00_7K_SPB
root_fs_common       15368         5272        10096   34%    /.etc_common
root_fs_3           129056         8248       120808    6%    /
======================= [Fri Sep 19 00:07:48 UTC 2014] End of Run  =======================
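The server_df section of this output can be turned into capacity alerts. A minimal Python sketch, assuming one filesystem per line in the layout shown above (name, kbytes, used, avail, capacity, mount point) under a "server_N :" header. The 85% threshold is only an example value, and the function names are hypothetical.

```python
import re

SERVER_HEADER = re.compile(r"^(server_\d+)\s*:")
# name, kbytes, used, avail, capacity%, mount point
FS_ROW = re.compile(r"^(\S+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)%\s+(\S+)$")


def parse_server_df(text: str) -> list:
    """Return one dict per filesystem row, tagged with the server it belongs to."""
    rows = []
    server = None
    for line in text.splitlines():
        line = line.strip()
        header = SERVER_HEADER.match(line)
        if header:
            server = header.group(1)
            continue
        row = FS_ROW.match(line)  # column headers and banners do not match
        if row:
            name, kbytes, used, avail, pct, mount = row.groups()
            rows.append({
                "server": server, "filesystem": name, "kbytes": int(kbytes),
                "used": int(used), "avail": int(avail),
                "capacity": int(pct), "mount": mount,
            })
    return rows


def nearly_full(rows: list, threshold: int = 85) -> list:
    """Return (server, filesystem, capacity%) for rows at or above threshold."""
    return [(r["server"], r["filesystem"], r["capacity"])
            for r in rows if r["capacity"] >= threshold]
```

Applied to the sample above, NFS00_15K_SPA (85%) and NFS01_7K_SPA (87%) on server_2 would be reported.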

NOTE: If you run the command on the secondary SP, you receive the following error message:

service@(none) spa:~> svc_storagecheck --sizes | less
======================= [Fri Sep 0:12:45 UTC 2014] End of Run  =======================
--- ERROR: this utility can only be run on the master SP.
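A script that does not know in advance which SP is the master can try one SP and fall back to the other when it sees this error. A minimal Python sketch: run_on is a hypothetical callable (for example, a wrapper around ssh) that runs svc_storagecheck --sizes on the given SP and returns its output. Only the error text is taken from the transcript above; the function name is hypothetical.

```python
from typing import Callable, Tuple

# Error text copied from the svc_storagecheck output above.
MASTER_ONLY_ERROR = "this utility can only be run on the master SP"


def storagecheck_on_master(run_on: Callable[[str], str]) -> Tuple[str, str]:
    """Try each SP in turn; return (sp, output) from the first one that is the master.

    run_on is a hypothetical callable: given "local" or "peer", it runs
    'svc_storagecheck --sizes' on that SP and returns the command output.
    """
    for sp in ("local", "peer"):
        output = run_on(sp)
        if MASTER_ONLY_ERROR not in output:
            return sp, output
    raise RuntimeError("neither SP accepted svc_storagecheck --sizes")
```

Remembering which SP answered lets the next poll start there, since the master role does not necessarily belong to SP A.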

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


I would love to know what other commands engineers are using to "poll" their VNXe platforms for health and environmental values and I/O performance metrics. There seems to be a complete lack of performance data available from both the GUI and the CLI on the VNXe platforms.


Regards,


Amir


September 21st, 2014 19:00

VNXe engineers also use the commands listed in the VNXe Unisphere CLI User Guide and the VNXe Service Commands Technical Notes; there are no other official commands. If you are interested in the VNXe's hidden performance data, refer to the document How to check VNXe performance statistics data, which may contain the information you are looking for.
