
Unsolved


July 20th, 2015 08:00

Drive firmware upgrade did not upgrade my SSDs

I have a new, pre-production cluster of X410s. Support had me update OneFS to 7.1.1.4, then Node Firmware package 9.2.3, then Drive Support Package 1.6. All of the updates completed successfully, but the drive firmware package did not update the firmware on the SSDs in Bay 1 of each node. The cluster wants the drives replaced and keeps sending alerts to Isilon support. For each node I have the alert "The drive in Bay 1 has firmware version A100 which does not match the configuration installed on the cluster. Please install the appropriate Drive Support Package and update firmware."
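For anyone hitting the same alert, two commands that should show what the cluster and the OS underneath actually report for firmware. This is only a sketch; I believe both exist on OneFS 7.x (camcontrol is the FreeBSD layer the nodes run on), but verify the syntax on your release before relying on it:

# isi firmware status                     # node component firmware versions, as a sanity check
# isi_for_array -s 'camcontrol devlist'   # per-node FreeBSD device list; shows model and firmware revision for each drive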

Support had me power down the cluster, then bring it back up and try to format those drives. That didn't work. They had me reinstall the drive firmware package; that didn't work either. (I have verified that it is not a "No SSD" firmware package.)
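For reference, the format attempt goes through isi devices; roughly like the sketch below. The <node>:<bay> device spec is a placeholder and the exact action and device syntax vary by OneFS release, so check isi devices --help (or ask support) before running anything:

# isi devices                               # note the Lnum/bay of the drive showing [REPLACE]
# isi devices -a format -d <node>:<bay>     # reformat that drive (placeholder device spec)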

In the About this Cluster section, there are alerts for Cluster Firmware CMCSDR_Honeybadger, CMC_HFHB, and IsilonFPV1.

Anybody have any ideas on how I can get these drives back online and get their firmware updated?

Thank you

205 Posts

July 20th, 2015 08:00

So it actually smartfailed all the SSDs? The alert is generally something ignorable, and it typically won't trigger a FlexProtect on those drives.

Just to check, you're not using L3, right?

205 Posts

July 20th, 2015 08:00

Given that it's a wear threshold exceeded, I'd say they owe you some new SSDs.

L3 is a mode where the SSDs are used as an eviction cache for the in-memory L2 cache rather than exclusively for metadata acceleration.
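If you want to check, L3 is set per node pool. A sketch assuming the 7.1.1 SmartPools CLI; the exact field name may differ on your release, so check isi storagepool --help:

# isi storagepool nodepools list            # list node pools; look for an L3 flag on your X410 pool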

Can you post an isi devices -n and an isi status -v for your cluster?
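Easiest is to grab those across all nodes in one shot (isi_for_array -s just runs the command on each node in turn and serializes the output):

# isi_for_array -s 'isi devices'
# isi status -v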

21 Posts

July 20th, 2015 08:00

So I'm wondering why the drive firmware package didn't update them, and whether it will update new ones. Do I try Drive Support Package 1.7? They all showed up as failed simultaneously, right after the firmware updates were installed, so I don't think they are actually failed drives. Is there any way to force a firmware update of those drives? They are the only SSDs in the cluster.

21 Posts

July 20th, 2015 08:00

I'm not sure what L3 is. Isilon is brand new to us.

These are the alerts it's sending to support, for all 6 nodes.

Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0). Drive is being removed from the system - please schedule drive replacement immediately.

21 Posts

July 20th, 2015 08:00

All 6 nodes look exactly like this.

Node 1, [ATTN]

  Bay 1        Lnum 43      [REPLACE]      SN:0RV79ZYA            /dev/da19

  Bay 2        Lnum 16      [HEALTHY]      SN:PN2134P6KRX86X      /dev/da20

  Bay 3        Lnum 34      [HEALTHY]      SN:PN2134P6KSLLBX      /dev/da1

  Bay 4        Lnum 33      [HEALTHY]      SN:PN1134P6KTGXHW      /dev/da2

  Bay 5        Lnum 38      [HEALTHY]      SN:PN2134P6KSV80X      /dev/da21

  Bay 6        Lnum 32      [HEALTHY]      SN:PN1134P6KSWPLW      /dev/da3

  Bay 7        Lnum 31      [HEALTHY]      SN:PN2134P6KSMR2X      /dev/da4

  Bay 8        Lnum 30      [HEALTHY]      SN:PN1134P6KTGYWW      /dev/da5

  Bay 9        Lnum 14      [HEALTHY]      SN:PN2134P6KSSG7X      /dev/da22

  Bay 10       Lnum 29      [HEALTHY]      SN:PN1134P6KTLA7W      /dev/da6

  Bay 11       Lnum 28      [HEALTHY]      SN:PN2134P6KSLDZX      /dev/da7

  Bay 12       Lnum 27      [HEALTHY]      SN:PN1134P6KTL3WW      /dev/da8

  Bay 13       Lnum 13      [HEALTHY]      SN:PN2134P6KT75TX      /dev/da23

  Bay 14       Lnum 12      [HEALTHY]      SN:PN2134P6KRX7KX      /dev/da24

  Bay 15       Lnum 11      [HEALTHY]      SN:PN2134P6KSV7SX      /dev/da25

  Bay 16       Lnum 26      [HEALTHY]      SN:PN1134P6KTGYLW      /dev/da9

  Bay 17       Lnum 10      [HEALTHY]      SN:PN2134P6KSV4XX      /dev/da26

  Bay 18       Lnum 9       [HEALTHY]      SN:PN2134P6KSUM9X      /dev/da27

  Bay 19       Lnum 8       [HEALTHY]      SN:PN2134P6KSUPWX      /dev/da28

  Bay 20       Lnum 25      [HEALTHY]      SN:PN1134P6KTL6SW      /dev/da10

  Bay 21       Lnum 7       [HEALTHY]      SN:PN2134P6KSSE9X      /dev/da29

  Bay 22       Lnum 6       [HEALTHY]      SN:PN2134P6KRYLGX      /dev/da30

  Bay 23       Lnum 24      [HEALTHY]      SN:PN2134P6KSURSX      /dev/da11

  Bay 24       Lnum 36      [HEALTHY]      SN:PN2134P6KSMSHX      /dev/da12

  Bay 25       Lnum 22      [HEALTHY]      SN:PN1134P6KSWGMW      /dev/da13

  Bay 26       Lnum 21      [HEALTHY]      SN:PN2134P6KRX7AX      /dev/da14

  Bay 27       Lnum 20      [HEALTHY]      SN:PN1134P6KSWE2W      /dev/da15

  Bay 28       Lnum 5       [HEALTHY]      SN:PN2134P6KSUKTX      /dev/da31

  Bay 29       Lnum 35      [HEALTHY]      SN:PN1134P6KSWNEW      /dev/da16

  Bay 30       Lnum 4       [HEALTHY]      SN:PN1134P6KSWPVW      /dev/da32

  Bay 31       Lnum 19      [HEALTHY]      SN:PN1134P6KSWNYW      /dev/da17

  Bay 32       Lnum 3       [HEALTHY]      SN:PN2134P6KRX6SX      /dev/da33

  Bay 33       Lnum 18      [HEALTHY]      SN:PN1134P6KSWR7W      /dev/da18

  Bay 34       Lnum 2       [HEALTHY]      SN:PN1134P6KSWD9W      /dev/da34

  Bay 35       Lnum 1       [HEALTHY]      SN:PN2134P6KSMSRX      /dev/da35

  Bay 36       Lnum 0       [HEALTHY]      SN:PN2134P6KSMSVX      /dev/da36

Cluster Name: Isilon-XXXXXXX-Cluster01

Cluster Health:     [ ATTN]

Cluster Storage:  HDD                 SSD Storage

Size:             375T (379T Raw)     0 (0 Raw)

VHS Size:         4.0T

Used:             28G (< 1%)          0 (n/a)

Avail:            375T (> 99%)        0 (n/a)

                   Health  Throughput (bps)  HDD Storage      SSD Storage

ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size

-------------------+-----+-----+-----+-----+-----------------+-----------------

  1|10.105.32.131  |-A-- |    0|  512|  512| 3.7G/  62T(< 1%)|(No Storage SSDs)

  2|10.105.32.129  |-A-- |  22K| 336K| 358K| 5.9G/  62T(< 1%)|(No Storage SSDs)

  3|10.105.32.122  |-A-- |    0|   30|   30| 2.9G/  62T(< 1%)|(No Storage SSDs)

  4|10.105.32.123  |-A-- |    0| 134K| 134K| 4.6G/  62T(< 1%)|(No Storage SSDs)

  5|10.105.32.124  |-A-- |    0| 224K| 224K| 4.6G/  62T(< 1%)|(No Storage SSDs)

  6|10.105.32.128  |-A-- |    0| 224K| 224K| 6.1G/  62T(< 1%)|(No Storage SSDs)

-------------------+-----+-----+-----+-----+-----------------+-----------------

Cluster Totals:          |  22K| 919K| 941K|  28G/ 375T(< 1%)|(No Storage SSDs)

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

Critical Events:

07/17 14:48   3 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:48   6 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:48   2 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:48   4 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:49   5 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:49   1 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:49   4 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/17 14:59   5 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/20 09:08   1 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/20 09:08   2 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/20 09:08   6 Recurring: One or more drives (bay(s) 1 / type(s) SSD) are ...

07/20 09:08   3 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/20 09:13   2 Recurring: Bay 1 System Area Wear threshold exceeded: 1 (Th...

07/20 09:14   3 Recurring: Bay 1 System Area Wear threshold exceeded: 1 (Th...

07/20 09:16   4 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...

07/20 09:18   6 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...

07/20 09:22   5 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...

07/20 09:33   1 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...

Cluster Job Status:

No running jobs.

No paused or waiting jobs.

No failed jobs.

Recent job results:

Time            Job                        Event

--------------- -------------------------- ------------------------------

07/19 22:00:26  FSAnalyze[418]             Succeeded (LOW)

07/19 00:00:20  ShadowStoreDelete[417]     Succeeded (LOW)

07/18 22:00:21  FSAnalyze[416]             Succeeded (LOW)

07/17 22:00:48  FSAnalyze[415]             Succeeded (LOW)

07/16 22:00:40  FSAnalyze[414]             Succeeded (LOW)

07/15 22:00:38  FSAnalyze[413]             Succeeded (LOW)

07/14 22:00:30  FSAnalyze[412]             Succeeded (LOW)

07/13 22:00:28  FSAnalyze[411]             Succeeded (LOW)

205 Posts

July 20th, 2015 08:00

Were I you, I would tell them in no uncertain terms that unless they have a resolution for you immediately, they need to send you fresh SSDs with updated firmware. (Note that you PROBABLY won't get updated firmware on the new SSDs, but as soon as you install them, you should update their firmware.)

5 Practitioner • 274.2K Posts

July 20th, 2015 09:00

Hi Bimmer,

Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0). The threshold is set to 0 and has been exceeded. I think you need to update the threshold value, but I am not sure where to update it. Will let you know in some time.


Thanks,

kiran.

21 Posts

July 20th, 2015 09:00

Shane,

Thank you. I already have an SR open. We were all thinking this was a firmware issue, so they were holding off on dispatching replacement drives. But I just updated the SR with the new information, and they decided the drives need to be replaced.

Hopefully the new ones resolve the issue.

Thank you

104 Posts

July 20th, 2015 09:00

Bimmer,

In order to install drive firmware, the disks need to be up and HEALTHY, so you would need to re-add the drives to the cluster before attempting the firmware upgrade again.
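Roughly, the re-add looks like the sketch below. The device spec is a placeholder and the action syntax varies a bit by OneFS release, so feel free to have us walk you through it on the SR rather than running it blind:

# isi devices                               # confirm the bay is showing [REPLACE]
# isi devices -a add -d <node>:<bay>        # attempt to bring the drive back into the cluster

Once the drive comes back HEALTHY, re-run the Drive Support Package install and it should be able to push the firmware.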

We would like to send you some replacement drives regardless, as being in support we like to err on the side of caution.

If you can open an SR with the following information, I can ship you some new SSDs. As carlilek said, we can assist with the firmware upgrade once they're replaced, if needed.

Grab the serial numbers off each node so we can validate we're sending the correct replacement drives:

# isi_for_array -s isi_hw_status | grep SerNo

Once you have the SR number you can PM me or post it in a reply.

To create a service request, you have a couple options:

1. Log in to your online account on support.emc.com and go to this page: https://support.emc.com/servicecenter/createSR

2. Call in to EMC Isilon Support at 1-800-782-4362 (For a complete local country dial list, please see this document: http://www.emc.com/collateral/contact-us/h4165-csc-phonelist-ho.pdf)

7 Posts

July 21st, 2015 10:00

Hi Bimmer,

There are two versions of Drive Support Package 1.6: one "With SSD" and one "No SSD". Can you confirm which version was installed on your cluster?
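If you're not sure which one went on, the installed package list should show it. A sketch assuming isi pkg is available on 7.1.1 and that the drive support package registers there; verify with isi pkg --help:

# isi pkg info                              # lists installed packages/patches on the cluster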

104 Posts

July 30th, 2015 13:00

Bimmer,

Wanted to check in and make sure that the drive arrived on site and that replacing it allowed the firmware package to be installed successfully.

7 Posts

August 4th, 2015 07:00

Hi Bimmer,

Please reference the following KB article that was recently published on the issue you are experiencing:

https://support.emc.com/kb/205594
