
Unsolved


July 20th, 2015 08:00

Drive firmware upgrade did not upgrade my SSDs

I have a new, pre-production cluster of X410s. Support had me update OneFS to 7.1.1.4, then Node Firmware package 9.2.3, then Drive Support Package 1.6. All of the updates completed successfully, but the drive firmware package did not update the firmware on the SSDs in Bay 1 of each node. The cluster wants the drives replaced and keeps sending alerts to Isilon support. For each node I have the alert "The drive in Bay 1 has firmware version A100 which does not match the configuration installed on the cluster. Please install the appropriate Drive Support Package and update firmware."
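For anyone hitting the same alert, two commands that should show what the cluster and the OS underneath actually report for firmware. This is only a sketch; I believe both exist on OneFS 7.x (camcontrol is the FreeBSD layer the nodes run on), but verify the syntax on your release before relying on it:

# isi firmware status                     # node component firmware versions, as a sanity check
# isi_for_array -s 'camcontrol devlist'   # per-node FreeBSD device list; shows model and firmware revision for each drive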

Support had me power down the cluster, then bring it back up and try to format those drives. That didn't work. They had me reinstall the drive firmware package; that didn't work either. (I have verified that it is not a "No SSD" firmware package.)
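For reference, the format attempt goes through isi devices; roughly like the sketch below. The <node>:<bay> device spec is a placeholder and the exact action and device syntax vary by OneFS release, so check isi devices --help (or ask support) before running anything:

# isi devices                               # note the Lnum/bay of the drive showing [REPLACE]
# isi devices -a format -d <node>:<bay>     # reformat that drive (placeholder device spec)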

In the About this Cluster section, there are alerts for Cluster Firmware CMCSDR_Honeybadger, CMC_HFHB, and IsilonFPV1.

Anybody have any ideas on how I can get these drives back online and get their firmware updated?

Thank you

205 Posts

July 20th, 2015 08:00

So it actually smartfailed all the SSDs? The alert is generally something ignorable, and it typically won't trigger a FlexProtect on those drives.

Just to check, you're not using L3, right?

205 Posts

July 20th, 2015 08:00

Given that it's a wear threshold exceeded, I'd say they owe you some new SSDs.

L3 is a mode where the SSDs are used as an eviction cache for the in-memory L2 cache rather than exclusively for metadata acceleration.
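If you want to check, L3 is set per node pool. A sketch assuming the 7.1.1 SmartPools CLI; the exact field name may differ on your release, so check isi storagepool --help:

# isi storagepool nodepools list            # list node pools; look for an L3 flag on your X410 pool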

Can you post an isi devices -n and an isi status -v for your cluster?
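Easiest is to grab those across all nodes in one shot (isi_for_array -s just runs the command on each node in turn and serializes the output):

# isi_for_array -s 'isi devices'
# isi status -v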

21 Posts

July 20th, 2015 08:00

So I'm wondering why the drive firmware package didn't update them, and whether it will update new ones. Do I try Drive Support Package 1.7? They all showed up as failed simultaneously, right after the firmware updates were installed, so I don't think they are actually failed drives. Is there any way to force a firmware update of those drives? They are the only SSDs in the cluster.

21 Posts

July 20th, 2015 08:00

I'm not sure what L3 is. Isilon is brand new to us.

These are the alerts it's sending to support, for all 6 nodes.

Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0). Drive is being removed from the system - please schedule drive replacement immediately.

21 Posts

July 20th, 2015 08:00

All 6 nodes look exactly like this.

Node 1, [ATTN]

  Bay 1        Lnum 43      [REPLACE]      SN:0RV79ZYA            /dev/da19

  Bay 2        Lnum 16      [HEALTHY]      SN:PN2134P6KRX86X      /dev/da20

  Bay 3        Lnum 34      [HEALTHY]      SN:PN2134P6KSLLBX      /dev/da1

  Bay 4        Lnum 33      [HEALTHY]      SN:PN1134P6KTGXHW      /dev/da2

  Bay 5        Lnum 38      [HEALTHY]      SN:PN2134P6KSV80X      /dev/da21

  Bay 6        Lnum 32      [HEALTHY]      SN:PN1134P6KSWPLW      /dev/da3

  Bay 7        Lnum 31      [HEALTHY]      SN:PN2134P6KSMR2X      /dev/da4

  Bay 8        Lnum 30      [HEALTHY]      SN:PN1134P6KTGYWW      /dev/da5

  Bay 9        Lnum 14      [HEALTHY]      SN:PN2134P6KSSG7X      /dev/da22

  Bay 10       Lnum 29      [HEALTHY]      SN:PN1134P6KTLA7W      /dev/da6

  Bay 11       Lnum 28      [HEALTHY]      SN:PN2134P6KSLDZX      /dev/da7

  Bay 12       Lnum 27      [HEALTHY]      SN:PN1134P6KTL3WW      /dev/da8

  Bay 13       Lnum 13      [HEALTHY]      SN:PN2134P6KT75TX      /dev/da23

  Bay 14       Lnum 12      [HEALTHY]      SN:PN2134P6KRX7KX      /dev/da24

  Bay 15       Lnum 11      [HEALTHY]      SN:PN2134P6KSV7SX      /dev/da25

  Bay 16       Lnum 26      [HEALTHY]      SN:PN1134P6KTGYLW      /dev/da9

  Bay 17       Lnum 10      [HEALTHY]      SN:PN2134P6KSV4XX      /dev/da26

  Bay 18       Lnum 9       [HEALTHY]      SN:PN2134P6KSUM9X      /dev/da27

  Bay 19       Lnum 8       [HEALTHY]      SN:PN2134P6KSUPWX      /dev/da28

  Bay 20       Lnum 25      [HEALTHY]      SN:PN1134P6KTL6SW      /dev/da10

  Bay 21       Lnum 7       [HEALTHY]      SN:PN2134P6KSSE9X      /dev/da29

  Bay 22       Lnum 6       [HEALTHY]      SN:PN2134P6KRYLGX      /dev/da30

  Bay 23       Lnum 24      [HEALTHY]      SN:PN2134P6KSURSX      /dev/da11

  Bay 24       Lnum 36      [HEALTHY]      SN:PN2134P6KSMSHX      /dev/da12

  Bay 25       Lnum 22      [HEALTHY]      SN:PN1134P6KSWGMW      /dev/da13

  Bay 26       Lnum 21      [HEALTHY]      SN:PN2134P6KRX7AX      /dev/da14

  Bay 27       Lnum 20      [HEALTHY]      SN:PN1134P6KSWE2W      /dev/da15

  Bay 28       Lnum 5       [HEALTHY]      SN:PN2134P6KSUKTX      /dev/da31

  Bay 29       Lnum 35      [HEALTHY]      SN:PN1134P6KSWNEW      /dev/da16

  Bay 30       Lnum 4       [HEALTHY]      SN:PN1134P6KSWPVW      /dev/da32

  Bay 31       Lnum 19      [HEALTHY]      SN:PN1134P6KSWNYW      /dev/da17

  Bay 32       Lnum 3       [HEALTHY]      SN:PN2134P6KRX6SX      /dev/da33

  Bay 33       Lnum 18      [HEALTHY]      SN:PN1134P6KSWR7W      /dev/da18

  Bay 34       Lnum 2       [HEALTHY]      SN:PN1134P6KSWD9W      /dev/da34

  Bay 35       Lnum 1       [HEALTHY]      SN:PN2134P6KSMSRX      /dev/da35

  Bay 36       Lnum 0       [HEALTHY]      SN:PN2134P6KSMSVX      /dev/da36

Cluster Name: Isilon-XXXXXXX-Cluster01

Cluster Health:     [ ATTN]

Cluster Storage:  HDD                 SSD Storage

Size:             375T (379T Raw)     0 (0 Raw)

VHS Size:         4.0T

Used:             28G (< 1%)          0 (n/a)

Avail:            375T (> 99%)        0 (n/a)

                   Health  Throughput (bps)  HDD Storage      SSD Storage

ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size

-------------------+-----+-----+-----+-----+-----------------+-----------------

  1|10.105.32.131  |-A-- |    0|  512|  512| 3.7G/  62T(< 1%)|(No Storage SSDs)

  2|10.105.32.129  |-A-- |  22K| 336K| 358K| 5.9G/  62T(< 1%)|(No Storage SSDs)

  3|10.105.32.122  |-A-- |    0|   30|   30| 2.9G/  62T(< 1%)|(No Storage SSDs)

  4|10.105.32.123  |-A-- |    0| 134K| 134K| 4.6G/  62T(< 1%)|(No Storage SSDs)

  5|10.105.32.124  |-A-- |    0| 224K| 224K| 4.6G/  62T(< 1%)|(No Storage SSDs)

  6|10.105.32.128  |-A-- |    0| 224K| 224K| 6.1G/  62T(< 1%)|(No Storage SSDs)

-------------------+-----+-----+-----+-----+-----------------+-----------------

Cluster Totals:          |  22K| 919K| 941K|  28G/ 375T(< 1%)|(No Storage SSDs)

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

Critical Events:

07/17 14:48   3 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:48   6 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:48   2 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:48   4 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:49   5 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:49   1 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...

07/17 14:49   4 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/17 14:59   5 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/20 09:08   1 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/20 09:08   2 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/20 09:08   6 Recurring: One or more drives (bay(s) 1 / type(s) SSD) are ...

07/20 09:08   3 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...

07/20 09:13   2 Recurring: Bay 1 System Area Wear threshold exceeded: 1 (Th...

07/20 09:14   3 Recurring: Bay 1 System Area Wear threshold exceeded: 1 (Th...

07/20 09:16   4 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...

07/20 09:18   6 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...

07/20 09:22   5 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...

07/20 09:33   1 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...

Cluster Job Status:

No running jobs.

No paused or waiting jobs.

No failed jobs.

Recent job results:

Time            Job                        Event

--------------- -------------------------- ------------------------------

07/19 22:00:26  FSAnalyze[418]             Succeeded (LOW)

07/19 00:00:20  ShadowStoreDelete[417]     Succeeded (LOW)

07/18 22:00:21  FSAnalyze[416]             Succeeded (LOW)

07/17 22:00:48  FSAnalyze[415]             Succeeded (LOW)

07/16 22:00:40  FSAnalyze[414]             Succeeded (LOW)

07/15 22:00:38  FSAnalyze[413]             Succeeded (LOW)

07/14 22:00:30  FSAnalyze[412]             Succeeded (LOW)

07/13 22:00:28  FSAnalyze[411]             Succeeded (LOW)

205 Posts

July 20th, 2015 08:00

Were I you, I would tell them in no uncertain terms that unless they have a resolution for you immediately, they need to send you fresh SSDs with updated firmware. (Note that you PROBABLY won't get updated firmware on the new SSDs, but as soon as you install them, you should update their firmware.)

5 Practitioner • 274.2K Posts

July 20th, 2015 09:00

Hi Bimmer,

Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0). The threshold is set to 0 and has been exceeded. I think you need to update the threshold value, but I am not sure where to update it. Will let you know in some time.


Thanks,

kiran.

21 Posts

July 20th, 2015 09:00

Shane,

Thank you. I already have an SR open. We were all thinking this was a firmware issue, so they were holding off on dispatching replacement drives. But I just updated the SR with the new information, and they decided the drives need to be replaced.

Hopefully the new ones resolve the issue.

Thank you

104 Posts

July 20th, 2015 09:00

Bimmer,

In order to install drive firmware, the disks need to be up and HEALTHY, so you would need to re-add the drives to the cluster before attempting the firmware upgrade again.
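Roughly, the re-add looks like the sketch below. The device spec is a placeholder and the action syntax varies a bit by OneFS release, so feel free to have us walk you through it on the SR rather than running it blind:

# isi devices                               # confirm the bay is showing [REPLACE]
# isi devices -a add -d <node>:<bay>        # attempt to bring the drive back into the cluster

Once the drive comes back HEALTHY, re-run the Drive Support Package install and it should be able to push the firmware.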

We would like to send you some replacement drives regardless, as being in support we like to err on the side of caution.

If you can open an SR with the following information, I can ship you some new SSDs. As carlilek said, we can assist with the firmware upgrade once they're replaced, if needed.

Grab the serial numbers off each node so we can validate we're sending the correct replacement drives:

# isi_for_array -s isi_hw_status | grep SerNo

Once you have the SR number you can PM me or post it in a reply.

To create a service request, you have a couple options:

1. Log in to your online account on support.emc.com and go to this page: https://support.emc.com/servicecenter/createSR

2. Call in to EMC Isilon Support at 1-800-782-4362 (For a complete local country dial list, please see this document: http://www.emc.com/collateral/contact-us/h4165-csc-phonelist-ho.pdf)

7 Posts

July 21st, 2015 10:00

Hi Bimmer,

There are two versions of Drive Support Package 1.6: one "With SSD" and one "No SSD". Can you confirm which version was installed on your cluster?
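If you're not sure which one went on, the installed package list should show it. A sketch assuming isi pkg is available on 7.1.1 and that the drive support package registers there; verify with isi pkg --help:

# isi pkg info                              # lists installed packages/patches on the cluster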

104 Posts

July 30th, 2015 13:00

Bimmer,

Wanted to check in and make sure that the drive arrived on site and that replacing it allowed the firmware package to be installed successfully.

7 Posts

August 4th, 2015 07:00

Hi Bimmer,

Please reference the following KB article that was recently published on the issue you are experiencing:

https://support.emc.com/kb/205594
