Unsolved
21 Posts
Drive Firmware upgrade did not upgrade my SSDs
I have a new, pre-production cluster of X410s. Support had me update OneFS to 7.1.1.4, then Node Firmware package 9.2.3, then Drive Firmware support package 1.6. All of the updates completed successfully, but the firmware did not update on the SSDs in Bay 1 of each node. The cluster wants the drives replaced and keeps sending alerts to Isilon support. For each node I have the alert: "The drive in Bay 1 has firmware version A100 which does not match the configuration installed on the cluster. Please install the appropriate Drive Support Package and update firmware"
Support had me power down the cluster, then bring it back up and try to format those drives. That didn't work. They had me reinstall the Drive Firmware package; that didn't work either. (I have verified that it is not a "No SSD" firmware package.)
In the About this Cluster section, there are alerts for Cluster Firmware CMCSDR_Honeybadger, CMC_HFHB, and IsilonFPV1.
Anybody have any ideas of how I can get these drives back online, and get their firmware updated?
Thank you
carlilek
205 Posts
July 20th, 2015 08:00
So it actually smartfailed all the SSDs? The alert is generally ignorable, and it typically won't trigger a FlexProtect on those drives.
Just to check, you're not using L3, right?
carlilek
205 Posts
July 20th, 2015 08:00
Given that it's a wear threshold exceeded, I'd say they owe you some new SSDs.
L3 is a mode where the SSDs are used as an eviction cache for RAM rather than exclusively for metadata acceleration.
Can you post an isi devices -n and an isi status -v for your cluster?
Barnzy
21 Posts
July 20th, 2015 08:00
So I'm wondering why the Drive Firmware package didn't update them, and whether it will update new ones. Should I try Drive Firmware update 1.7? They all showed up as failed simultaneously, right after the firmware updates were installed, so I don't think they are actually failed drives. Is there any way to force a firmware update of those drives? They are the only SSDs in the cluster.
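For anyone following along: on OneFS 7.1.x, the per-drive firmware state can be checked, and an update manually retried, with something along these lines. Treat this as a sketch only; the exact command names and flags differ between OneFS versions, so confirm the syntax on your cluster (or with support) before running anything.

```shell
# List each drive's current and desired firmware version
# (populated after a Drive Support Package is installed)
isi drivefirmware status

# Manually retry the firmware update on a specific drive,
# here node 1, bay 1 as an example; the drive generally
# needs to be in a healthy, in-service state for this to work
isi devices -a fwupdate -d 1:1
```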
Barnzy
21 Posts
July 20th, 2015 08:00
I'm not sure what L3 is. Isilon is brand new to us.
These are the alerts it's sending to support, for all 6 nodes.
Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0). Drive is being removed from the system - please schedule drive replacement immediately.
Barnzy
21 Posts
July 20th, 2015 08:00
All 6 nodes look exactly like this.
Node 1, [ATTN]
Bay 1 Lnum 43 [REPLACE] SN:0RV79ZYA /dev/da19
Bay 2 Lnum 16 [HEALTHY] SN:PN2134P6KRX86X /dev/da20
Bay 3 Lnum 34 [HEALTHY] SN:PN2134P6KSLLBX /dev/da1
Bay 4 Lnum 33 [HEALTHY] SN:PN1134P6KTGXHW /dev/da2
Bay 5 Lnum 38 [HEALTHY] SN:PN2134P6KSV80X /dev/da21
Bay 6 Lnum 32 [HEALTHY] SN:PN1134P6KSWPLW /dev/da3
Bay 7 Lnum 31 [HEALTHY] SN:PN2134P6KSMR2X /dev/da4
Bay 8 Lnum 30 [HEALTHY] SN:PN1134P6KTGYWW /dev/da5
Bay 9 Lnum 14 [HEALTHY] SN:PN2134P6KSSG7X /dev/da22
Bay 10 Lnum 29 [HEALTHY] SN:PN1134P6KTLA7W /dev/da6
Bay 11 Lnum 28 [HEALTHY] SN:PN2134P6KSLDZX /dev/da7
Bay 12 Lnum 27 [HEALTHY] SN:PN1134P6KTL3WW /dev/da8
Bay 13 Lnum 13 [HEALTHY] SN:PN2134P6KT75TX /dev/da23
Bay 14 Lnum 12 [HEALTHY] SN:PN2134P6KRX7KX /dev/da24
Bay 15 Lnum 11 [HEALTHY] SN:PN2134P6KSV7SX /dev/da25
Bay 16 Lnum 26 [HEALTHY] SN:PN1134P6KTGYLW /dev/da9
Bay 17 Lnum 10 [HEALTHY] SN:PN2134P6KSV4XX /dev/da26
Bay 18 Lnum 9 [HEALTHY] SN:PN2134P6KSUM9X /dev/da27
Bay 19 Lnum 8 [HEALTHY] SN:PN2134P6KSUPWX /dev/da28
Bay 20 Lnum 25 [HEALTHY] SN:PN1134P6KTL6SW /dev/da10
Bay 21 Lnum 7 [HEALTHY] SN:PN2134P6KSSE9X /dev/da29
Bay 22 Lnum 6 [HEALTHY] SN:PN2134P6KRYLGX /dev/da30
Bay 23 Lnum 24 [HEALTHY] SN:PN2134P6KSURSX /dev/da11
Bay 24 Lnum 36 [HEALTHY] SN:PN2134P6KSMSHX /dev/da12
Bay 25 Lnum 22 [HEALTHY] SN:PN1134P6KSWGMW /dev/da13
Bay 26 Lnum 21 [HEALTHY] SN:PN2134P6KRX7AX /dev/da14
Bay 27 Lnum 20 [HEALTHY] SN:PN1134P6KSWE2W /dev/da15
Bay 28 Lnum 5 [HEALTHY] SN:PN2134P6KSUKTX /dev/da31
Bay 29 Lnum 35 [HEALTHY] SN:PN1134P6KSWNEW /dev/da16
Bay 30 Lnum 4 [HEALTHY] SN:PN1134P6KSWPVW /dev/da32
Bay 31 Lnum 19 [HEALTHY] SN:PN1134P6KSWNYW /dev/da17
Bay 32 Lnum 3 [HEALTHY] SN:PN2134P6KRX6SX /dev/da33
Bay 33 Lnum 18 [HEALTHY] SN:PN1134P6KSWR7W /dev/da18
Bay 34 Lnum 2 [HEALTHY] SN:PN1134P6KSWD9W /dev/da34
Bay 35 Lnum 1 [HEALTHY] SN:PN2134P6KSMSRX /dev/da35
Bay 36 Lnum 0 [HEALTHY] SN:PN2134P6KSMSVX /dev/da36
Cluster Name: Isilon-XXXXXXX-Cluster01
Cluster Health: [ ATTN]
Cluster Storage: HDD SSD Storage
Size: 375T (379T Raw) 0 (0 Raw)
VHS Size: 4.0T
Used: 28G (< 1%) 0 (n/a)
Avail: 375T (> 99%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
1|10.105.32.131 |-A-- | 0| 512| 512| 3.7G/ 62T(< 1%)|(No Storage SSDs)
2|10.105.32.129 |-A-- | 22K| 336K| 358K| 5.9G/ 62T(< 1%)|(No Storage SSDs)
3|10.105.32.122 |-A-- | 0| 30| 30| 2.9G/ 62T(< 1%)|(No Storage SSDs)
4|10.105.32.123 |-A-- | 0| 134K| 134K| 4.6G/ 62T(< 1%)|(No Storage SSDs)
5|10.105.32.124 |-A-- | 0| 224K| 224K| 4.6G/ 62T(< 1%)|(No Storage SSDs)
6|10.105.32.128 |-A-- | 0| 224K| 224K| 6.1G/ 62T(< 1%)|(No Storage SSDs)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals: | 22K| 919K| 941K| 28G/ 375T(< 1%)|(No Storage SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Critical Events:
07/17 14:48 3 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...
07/17 14:48 6 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...
07/17 14:48 2 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...
07/17 14:48 4 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...
07/17 14:49 5 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...
07/17 14:49 1 Recurring: Disk Repair Complete: Bay 1, Type SSD, LNUM 17. ...
07/17 14:49 4 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...
07/17 14:59 5 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...
07/20 09:08 1 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...
07/20 09:08 2 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...
07/20 09:08 6 Recurring: One or more drives (bay(s) 1 / type(s) SSD) are ...
07/20 09:08 3 One or more drives (bay(s) 1 / type(s) SSD) are ready to be...
07/20 09:13 2 Recurring: Bay 1 System Area Wear threshold exceeded: 1 (Th...
07/20 09:14 3 Recurring: Bay 1 System Area Wear threshold exceeded: 1 (Th...
07/20 09:16 4 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...
07/20 09:18 6 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...
07/20 09:22 5 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...
07/20 09:33 1 Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0)...
Cluster Job Status:
No running jobs.
No paused or waiting jobs.
No failed jobs.
Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------
07/19 22:00:26 FSAnalyze[418] Succeeded (LOW)
07/19 00:00:20 ShadowStoreDelete[417] Succeeded (LOW)
07/18 22:00:21 FSAnalyze[416] Succeeded (LOW)
07/17 22:00:48 FSAnalyze[415] Succeeded (LOW)
07/16 22:00:40 FSAnalyze[414] Succeeded (LOW)
07/15 22:00:38 FSAnalyze[413] Succeeded (LOW)
07/14 22:00:30 FSAnalyze[412] Succeeded (LOW)
07/13 22:00:28 FSAnalyze[411] Succeeded (LOW)
carlilek
205 Posts
July 20th, 2015 08:00
Were I you, I would tell them in no uncertain terms that unless they have a resolution for you immediately, they need to send you fresh SSDs with updated firmware. (Note that you probably won't get updated firmware on the new SSDs, but as soon as you install them, you should update their firmware.)
Anonymous
5 Practitioner
274.2K Posts
July 20th, 2015 09:00
Hi Bimmer,
Bay 1 System Area Wear threshold exceeded: 1 (Threshold: 0). The threshold is set to 0 and has been exceeded. I think you need to update the threshold value, though I am not sure where to change it. I will let you know in some time.
Thanks,
kiran.
Barnzy
21 Posts
July 20th, 2015 09:00
Shane,
Thank you. I already have an SR open. We were all thinking this was a firmware issue, so they were holding off on dispatching replacement drives. But I just updated the SR with new information, and they decided that the drives need to be replaced.
Hopefully the new ones resolve the issue.
Thank you
Stdekart
104 Posts
July 20th, 2015 09:00
Bimmer,
In order to install the drive firmware, the disks need to be up and HEALTHY, so you would need to re-add the drives to the cluster before attempting the firmware upgrade again.
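On OneFS 7.x, that re-add can be done from the command line roughly as follows. This is only a sketch: the node and bay numbers are examples for the drives in this thread, and action names can vary between versions, so check the steps with support before running them.

```shell
# Re-add the smartfailed SSD in bay 1 of node 1 back into the cluster
isi devices -a add -d 1:1

# Once the drive reports HEALTHY in `isi devices`,
# retry the drive firmware update on it
isi devices -a fwupdate -d 1:1
```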
We would like to send you some replacement drives regardless; being in support, we like to err on the safe side.
If you can open an SR with the following information, I can ship you some new SSD drives. As carlilek said, if needed we can assist with the firmware upgrade once the drives are replaced.
Grab the serial numbers off each node, just so we can validate that we're sending the correct replacement drives:
# isi_for_array -s isi_hw_status | grep SerNo
Once you have the SR number you can PM me or post it in a reply.
To create a service request, you have a couple options:
1. Log in to your online account on support.emc.com and go to this page: https://support.emc.com/servicecenter/createSR
2. Call in to EMC Isilon Support at 1-800-782-4362 (For a complete local country dial list, please see this document: http://www.emc.com/collateral/contact-us/h4165-csc-phonelist-ho.pdf)
TanyaLB
7 Posts
July 21st, 2015 10:00
Hi Bimmer,
There are two versions of Drive Support Package 1.6: one "With SSD" and one "No SSD". Can you confirm which version was installed on your cluster?
Stdekart
104 Posts
July 30th, 2015 13:00
Bimmer,
I wanted to check in and make sure that the drive arrived on site and that replacing it allowed the firmware package to install successfully.
TanyaLB
7 Posts
August 4th, 2015 07:00
Hi Bimmer,
Please reference the following KB article that was recently published on the issue you are experiencing:
https://support.emc.com/kb/205594