Thank you for the explanation. I am confused by the statement "no additional risk was added by not replacing the spares immediately but waiting for the 'any drive in the system has zero spare coverage'". Aren't drives that spare from the depleted pool at increased risk? If another drive that spares from that pool fails, it has no spare and is at risk until the EMC CE comes out the next day.
No, they are not at risk. When no permanent spare is available for any drive in the array, a case is generated for replacement of the failed disk drives. There can still be spares in the array that can be used as a standard hot spare, just not as a permanent spare.
Dynamic sparing is no longer an option in VMAX systems. The point about a hot spare (dynamic spare) being invoked holds for DMX-3 and DMX-4 systems.
However, the statement about "no additional risk" also took into consideration VMAX systems with only permanent sparing. This was looked at very closely by the same Engineering team that calculated and recommended how many spares are needed for adequate spare coverage. Please keep in mind that one of the key factors in analyzing the potential additional risk of implementing Deferred Service is the overall improvement in drive reliability seen over the past several years.
So, things are much clearer than when this thread started, but I too have a difficult time accepting the assessment of "no additional risk" in allowing a zero-hot-spare-available state.
If there is no risk in allowing a "Zero Hot Spare available" state then why would anyone waste money on putting in hot spares in the first place?
I can understand that the difference might be minimal, but the risk is ABSOLUTELY greater than when Hot Spares ARE available. I could never justify spending the money on a highly redundant, highly available system like the Symmetrix and then purposely allowing a state where there is no immediately available redundancy, if there was an option.
I am glad to see this being reviewed again. Coincidentally I am in the process of reading "The Black Swan - The Impact of the Highly Improbable" by Nassim Taleb - http://en.wikipedia.org/wiki/Black_swan_theory.
After the previous discussions, our recommendation (which our clients have readily agreed to) has been to NOT enable deferred service. Allen W has a good point - `If there is no risk in allowing a "Zero Hot Spare available" state then why would anyone waste money on putting in hot spares in the first place?`
Following on from that, if `After very careful analysis it was found that the new process did not impose any additional risk` (quoting 2gnWoUR4Wl1186674120175), then why take the trouble to modify the microcode so that a `customer can chose (sic) to change the criteria from zero to one in which we call the spares out for replacement`?
My main concern with enabling deferred service is that in LATAM we certainly don't get NBD (next business day) parts onsite when they ship from Franklin; parts usually arrive onsite 1 to 2 weeks after the request.
And I'm talking about Mexico, where it is not that bad and there are warehouses all around the country. Yet I know that we have no more than 10 disks of a given PN in each warehouse.
So, if we wait to go onsite and perform all spare disk replacements at once, we risk using up all available parts of a given PN in a region, since there are boxes with 20 to 30 spares configured, at least. And in the scenario where "there are no longer available spares" for a given disk type, waiting a week or so for parts to arrive onsite greatly raises the risk of a double disk failure.
Countries in NOLA and the South Cone have even worse scenarios; in some countries there is no warehouse at all.
At least for my accounts, I don't see this happening. I'm turning off deferred service until logistics can guarantee larger stocks and our team has more CSEs available to assist when these urgent SRs for many spares are triggered.
Something to consider. Have a great day ahead.
Hi, I would like to clarify one point here: Deferred maintenance will not wait for the spare count to go down to 0 before calling the CE for replacement.
There is a concept of a covering spare. Every drive has a covering spare drive assigned to it; if that drive fails, the covering spare drive takes its place. This is not a 1-1 mapping: one spare will cover multiple disks. When Enginuity detects that any disk does not have a covering spare, it will call home and generate an SR for the CE to replace all spare drives that are pending replacement.
There is an AutoPSE script that runs on the service processor. Whenever a drive fails, the AutoPSE script collects the RMA information for the drive, i.e. serial number, part number, etc. The script checks whether deferred disk service is enabled and whether all drives have covering spares.
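To make the decision described above concrete, here is a minimal sketch of that check. It is not the actual AutoPSE script or Enginuity logic; the names `Disk`, `Spare`, `has_covering_spare` and `on_drive_failure` are hypothetical, and it assumes (loudly, as a simplification) that a spare covers a disk whenever it is ready and has the same part number. The real coverage rules are not spelled out in this thread.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Disk:
    serial: str
    part_number: str  # assumption: coverage is decided by part number alone


@dataclass
class Spare:
    serial: str
    part_number: str
    ready: bool = True  # False once the spare is consumed or Not Ready


def has_covering_spare(disk: Disk, spares: list[Spare]) -> bool:
    # Not a 1-1 mapping: any ready spare with a matching part number covers
    # the disk, so a single spare can cover many disks.
    return any(s.ready and s.part_number == disk.part_number for s in spares)


def on_drive_failure(failed: Disk, active: list[Disk], spares: list[Spare],
                     pending: list[dict], deferred_enabled: bool) -> list[dict]:
    """Hypothetical AutoPSE-style check, run after a spare has taken the failed drive's place.

    Returns the RMA records to dispatch to the CE now; an empty list means
    the replacement is deferred.
    """
    # Record the RMA information (serial number, part number) for the failed drive.
    pending.append({"serial": failed.serial, "part_number": failed.part_number})
    if deferred_enabled and all(has_covering_spare(d, spares) for d in active):
        return []  # every active drive still has a covering spare: defer
    # Deferred service is off, or some drive lost coverage:
    # call out every replacement that is pending, not just the latest one.
    dispatched = list(pending)
    pending.clear()
    return dispatched
```

With deferred service enabled, the first failure is simply queued while coverage remains; the failure that leaves some disk uncovered dispatches every queued replacement in one SR.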
Agreed on A-J. Enginuity will dispatch a spare replacement task to the CE as soon as the disks no longer have a covering spare available.
As a plain example: you have 8 spare drives with Deferred maintenance enabled, and 3 of them are Not Ready. Once Enginuity detects that disks have no covering spare, it will create an SR and dispatch a CE to replace the 3 Not Ready spare drives. It will not wait until the spare pool is down to a threshold of zero or one drives.
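The difference between a count threshold and a coverage check can be sketched with the 8-spare example above. This is an illustration only: the split of the spares into 3 of a hypothetical type "A" and 5 of type "B" is an assumption introduced here to explain how coverage can be lost while spares remain, and `threshold_trigger` and `coverage_trigger` are hypothetical names, not Enginuity functions.

```python
from dataclasses import dataclass


@dataclass
class Spare:
    drive_type: str  # hypothetical coverage key
    ready: bool = True


def threshold_trigger(spares, threshold):
    """Dispatch only when the number of ready spares drops to `threshold` (0 or 1)."""
    return sum(s.ready for s in spares) <= threshold


def coverage_trigger(active_drive_types, spares):
    """Dispatch as soon as any active drive type has no ready spare."""
    ready_types = {s.drive_type for s in spares if s.ready}
    return any(t not in ready_types for t in active_drive_types)


# 8 spares: 3 for type "A", 5 for type "B" (hypothetical split).
spares = [Spare("A") for _ in range(3)] + [Spare("B") for _ in range(5)]
for s in spares[:3]:
    s.ready = False  # the 3 Not Ready spares

active = {"A", "B"}
print(threshold_trigger(spares, 1))  # False: 5 spares are still ready
print(coverage_trigger(active, spares))  # True: type "A" drives have no covering spare
```

Under these assumptions a threshold of one would stay silent with 5 spares left, while the coverage check calls out the 3 Not Ready spares right away.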