33 Posts
0
INYO upgrade issues and recover point compatibility
We had an issue when our DR VNX was upgraded from 5.31 (5.31.000.5.720) to 5.32 (5.32.000.5.011), a procedure performed by EMC. One of the LUNs went offline on the VNX, causing an outage, and we received the following RCA from EMC (I am not posting the complete RCA, only the important points):
RCA of DLU 3 offline:
================
This caused a thundering herd problem where more than 41000 pallets are consumed by this event and therefore some of the other events got starved.
One such event was required to handle a trespass of a LUN and therefore caused a DU on the LUN.
This issue will be fixed in Inyo SP Futures.(No ETA)
Since Inyo has already been committed on the array, it is no longer vulnerable to the issue.
Recommendation:
=============
The FLARE patch the array is currently running is susceptible to a known trespass-storm issue and could cause a LUN to go offline. The issue is fixed in Inyo SP2 (R32.015), so it is highly recommended that the array be upgraded to R32.015 as soon as possible. Refer to primus emc308914 for more information on the issue.
Can someone please help me understand this?
1. Did version 5.31 have this known issue, or does version 5.32.000.5.011 have it? Is 5.32.000.5.011 Inyo SP1?
2. Is Inyo SP2 (R32.015) the same as 5.32.000.5.015?
3. Is the currently upgraded version, 5.32.000.5.011, supported by RecoverPoint?
4. Is there a direct upgrade path from 5.31 (5.31.000.5.720) to 5.32 (5.32.000.5.015)? We have another VNX that also needs an upgrade, and the thought worries me, as a major part of our production runs on it.
I can see a primus solution referenced, but it is not available to me; I am not sure why.
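Comparing these multi-part FLARE revision strings by hand is easy to get wrong, so here is a small, purely illustrative Python helper for the check. The function names are my own, and the fix level encoded below is just the Inyo SP2 = R32.015 = 05.32.000.5.015 mapping being discussed in this thread:

```python
# Illustrative only: compare VNX FLARE revision strings numerically.
# Function names are hypothetical; the fix level below is the
# Inyo SP2 = R32.015 = 05.32.000.5.015 mapping from this thread.

def flare_tuple(version):
    """Parse a revision like '05.32.000.5.015' into a comparable int tuple."""
    return tuple(int(part) for part in version.split("."))

INYO_SP2_FIX = flare_tuple("05.32.000.5.015")

def at_or_above_fix(version):
    """True if the given revision is at or above the Inyo SP2 fix level."""
    return flare_tuple(version) >= INYO_SP2_FIX

# The current DR array level, 05.32.000.5.011, is below the fix level:
print(at_or_above_fix("05.32.000.5.011"))  # False
print(at_or_above_fix("05.31.000.5.720"))  # False
print(at_or_above_fix("05.32.000.5.015"))  # True
```

Tuples of ints compare element by element in Python, so this handles the leading-zero fields ("015" vs "15") correctly, which a plain string comparison would not.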
swadeey123_3c8c69
33 Posts
0
January 10th, 2013 03:00
In case the complete RCA helps you suggest something, please refer to it below:
RCA of DLU 3 offline:
================
When the customer committed to Inyo, all slices for DLUs had to be pre-allocated. When these slices are committed, an event is queued to persist the FS object. The code only pre-allocates 1 pallet per FS for this particular type of event. While this was OK in pre-Inyo releases, where slice allocations only happened on an I/O and were therefore unlikely to happen more than once in a short time for the same FS, this is no longer true here, where on the commit all slices for all DLUs had to be pre-allocated at around the same time. This caused a thundering-herd problem in which more than 41,000 pallets were consumed by this event type, and some of the other events were therefore starved.
One such starved event was required to handle a trespass of a LUN, which caused a DU (data unavailability) on that LUN.
This issue will be fixed in a future Inyo SP. (No ETA.)
Since Inyo has already been committed on the array, it is no longer vulnerable to the issue.
Recommendation:
=============
The FLARE patch the array is currently running is susceptible to a known trespass-storm issue and could cause a LUN to go offline. The issue is fixed in Inyo SP2 (R32.015), so it is highly recommended that the array be upgraded to R32.015 as soon as possible. Refer to primus emc308914 for more information on the issue.
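For anyone trying to picture the starvation mechanism the RCA describes, here is a toy Python sketch. All names and the pool size are hypothetical; only the roughly 41,000-event burst comes from the RCA. A fixed pallet pool is drained by the burst of persist events queued at commit time, so a trespass-handling event arriving afterwards finds no pallets:

```python
from collections import deque

# Toy sketch of the RCA's thundering-herd starvation. Names and the
# pool size are hypothetical; only the ~41,000-event burst is from the
# RCA. Pallets are never returned in this model, which exaggerates but
# clearly illustrates the starvation.

POOL_SIZE = 100     # pallets available (far fewer than the burst needs)
BURST = 41000       # persist-FS-object events queued at the Inyo commit

queue = deque(["persist"] * BURST + ["trespass"])  # trespass arrives last
pallets = POOL_SIZE
starved = []

while queue:
    event = queue.popleft()
    if pallets == 0:
        starved.append(event)  # no pallet free: the event is starved
    else:
        pallets -= 1           # event consumes a pallet and runs

print("trespass" in starved)   # True: the trespass event was starved
print(len(starved))            # 40901 events starved in this toy run
```

The point of the sketch is only that a resource pool sized for the pre-Inyo one-allocation-per-I/O pattern is hopelessly undersized for a commit-time burst, so unrelated events queued behind the burst, like the trespass, go unserviced.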
Would it be possible for you to paste the primus solution or e-mail it to me? I am still not able to access it, and I want to have a look, since, as I mentioned, we have one more VNX upgrade pending.
Also, I heard there was a version of Inyo with which RecoverPoint was not supported; what was that version?
Also, could you share any link or document with the upgrade pre-checks and the benefits of moving to (and running on) Inyo?
GearoidG
251 Posts
0
January 10th, 2013 03:00
So Rel 32 P6 had serious issues with RecoverPoint; see ETA emc298607.
The primus you need is in the process of becoming an ETA as well, so unfortunately I cannot copy and paste it yet; it is being reviewed by our legal team before being made customer-viewable.
I actually found your RCA/SR in our internal database, so I am currently reviewing it. It will probably take me a few hours to come back to you, but I think I have all the details I need.
Regards
Gearoid
GearoidG
251 Posts
0
January 10th, 2013 03:00
Hi swadeey123
So I can answer some questions straight away:
1. Did version 5.31 have this known issue, or does version 5.32.000.5.011 have it? Is this Inyo SP1? - I will need to look further into the RCA of this issue.
2. Is Inyo SP2 (R32.015) the same as 5.32.000.5.015?
Yes, it is.
3. Is the currently upgraded version, 5.32.000.5.011, supported by RecoverPoint?
It is.
4. Is there a direct upgrade path from 5.31 (5.31.000.5.720) to 5.32 (5.32.000.5.015)?
I can see a primus solution, but it is not available to me; not sure why?
This primus is nearly ready for customer consumption; I expect it to be released shortly.
It was a support-only primus until recently.
Regards
Gearoid
swadeey123_3c8c69
33 Posts
0
January 10th, 2013 04:00
Thanks for your involvement, Gearoid…
GearoidG
251 Posts
0
January 15th, 2013 14:00
So ETA EMC308914 has now been issued to explain the issues we have seen on Rel 32 releases below Rel 32 P15.
These issues can cause (but are not limited to) performance problems, pool LUNs going offline, or possibly SP bugchecks.
I have replied to your other questions privately
I hope this helps
Gearoid
swadeey123_3c8c69
33 Posts
0
January 17th, 2013 02:00
Thanks, Gearoid. I am really happy that we have experts on ECN who get involved to help customers and storage admins. I will mark this post as closed; however, if you can help me with the remaining query I had, it would be a great help.
Thanks Again...