295 Posts
0
3126
CX4-120: what should I check in Analyzer for poor metaLUN performance?
I have a metaLUN with 6 component LUNs, and the last 2 components of this metaLUN are running at almost 99% utilization.
The metaLUN is experiencing low performance when accessed from a Linux server. Are those two last component LUNs the cause
of this delay, and how can I prove it?
Thanks.
weiw1
14 Posts
0
July 25th, 2010 23:00
Hi,
Can you tell me:
1. What's your metaLUN type? Is it a striped metaLUN or a concatenated metaLUN?
2. Is the Navisphere Analyzer enabler license installed on the array or not?
3. What's your application pattern? OLTP/backup/video imaging or ...? Random, sequential, or mixed?
4. Does your IT environment meet the EMC Support Matrix (including host/switch)?
Your description is general, so I can't give an exact suggestion. If you have emcgrab/switch logs, SPCollect logs, and Navisphere Analyzer data (the array's performance log), that will help us address your problem.
Thks.
Rgds,
WW
jelucho
295 Posts
0
July 25th, 2010 23:00
1. It is a striped meta.
2. There is no Navisphere Analyzer enabler; I just start Analyzer, extract the .naz files, decrypt them to .nar files, and finally merge them.
3. Random and sequential.
4. Yes, the host is under the matrix.
If you want more info I can attach it, but do I have to open an SR to attach these files?
thanks
MC.
weiw1
14 Posts
0
July 26th, 2010 00:00
Hi,
Yup, opening an SR is the best approach to engage a CLARiiON performance engineer to take a look. Of course, if you can upload all the necessary logs, I will help take a look. Thks.
Best Regards,
ww
kelleg
4.5K Posts
0
July 26th, 2010 11:00
Of the six component LUNs, are any in the same raid group?
Among the raid groups that contain the component LUNs, is the disk Total IOPS higher in some than in the others?
What you want to find out is whether other LUNs in a raid group that contains the metaLUN component LUNs are overdriving the disks.
What version of FLARE are you running?
glen
jelucho
295 Posts
0
July 26th, 2010 14:00
The FLARE is 04.29.000.5.006.
The thing is that raid group 3's utilization is higher compared to the other ones.
What I am trying to understand is which are the first and last components of this metaLUN;
as you can see, it belongs to RG3. Could the order of the components in the metaLUN
be the reason for the slow performance?
There are no failed components, and read and write cache are enabled.
Any ideas?
Another thing: SP B has more utilization than SP A,
and metaLUN1 belongs to SP B.
What else can I check?
Thanks, MC.
kelleg
4.5K Posts
1
July 26th, 2010 15:00
In Navisphere, look in LUN Folders/MetaLUNs - you'll see the metaLUN listed. Open the tree and you should see Component 0; all the component LUNs should be listed there, in the order they were added to the original metaLUN. Right-click on each component LUN to see which raid group it belongs to. Then right-click on each raid group and select the Partitions tab - this will show you where in the raid group the different LUNs have physically been created. See if all the component LUNs are in the same place. Also, make sure that there is only one metaLUN component LUN per raid group.
If RG 3 has higher utilization than the other component LUNs' raid groups, then there must be a LUN in that raid group getting more workload - look at each LUN to see where the load is coming from.
Ideally you should have only metaLUN component LUNs in each raid group - see the new MetaLUN white paper in the Documents section of this forum for a more in-depth explanation of striped metaLUN configurations.
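To double-check the GUI findings from the CLI, the component-to-raid-group mapping can also be pulled with naviseccli (e.g. getlun with the -rg option; exact flags vary by FLARE release). A minimal sketch, run here against a hypothetical output sample rather than a live array:

```shell
#!/bin/sh
# Hypothetical sample of per-LUN "RAIDGroup ID" output for the six
# component LUNs; real naviseccli output may be formatted differently.
sample='LOGICAL UNIT NUMBER 20
RAIDGroup ID: 1
LOGICAL UNIT NUMBER 21
RAIDGroup ID: 2
LOGICAL UNIT NUMBER 22
RAIDGroup ID: 3
LOGICAL UNIT NUMBER 23
RAIDGroup ID: 3
LOGICAL UNIT NUMBER 24
RAIDGroup ID: 4
LOGICAL UNIT NUMBER 25
RAIDGroup ID: 5'

# Flag any raid group that hosts more than one component LUN.
printf '%s\n' "$sample" |
  awk '/RAIDGroup ID/ {count[$3]++}
       END {for (rg in count) if (count[rg] > 1)
              printf "RAID group %s holds %d component LUNs\n", rg, count[rg]}'
```

Any raid group this check flags hosts more than one component LUN and is a candidate hot spot.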
glen
PS - if needed you can open an SR, but I believe you are on the right track - you have extra workload from outside the metaLUN that is interfering with it; you just need to track it down.
jelucho
295 Posts
0
July 26th, 2010 16:00
OK. While doing load balancing, I have to ask: does this trespass occur automatically,
yes or no, and why?
For now, this is my list of LUN owners.
I think there are a lot on the SP B side, and I don't know why,
since at first they were more or less balanced.
I think there is a little disorder here, but I don't know how
to fix it without impacting production LUN access,
in order to fix the slow performance of a specific meta.
thanks
MC.
Jim_A1
59 Posts
1
July 26th, 2010 16:00
MC,
By default, all of the components of a metaLUN assume the same owner as the "head" LUN of the meta when it is created.
You cannot do anything with these individual component LUNs, as they become private LUNs.
If you want to trespass the metaLUN, it is always done by trespassing the "head" LUN of the meta.
This will trespass all of the LUNs in that meta to the other SP.
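As a sketch of that, with a hypothetical SP B management address and head-LUN number (verify the exact trespass syntax on your FLARE release before running anything against production), the command is only assembled and printed for review here:

```shell
#!/bin/sh
# Sketch only: trespassing a striped metaLUN by its head LUN.
SP_B=10.0.0.2        # hypothetical SP B management address
HEAD_LUN=100         # hypothetical head LUN number of the metaLUN

# Trespassing the head LUN moves every component LUN with it.
cmd="naviseccli -h $SP_B trespass lun $HEAD_LUN"
echo "$cmd"          # review before running against a production array
```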
jelucho
295 Posts
0
July 26th, 2010 16:00
Take a look at this:
all the LUNs belong to SP B. Why, if auto-assign is enabled on all of them,
are they, including the first one, associated with SP B?
Can I do a manual trespass on this meta and change the default owner?
What do you think?
mc
aloski
38 Posts
1
July 26th, 2010 18:00
Hi MC,
Do you see trespasses in the SP logs?
What OS platform are your hosts running? Do you have VMware?
Cheers, Paolo.
jelucho
295 Posts
0
July 26th, 2010 20:00
Check my last comments; I found something else.
MC
aloski
38 Posts
0
July 26th, 2010 20:00
Hi MC,
That is a very small number of trespasses... In a VMware environment, if you had a genuine trespass issue, you would see hundreds or even thousands of trespass events in the SP logs.
Have you had any FLARE upgrades (NDU) or SP replacements in the past? VMware ESX 3.5 uses its own native failover, and when LUNs are trespassed (i.e. during an NDU), they don't trespass back automatically. You need to restore all the LUNs to their rightful owner manually.
Cheers, Paolo.
jelucho
295 Posts
0
July 26th, 2010 20:00
Yes, there are about 34 trespasses.
Initiator type: CLARiiON Open
Failover mode: 1
12 x HP blade 460 hosts with VMware ESX 3.5.0 build 153875
FLARE: 04.29.000.5.006
One thing I found in the events: when the 34 trespasses occurred,
there was a controlled shutdown of the storage,
but it seems that the customer had some problems
following the shutdown procedure.
Could this event be the root cause of all the performance issues, the trespasses,
and the unbalanced SP default owner settings?
How do I return to a normal state?
Any comments about this?
MC
aloski
38 Posts
0
July 26th, 2010 21:00
Hi MC,
That is most likely what caused the trespasses and, subsequently, the SPs to be unbalanced.
Unfortunately, returning all trespassed VMware LUNs to their rightful owner requires a lot of manual work:
1) From naviseccli or Navisphere Manager, find out which SP is the default owner for each LUN.
2) Log in to each ESX server and change the failover path for each LUN back to the original path that corresponds to its default SP.
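Step 1 can be scripted against getlun-style output. A sketch, run here against a hypothetical output sample (the real naviseccli field layout may differ slightly by FLARE release):

```shell
#!/bin/sh
# Hypothetical "naviseccli getlun" excerpt; on a live array you would
# capture this from the SP instead of a here-string.
sample='LOGICAL UNIT NUMBER 5
Default Owner: SP A
Current owner: SP B
LOGICAL UNIT NUMBER 6
Default Owner: SP B
Current owner: SP B
LOGICAL UNIT NUMBER 7
Default Owner: SP A
Current owner: SP B'

# Print every LUN whose current owner differs from its default owner,
# i.e. the trespassed LUNs that need to be moved back.
printf '%s\n' "$sample" |
  awk '/LOGICAL UNIT NUMBER/ {lun=$4}
       /Default Owner/ {def=$3" "$4}
       /Current owner/ {cur=$3" "$4;
         if (cur != def) printf "LUN %s: default %s, current %s\n", lun, def, cur}'
```

The LUNs this prints are the ones whose paths would need restoring in step 2.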
Paolo.
aloski
38 Posts
0
July 26th, 2010 22:00
MC,
An update on my previous response, for step 2:
don't do it from the ESX servers; instead, use the Navisphere GUI to put them back, or run "navicli trespass mine" on each SP.
This works just as well but is much easier.
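As a sketch, with hypothetical SP management addresses, the per-SP commands would look like this (the loop only prints them for review rather than executing against the array):

```shell
#!/bin/sh
# "trespass mine" issued against an SP pulls back every LUN whose
# default owner is that SP. Both addresses below are hypothetical.
for sp in 10.0.0.1 10.0.0.2; do
  echo "naviseccli -h $sp trespass mine"
done
```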
Paolo.