Unsolved
7 Posts
1
6112
[VNX 5400] Unused disks in a storage pool
Hi there,
My company purchased an EMC VNX 5400 array two months ago, and we created a storage pool with deduplicated LUNs.
After migrating data to this new pool we are facing performance issues. I analyzed the performance with the VNX monitoring tool (and also Unisphere Analyzer) and noticed that:
- 3 out of 5 Flash drives (Extreme Performance) are seen as "unused" (0% utilization and 0 IOPS)
- 4 out of 5 SAS drives (Performance) are displayed as "unused"
- 8 NL-SAS drives (Capacity) are all intensively used
We then added 5 new SAS drives to the pool and the activity spread out across the added disks. The workload on the NL-SAS drives is getting lower ... however the initially "unused" drives still remain "unused".
I contacted EMC support a few weeks ago; they asked us for some SP collects and NAR files but nothing more, and the service request is still open ...
I just want to know if any of you have already faced this issue and have an idea about this improper behaviour. To avoid further performance issues we have suspended the data migration until a reliable solution can be found.
For now we don't have enough drives left to create a new pool and migrate the data from the current pool to a new one; that would have let us check the "failed" drives.
Thanks for your help.
sk_
104 Posts
0
May 12th, 2015 23:00
unused drives: Do you have FastVP enabled on the VNX? Is auto-tiering scheduled to run daily on Pool 0?
performance: disable the deduplication; it is slow even when you have more disks. We experienced as much as a 10x performance hit when using deduplication. There is also a bug where dedup prevents auto-tiering from working properly and balancing the workload across the drives. Disabling dedup will solve this too.
-sk
brettesinclair
2 Intern
715 Posts
1
May 13th, 2015 02:00
Very strange indeed. First thing you should do is escalate your SR.
I agree with Sami: disable dedup. It's proving to be problematic in some scenarios.
If you have naviseccli, I'd be interested to see some of the disk stats, for example:
naviseccli -h array-spa getdisk 0_0_8
Desperadeo1
7 Posts
0
May 13th, 2015 21:00
Thanks for your replies.
An SR has been open since 16/03/2015 and EMC support still can't provide an explanation for this issue ...
We have FastVP enabled on the VNX and auto-tiering is scheduled to run daily on Pool 0 (from midnight to 8:00 am).
But the SP report shows that the auto-tiering process is not able to relocate all the slices, and lots of them end up in a "failed" status.
We have finally decided to create a new pool without enabling deduplication (EMC will send us new SSD/SAS drives). Deduplication gives us about 50% space savings, but it isn't worth keeping because the performance impact is too severe (service outages) for the moment.
Here is the output (collected a month ago) from naviseccli for 2 disks of the same tier in this Pool 0:
C:\Program Files\EMC\Navisphere CLI>naviseccli -h array-spa getdisk 0_0_12
Bus 0 Enclosure 0 Disk 12
Vendor Id: SEAGATE
Product Id: ST900MM0 CLAR900
Product Revision: LS1C
Lun: Unbound
Type: N/A
State: Enabled
Hot Spare: N/A
Prct Rebuilt: Unbound
Prct Bound: Unbound
Serial Number: S0N2EXQH
Sectors: N/A
Capacity: 840313
Private: Unbound
Bind Signature: N/A, 0, 12
Hard Read Errors: 0
Hard Write Errors: 0
Soft Read Errors: 0
Soft Write Errors: 0
Read Retries: N/A
Write Retries: N/A
Remapped Sectors: N/A
Number of Reads: 6641
Number of Writes: 4302
Number of Luns: 0
Raid Group ID: N/A
Clariion Part Number: DG118033067
Request Service Time: N/A
Read Requests: 6641
Write Requests: 4302
Kbytes Read: 405246
Kbytes Written: 247277
Stripe Boundary Crossing: None
Drive Type: SAS
Clariion TLA Part Number:005050212PWR
User Capacity: 0
Idle Ticks: 50696961
Busy Ticks: 55298479
Current Speed: 6Gbps
Maximum Speed: 6Gbps
C:\Program Files\EMC\Navisphere CLI>naviseccli -h array-spa getdisk 0_0_8
Bus 0 Enclosure 0 Disk 8
Vendor Id: HITACHI
Product Id: HUC10909 CLAR900
Product Revision: C430
Lun: Unbound
Type: N/A
State: Enabled
Hot Spare: N/A
Prct Rebuilt: Unbound
Prct Bound: Unbound
Serial Number: KXJ878GR
Sectors: N/A
Capacity: 840313
Private: Unbound
Bind Signature: N/A, 0, 8
Hard Read Errors: 0
Hard Write Errors: 0
Soft Read Errors: 0
Soft Write Errors: 0
Read Retries: N/A
Write Retries: N/A
Remapped Sectors: N/A
Number of Reads: 0
Number of Writes: 0
Number of Luns: 0
Raid Group ID: N/A
Clariion Part Number: DG118033034
Request Service Time: N/A
Read Requests: 0
Write Requests: 0
Kbytes Read: 0
Kbytes Written: 0
Stripe Boundary Crossing: None
Drive Type: SAS
Clariion TLA Part Number:005050349PWR
User Capacity: 0
Idle Ticks: 0
Busy Ticks: 0
Current Speed: 6Gbps
Maximum Speed: 6Gbps
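To spot this pattern across a whole array rather than comparing disks one at a time, the full `getdisk` dump can be filtered for drives whose cumulative counters are all zero. This is a minimal sketch, not an EMC tool: the field names come from the output pasted above, and `array-spa` is just the hostname used earlier in this thread.

```shell
#!/bin/sh
# Sketch: scan `naviseccli getdisk` output for drives whose cumulative
# counters are all zero, i.e. the "unused" symptom described above.

flag_idle() {
  # Reads getdisk output on stdin; prints the disk header line whenever
  # that disk's Read Requests, Write Requests and Busy Ticks are all 0.
  awk '
    /^Bus [0-9]+ Enclosure [0-9]+ Disk [0-9]+/ { disk = $0; r = w = b = -1 }
    /^Read Requests:/  { r = $3 }
    /^Write Requests:/ { w = $3 }
    /^Busy Ticks:/     { b = $3; if (r == 0 && w == 0 && b == 0) print disk }
  '
}

# Real use (commented out, needs the array):
# naviseccli -h array-spa getdisk | flag_idle

# Demo against a trimmed copy of the output pasted in this thread:
flag_idle <<'EOF'
Bus 0 Enclosure 0 Disk 12
Read Requests: 6641
Write Requests: 4302
Busy Ticks: 55298479
Bus 0 Enclosure 0 Disk 8
Read Requests: 0
Write Requests: 0
Busy Ticks: 0
EOF
# prints: Bus 0 Enclosure 0 Disk 8
```

Note this only catches drives that have been idle since the counters were last reset; a drive that saw IO long ago but is idle now would need the Analyzer/NAR view instead.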
sk_
104 Posts
0
May 14th, 2015 05:00
Yes, that is the known bug: dedup processing will cause some relocations to fail, due to data locks/contention.
We have had an SR open for 9 months for this! The fix should/could be in the next code release..
You can manually help the situation by pausing deduplication on the pool while the relocation runs.
But you made a good decision! Dedup still kills performance, and there is also an open bug where the dedup process just gets stuck on its own and you need to reboot the SP to get it running again..
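The suggestion above (pause dedup while the nightly relocation window runs) could be scripted along these lines. This is only a sketch: the 00:00-08:00 window matches the schedule mentioned earlier in the thread, and the `deduplication -feature -pause`/`-resume` switches are an assumption from memory, so verify them against the naviseccli help for your OE release before relying on this.

```shell
#!/bin/sh
# Sketch: pause deduplication during the FAST VP relocation window
# (00:00-08:00 per the schedule discussed above), resume it afterwards.
# The naviseccli dedup switches below are ASSUMED syntax -- verify first.

in_relocation_window() {
  # Succeeds when the given hour (0-23) falls inside 00:00-07:59.
  hour=$1
  [ "$hour" -ge 0 ] && [ "$hour" -lt 8 ]
}

if in_relocation_window "$(date +%H)"; then
  : # naviseccli -h array-spa deduplication -feature -pause   (assumed syntax)
else
  : # naviseccli -h array-spa deduplication -feature -resume  (assumed syntax)
fi
```

Run from cron (e.g. hourly) so dedup and relocation stop fighting over the same slices, until a code release actually fixes the contention.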
-sk
Desperadeo1
7 Posts
0
May 14th, 2015 12:00
An SR open for 9 months? Wow ...
We will wait for the next code release and check whether some of the deduplication bugs are fixed.
If so, I might reconsider using deduplication, but only on specific datasets where performance is not required at all.
Thanks for the feedback from your experience.
brettesinclair
2 Intern
715 Posts
0
May 14th, 2015 22:00
9 months is crazy without multiple escalations ... time to get your TAM in for a chat!
sk_
104 Posts
0
May 14th, 2015 22:00
That sounds OK.
EMC documentation really fails to mention how big the performance impact is! When we got the VNX5400 boxes, we went all-in with dedup, as that had worked really well with NetApp, but then we had to give it up.. we only use it for some archive LUNs now.
-sk
sk_
104 Posts
0
May 14th, 2015 22:00
It took a long time for engineering to track down the issue, and now it takes a long time to get a fix for it.
-sk
Desperadeo1
7 Posts
0
May 15th, 2015 00:00
Here is a report on our auto-tiering process:
I hope it will work well in the new non-deduplicated pool.
Sami, what is your VNX OE for Block? Ours is currently 05.33.000.5.81, and a newer version was released 3 days ago: 05.33.006.5.102.
I'm waiting for the release notes to see which bugs are fixed before proceeding with the upgrade on our array.
sk_
104 Posts
0
May 15th, 2015 03:00
We are running .74.
I quickly looked at the release notes for .102 and didn't see many related fixes.. but the release notes have been really incomplete lately; I'll check with engineering.
-sk
kelleg
4.5K Posts
0
May 15th, 2015 12:00
What is the SR# for this issue? I'll look into it.
The issue as described is that some of the disks appear to have zero IOPS when viewed in Analyzer Real-Time or in the NAR files; is this correct?
glen
Desperadeo1
7 Posts
0
May 15th, 2015 13:00
Actually we have opened two SRs (69992844 and 70832944). The second one was recently closed for no reason, even though I hadn't accepted support's explanation ...
When I use Analyzer Real-Time on these "unused" disks, it also shows no activity at all. I have collected several NAR files for support but have never had any feedback on them.
Thanks for looking into my issue.
kelleg
4.5K Posts
0
May 15th, 2015 14:00
I've reviewed the case and recommended that they escalate it to engineering.
I've seen a couple of cases with this issue: some of the drives configured in a pool are not showing IOPS in the NAR files. We have found that the problem is with the statistics in the data collection; it does not actually affect the IO to the drives, the data is simply not being collected.
On one of the other cases we captured different logs from the array and verified that IO is being sent to the disks. This is not creating a performance issue.
Glen Kelley
Technical Support Engineer
Customer Services, Unified VNX
glen.kelley@emc.com
Hours: Monday-Friday 10:00-18:00 ET
Business: 800-782-4362 ext 3291074
Direct: (774) 803-2497
Online Support: https://support.emc.com
Desperadeo1
7 Posts
0
May 15th, 2015 21:00
I appreciate your responsiveness and am pleased to know that you've already seen this kind of issue! It's somewhat comforting ...
because until now, none of the people I've been dealing with (technical support for the 2 SRs) had ever faced this issue. I was even told that it is normal behaviour because of the deduplication, and that if I disabled it the "unused" drives would show activity ...
But how can we be sure that it does not affect the IO to the drives? On the other hand, I can't believe that we are only using one or two disks in a RAID 4+1 tier. If we assume it is a data-collection bug, we have to conclude that our current pool can't handle the workload (a heavy-consuming process) during the weekend, which leads to service outages in our production environment (a vSphere platform); having the deduplication process enabled does not help at all.
To give you an idea, I monitored the IO in this pool a few minutes ago (with a backup in progress) and it is delivering > 6000 IOPS. Can you confirm that this workload could not have been absorbed by the pool if only 2 SSD drives, 6 SAS drives and 8 NL-SAS drives were active, especially if the accessed data blocks are mainly located on the NL-SAS and SAS drives?
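For a rough sanity check on that question, one can use common rule-of-thumb per-drive host IOPS figures (these are generic assumptions, not EMC specs): flash ~3500, 10k SAS ~150, NL-SAS ~90, ignoring RAID write penalty and SP cache, so they are best read as random-read ceilings:

```shell
#!/bin/sh
# Back-of-envelope pool IOPS ceiling for the drive counts in question,
# using rule-of-thumb per-drive figures (assumptions, not EMC specs):
# flash ~3500 IOPS, 10k SAS ~150, NL-SAS ~90. RAID write penalty and
# SP cache are ignored.
flash=2; sas=6; nlsas=8
all_tiers=$(( flash * 3500 + sas * 150 + nlsas * 90 ))   # = 8620
no_flash=$((                 sas * 150 + nlas=0, nlsas * 90 ))
echo "with flash: ~$all_tiers IOPS, without flash: ~$no_flash IOPS"
```

Under those assumptions the pool only reaches ~8600 IOPS if the flash tier carries most of the load; the SAS and NL-SAS tiers alone come to roughly 1600 IOPS, well short of the 6000 observed. So if the flash drives really were idle, outages would be the expected result.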
sk_
104 Posts
0
May 16th, 2015 01:00
We had this scenario too: one private RG showed 0 IOPS for two of its disks, while the three other disks did show IOPS. The issue resolved itself after some SP reboots.
-sk