Unsolved
7 Posts
1
6112
[VNX 5400] Unused disks in a storage pool
Hi there,
My company purchased an EMC VNX 5400 array two months ago, and we created a storage pool with deduplicated LUNs.
After migrating data to this new pool we are facing performance issues. I analyzed the performance with the VNX monitoring tool (and also Unisphere Analyzer) and noticed that:
- 3 out of 5 Flash drives (Extreme Performance) are seen as "unused" (0% utilization and 0 IOPS)
- 4 out of 5 SAS drives (Performance) are displayed as "unused"
- 8 NL-SAS drives (Capacity) are all intensively used
We then added 5 new SAS drives to the pool and the activity spread out across the added disks. The workload on the NL-SAS drives is getting lower ... however the initially "unused" drives still remain "unused".
I contacted EMC support a few weeks ago; they asked us for some SP collects and NAR files but nothing more, and the service request is still open ...
I just want to know if any of you have already faced this issue and have an idea about this improper behaviour. To avoid further performance issues we have suspended the data migration until a reliable solution can be found.
For now we don't have enough drives left to create a new pool and migrate the data from the current pool to a new one; that would have let us check the "failed" drives.
Thanks for your help.
sk_
104 Posts
0
May 12th, 2015 23:00
unused drives: Do you have FastVP enabled on the VNX? Is auto-tiering scheduled to run daily on Pool 0?
performance: disable the deduplication; it is slow even when you have more disks. We experienced as much as a 10x performance hit when using deduplication. There is also a bug where dedup prevents auto-tiering from working properly and balancing the workload across the drives. Disabling dedup will solve this too.
-sk
brettesinclair
2 Intern
715 Posts
1
May 13th, 2015 02:00
Very strange indeed. First thing you should do is escalate your SR.
I agree with Sami: disable dedup. It's proving to be problematic in some scenarios.
If you have naviseccli, I'd be interested to see some of the disk stats, for example:
naviseccli -h array-spa getdisk 0_0_8
Desperadeo1
7 Posts
0
May 13th, 2015 21:00
Thanks for your replies.
An SR has been open since 16/03/2015 and EMC support still can't provide an explanation for this issue ...
We have FastVP enabled on the VNX and auto-tiering is scheduled to run daily on Pool 0 (from midnight to 8:00 am).
But the SP report shows that the auto-tiering process is not able to relocate all the slices, and lots of them end up in a "failed" status.
We have finally decided to create a new pool without enabling deduplication (EMC will send us new SSD/SAS drives). Deduplication gives us about 50% space savings, but it isn't worth keeping because the performance impact is too severe (service outages) for the moment.
Here is the output (collected a month ago) from naviseccli for 2 disks of the same tier in this Pool 0:
C:\Program Files\EMC\Navisphere CLI>naviseccli -h array-spa getdisk 0_0_12
Bus 0 Enclosure 0 Disk 12
Vendor Id: SEAGATE
Product Id: ST900MM0 CLAR900
Product Revision: LS1C
Lun: Unbound
Type: N/A
State: Enabled
Hot Spare: N/A
Prct Rebuilt: Unbound
Prct Bound: Unbound
Serial Number: S0N2EXQH
Sectors: N/A
Capacity: 840313
Private: Unbound
Bind Signature: N/A, 0, 12
Hard Read Errors: 0
Hard Write Errors: 0
Soft Read Errors: 0
Soft Write Errors: 0
Read Retries: N/A
Write Retries: N/A
Remapped Sectors: N/A
Number of Reads: 6641
Number of Writes: 4302
Number of Luns: 0
Raid Group ID: N/A
Clariion Part Number: DG118033067
Request Service Time: N/A
Read Requests: 6641
Write Requests: 4302
Kbytes Read: 405246
Kbytes Written: 247277
Stripe Boundary Crossing: None
Drive Type: SAS
Clariion TLA Part Number:005050212PWR
User Capacity: 0
Idle Ticks: 50696961
Busy Ticks: 55298479
Current Speed: 6Gbps
Maximum Speed: 6Gbps
C:\Program Files\EMC\Navisphere CLI>naviseccli -h array-spa getdisk 0_0_8
Bus 0 Enclosure 0 Disk 8
Vendor Id: HITACHI
Product Id: HUC10909 CLAR900
Product Revision: C430
Lun: Unbound
Type: N/A
State: Enabled
Hot Spare: N/A
Prct Rebuilt: Unbound
Prct Bound: Unbound
Serial Number: KXJ878GR
Sectors: N/A
Capacity: 840313
Private: Unbound
Bind Signature: N/A, 0, 8
Hard Read Errors: 0
Hard Write Errors: 0
Soft Read Errors: 0
Soft Write Errors: 0
Read Retries: N/A
Write Retries: N/A
Remapped Sectors: N/A
Number of Reads: 0
Number of Writes: 0
Number of Luns: 0
Raid Group ID: N/A
Clariion Part Number: DG118033034
Request Service Time: N/A
Read Requests: 0
Write Requests: 0
Kbytes Read: 0
Kbytes Written: 0
Stripe Boundary Crossing: None
Drive Type: SAS
Clariion TLA Part Number:005050349PWR
User Capacity: 0
Idle Ticks: 0
Busy Ticks: 0
Current Speed: 6Gbps
Maximum Speed: 6Gbps
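To spot this pattern across a whole array rather than comparing disks one at a time, the full `getdisk` dump can be filtered for drives whose cumulative counters are all zero. This is a minimal sketch, not an EMC tool: the field names come from the output pasted above, and `array-spa` is just the hostname used earlier in this thread.

```shell
#!/bin/sh
# Sketch: scan `naviseccli getdisk` output for drives whose cumulative
# counters are all zero, i.e. the "unused" symptom described above.

flag_idle() {
  # Reads getdisk output on stdin; prints the disk header line whenever
  # that disk's Read Requests, Write Requests and Busy Ticks are all 0.
  awk '
    /^Bus [0-9]+ Enclosure [0-9]+ Disk [0-9]+/ { disk = $0; r = w = b = -1 }
    /^Read Requests:/  { r = $3 }
    /^Write Requests:/ { w = $3 }
    /^Busy Ticks:/     { b = $3; if (r == 0 && w == 0 && b == 0) print disk }
  '
}

# Real use (commented out, needs the array):
# naviseccli -h array-spa getdisk | flag_idle

# Demo against a trimmed copy of the output pasted in this thread:
flag_idle <<'EOF'
Bus 0 Enclosure 0 Disk 12
Read Requests: 6641
Write Requests: 4302
Busy Ticks: 55298479
Bus 0 Enclosure 0 Disk 8
Read Requests: 0
Write Requests: 0
Busy Ticks: 0
EOF
# prints: Bus 0 Enclosure 0 Disk 8
```

Note this only catches drives that have been idle since the counters were last reset; a drive that saw IO long ago but is idle now would need the Analyzer/NAR view instead.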
sk_
104 Posts
0
May 14th, 2015 05:00
Yes, that is the known bug: dedup processing will cause some relocations to fail, due to data locks/contention.
We have had an SR open for 9 months for this! The fix should/could be in the next code release..
You can manually help the situation by pausing deduplication on the pool while the relocation runs.
But you made a good decision! Dedup still kills performance, and there is also an open bug where the dedup process just gets stuck on its own and you need to reboot the SP to get it running again..
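The suggestion above (pause dedup while the nightly relocation window runs) could be scripted along these lines. This is only a sketch: the 00:00-08:00 window matches the schedule mentioned earlier in the thread, and the `deduplication -feature -pause`/`-resume` switches are an assumption from memory, so verify them against the naviseccli help for your OE release before relying on this.

```shell
#!/bin/sh
# Sketch: pause deduplication during the FAST VP relocation window
# (00:00-08:00 per the schedule discussed above), resume it afterwards.
# The naviseccli dedup switches below are ASSUMED syntax -- verify first.

in_relocation_window() {
  # Succeeds when the given hour (0-23) falls inside 00:00-07:59.
  hour=$1
  [ "$hour" -ge 0 ] && [ "$hour" -lt 8 ]
}

if in_relocation_window "$(date +%H)"; then
  : # naviseccli -h array-spa deduplication -feature -pause   (assumed syntax)
else
  : # naviseccli -h array-spa deduplication -feature -resume  (assumed syntax)
fi
```

Run from cron (e.g. hourly) so dedup and relocation stop fighting over the same slices, until a code release actually fixes the contention.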
-sk
Desperadeo1
7 Posts
0
May 14th, 2015 12:00
An SR open for 9 months? Wow ...
We will wait for the next code release and check whether some of the deduplication bugs are fixed.
If so, I might reconsider using deduplication, but only on specific datasets where performance is not required at all.
Thanks for the feedback from your experience.
brettesinclair
2 Intern
715 Posts
0
May 14th, 2015 22:00
9 months is crazy without multiple escalations ... time to get your TAM in for a chat!
sk_
104 Posts
0
May 14th, 2015 22:00
That sounds OK.
EMC documentation really fails to mention how big the performance impact is! When we got the VNX5400 boxes, we went all-in with dedup, as that had worked really well with NetApp, but then we had to give it up.. we only use it for some archive LUNs now.
-sk
sk_
104 Posts
0
May 14th, 2015 22:00
It took a long time for engineering to track down the issue, and now it takes a long time to get a fix for it.
-sk
Desperadeo1
7 Posts
0
May 15th, 2015 00:00
Here is a report on our auto-tiering process:
I hope it will work well in the new non-deduplicated pool.
Sami, what is your VNX OE for Block? Ours is currently 05.33.000.5.81, and a newer version was released 3 days ago: 05.33.006.5.102.
I'm waiting for the release notes to see which bugs are fixed before proceeding with the upgrade on our array.
sk_
104 Posts
0
May 15th, 2015 03:00
We are running .74.
I quickly looked at the release notes for .102 and didn't see many related fixes.. but the release notes have been really incomplete lately; I'll check with engineering.
-sk
kelleg
4.5K Posts
0
May 15th, 2015 12:00
What is the SR# for this issue? I'll look into it.
The issue as described is that some of the disks appear to have zero IOPS when viewed in Analyzer Real-Time or in the NAR files; is this correct?
glen
Desperadeo1
7 Posts
0
May 15th, 2015 13:00
Actually we have opened two SRs (69992844 and 70832944). The second one was recently closed for no reason, even though I hadn't accepted support's explanation ...
When I use Analyzer Real-Time on these "unused" disks, it also shows no activity at all. I have collected several NAR files for support but have never had any feedback on them.
Thanks for looking into my issue.
kelleg
4.5K Posts
0
May 15th, 2015 14:00
I've reviewed the case and recommended that they escalate it to engineering.
I've seen a couple of cases with this issue: some of the drives configured in a pool are not showing IOPS in the NAR files. We have found that the problem is with the statistics in the data collection; it does not actually affect the IO to the drives, the data is simply not being collected.
On one of the other cases we captured different logs from the array and verified that IO is being sent to the disks. This is not creating a performance issue.
Glen Kelley
Technical Support Engineer
Customer Services, Unified VNX
glen.kelley@emc.com
Hours: Monday-Friday 10:00-18:00 ET
Business: 800-782-4362 ext 3291074
Direct: (774) 803-2497
Online Support: https://support.emc.com
Desperadeo1
7 Posts
0
May 15th, 2015 21:00
I appreciate your responsiveness and am pleased to know that you've already seen this kind of issue! It's somewhat comforting ...
because until now, none of the people I've been dealing with (technical support for the 2 SRs) had ever faced this issue. I was even told that it is normal behaviour because of the deduplication, and that if I disabled it the "unused" drives would show activity ...
But how can we be sure that it does not affect the IO to the drives? On the other hand, I can't believe that we are only using one or two disks in a RAID 4+1 tier. If we assume it is a data-collection bug, we have to conclude that our current pool can't handle the workload (a heavy-consuming process) during the weekend, which leads to service outages in our production environment (a vSphere platform); having the deduplication process enabled does not help at all.
To give you an idea, I monitored the IO in this pool a few minutes ago (with a backup in progress) and it is delivering > 6000 IOPS. Can you confirm that this workload could not have been absorbed by the pool if only 2 SSD drives, 6 SAS drives and 8 NL-SAS drives were active, especially if the accessed data blocks are mainly located on the NL-SAS and SAS drives?
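For a rough sanity check on that question, one can use common rule-of-thumb per-drive host IOPS figures (these are generic assumptions, not EMC specs): flash ~3500, 10k SAS ~150, NL-SAS ~90, ignoring RAID write penalty and SP cache, so they are best read as random-read ceilings:

```shell
#!/bin/sh
# Back-of-envelope pool IOPS ceiling for the drive counts in question,
# using rule-of-thumb per-drive figures (assumptions, not EMC specs):
# flash ~3500 IOPS, 10k SAS ~150, NL-SAS ~90. RAID write penalty and
# SP cache are ignored.
flash=2; sas=6; nlsas=8
all_tiers=$(( flash * 3500 + sas * 150 + nlsas * 90 ))   # = 8620
no_flash=$((                 sas * 150 + nlas=0, nlsas * 90 ))
echo "with flash: ~$all_tiers IOPS, without flash: ~$no_flash IOPS"
```

Under those assumptions the pool only reaches ~8600 IOPS if the flash tier carries most of the load; the SAS and NL-SAS tiers alone come to roughly 1600 IOPS, well short of the 6000 observed. So if the flash drives really were idle, outages would be the expected result.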
sk_
104 Posts
0
May 16th, 2015 01:00
We had this scenario too: one private RG showed 0 IOPS for two of its disks, while the three other disks did show IOPS. The issue resolved itself after some SP reboots.
-sk