T_Koopman

High Latency for SSD on XtremIO: How do you troubleshoot?

Hi all,

In my case XtremIO is used for Oracle databases, and the performance meter does go into the red for bandwidth.

I have both FC and iSCSI connections; the database in question is connected via iSCSI.

I know that should be a big red flag, but what I need to prove is that the latency peaks occur because the server is overrunning the 10 Gb iSCSI connection and that the latency is related to retransmits. Well, that is my theory until proven wrong.

Anyone have an idea how to detect, on the XtremIO or on the server, whether a lot of network retransmits are taking place?
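On the Linux server side, one place the kernel exposes retransmit counters is /proc/net/snmp (the same counters `netstat -s` reports). A minimal sketch that computes the retransmit percentage from that format — demonstrated here on a made-up sample string rather than a live read of /proc/net/snmp, so the counter values are invented for illustration:

```python
def tcp_retrans_pct(snmp_text):
    """Percentage of TCP segments retransmitted, from /proc/net/snmp-style text."""
    # /proc/net/snmp has a header line of counter names and a line of values,
    # both prefixed with "Tcp:"; zip them into a dict and divide the counters.
    lines = [l.split() for l in snmp_text.splitlines() if l.startswith("Tcp:")]
    header, values = lines[0][1:], lines[1][1:]
    stats = dict(zip(header, (int(v) for v in values)))
    return 100.0 * stats["RetransSegs"] / stats["OutSegs"]

# Sample counters (invented): 2,000 retransmits out of 400,000 segments sent.
sample = (
    "Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens "
    "AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs\n"
    "Tcp: 1 200 120000 -1 10 5 0 0 3 500000 400000 2000\n"
)
print(f"{tcp_retrans_pct(sample):.2f}%")  # 0.50%
```

Sampling this twice and diffing the counters gives a retransmit rate over an interval, which is what you would want to correlate with the latency spikes.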

10 Replies
T_Koopman

I should define high latency: in this case it is over 13,000 us, and when I look at the last 24 hours I see peaks of 87,000 us.

Avi3

Response time shown in the GUI (and CLI) is a calculated number rather than a measured one. The calculation puts the number of IOPS in the denominator, so if you have a very low number of IOPS (because of the nature of the workload), you will see artificially high latency numbers. Can you check the IOPS on the array when you see those high latencies?
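As a toy model of that effect (not the array's actual formula — the 50,000 us/s fixed component and 300 us per-IO service time are invented for illustration): if some fixed amount of accounted time gets divided by the IO count, the same per-IO service time looks far worse at low IOPS:

```python
def reported_latency_us(fixed_us_per_sec, per_io_us, iops):
    # Total accounted time per second divided by IOs per second:
    # the fixed component is amortized over fewer IOs as IOPS drops.
    return (fixed_us_per_sec + per_io_us * iops) / iops

# Identical 300 us/IO service time, invented 50,000 us/s fixed overhead:
print(reported_latency_us(50_000, 300, 10_000))  # 305.0 us at high IOPS
print(reported_latency_us(50_000, 300, 100))     # 800.0 us at low IOPS
```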

In any case, network retransmit scenarios need to be investigated by taking traces, and performance analysis like this needs information about the workload you are running. Have you tried opening an SR with the Support team?

T_Koopman

I can try opening an SR. I will look at IOPS next time I see high latencies. Can you give me an idea of what would be considered low IOPS? I just checked, and I have a latency spike just over 18,000 us, and the IOPS at the time was over 15,000.

Avi3

18,000 us = 18 ms of latency at 15,000 IOPS. The performance characteristics will depend on a lot of factors, like:

  • Read/write mix
  • IO size
  • Concurrency
  • Number of X-Bricks

I would recommend discussing this with your account team who would have the necessary expertise to look at your workload and give you an idea of the expected performance from your XtremIO array.
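As a quick sanity check on those two numbers, Little's law ties latency, IOPS, and the concurrency item on the list above together: average outstanding IOs = IOPS x latency. A sketch using the figures from this thread:

```python
def outstanding_ios(iops, latency_s):
    # Little's law: average concurrency = arrival rate x time in system.
    return iops * latency_s

print(outstanding_ios(15_000, 0.018))  # ~270 IOs in flight
```

Roughly 270 IOs in flight is a substantial queue depth, which is one reason concurrency matters as much as the raw IOPS figure when judging whether 18 ms is anomalous.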

Aglidic

Is your Oracle database on a physical or virtual server?

Do you have any storage virtualization like VPLEX?

Is the database on Linux or Windows?

Have you checked the latency on the client side?

How is the multipathing configured? Did you use a non-routable VLAN? What kind of switches?

Have the XtremIO recommendations been applied on the client side?

If it's a network issue, it will be easier to debug on the client side.

_dev_urandom

I'm running into what seems like the same type of issue. Our setup is Oracle 12c grid + 11g home + OL6 + VMware 6 + Cisco UCS. The issue seems to be related to ingress TCP congestion, based on the amount of retransmits from the X-Bricks that I can see using tcpdump/tshark on the OL6 host.

Using btest or vdbench with 32 threads and a 128k block size against a single volume on a 20TB X-Brick reports around 900MB/s throughput with around 2ms latency reported by iostat. In Oracle I'm using ASM normal redundancy to mirror 2 volumes on separate X-Brick clusters.

Performing a large table scan or RMAN backup will produce 256k-512k block reads, and I'll see a maximum read throughput of around 150MB/s per volume with latency around 10-15ms, which obviously isn't acceptable.

Disabling tcp_window_scaling improves throughput to around 500MB/s per volume, though it degrades NFS throughput back to my NetApp filers. Increasing tcp_rmem only makes the issue worse, so it seems like a buffer issue elsewhere.
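One reason disabling window scaling can change behavior this much: without the scaling option, the advertised TCP receive window is capped at 64 KiB, which bounds how much unacknowledged data each session can have in flight per round trip. A back-of-envelope sketch (the RTT values are assumptions for illustration, not measurements from this environment):

```python
def max_tcp_throughput_mb_s(window_bytes, rtt_s):
    # A TCP connection can carry at most one window of unacked data per RTT.
    return window_bytes / rtt_s / 1e6

# Without window scaling the window is capped at 64 KiB (65,535 bytes);
# the RTTs below are assumed, illustrative values.
print(max_tcp_throughput_mb_s(65_535, 0.0005))  # ~131 MB/s at 0.5 ms RTT
print(max_tcp_throughput_mb_s(65_535, 0.0001))  # ~655 MB/s at 0.1 ms RTT
```

A smaller in-flight cap also keeps a read burst from overflowing a shallow switch or NIC buffer, which may be why capping the window (or dropping to one session per interface) reduces the retransmits here even though it limits peak throughput.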

It has also been found that reducing the iSCSI sessions to one per Linux 10GbE interface restores throughput to acceptable levels, though this raises red flags, as having a single 10GbE connection to each cluster isn't a good idea either.


An SR has been open for the past few months, and I'd assume we have gone through all best-practice items. The next step is attempting a multi-point packet capture to try to narrow down the device or interface that is dropping packets.
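For what it's worth, the core idea of a multi-point capture can be sketched in a few lines: capture simultaneously at two points on the path (e.g. tcpdump on the host and a SPAN port on the switch), extract the TCP sequence numbers seen at each point, and diff them; segments seen upstream but never downstream localize the loss to that hop. The sequence numbers below are made up for illustration:

```python
def lost_between(upstream_seqs, downstream_seqs):
    # Segments captured at the upstream point but never seen downstream
    # were dropped (and later retransmitted) somewhere between the two taps.
    return sorted(set(upstream_seqs) - set(downstream_seqs))

sent_at_array  = [1000, 2448, 3896, 5344, 6792]  # invented sequence numbers
seen_at_host   = [1000, 2448, 5344, 6792]
print(lost_between(sent_at_array, seen_at_host))  # [3896]
```

Repeating the diff tap-by-tap along the path narrows the drop down to a single device or interface.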

T_Koopman

urandom,

Our configurations do have pieces in common with each other. Thank you for your input.

T_Koopman

Aglidic, you asked:

Is your Oracle database on a physical or virtual server? Physical.

Do you have any storage virtualization like VPLEX? No.

Is the database on Linux or Windows? Linux.

Have you checked the latency on the client side? I will try. It is hard to pinpoint, because they are short blips, but many of them.

How is the multipathing configured? PowerPath.

Did you use a non-routable VLAN? Yes.

What kind of switches? Nexus 7000.

Have the XtremIO recommendations been applied on the client side? I believe so, but if you have a specific doc I should reference, I will review the client settings.

Avi3

You can find the XtremIO recommended settings in the XtremIO 2.2.x - 4.0.1 Host Configuration Guide <https://support.emc.com/docu56210_XtremIO_2.2.x_-_4.0.1_Host_Configuration_Guide.pdf?language=en_US&language=en_US>
