June 5th, 2012 09:00
Extremely high CPU usage on Linux during iSCSI write to EQL box
Hi:
I just sorted a large file, creating 26GB of output. During the final phase of the sort, when writing the output file, our CPU utilization went way up and the system basically froze for all other usage. It was so busy processing the iSCSI writes that nothing else useful got done.
We have seen the same behavior when unpacking a large tar file (tar -zxvf file.tar.gz).
Has anyone else ever experienced this? Any ideas how to keep it from happening?
System: Dell 1950 Generation II, dual quad-core 2.6GHz processors, 20GB RAM
NIC: Intel Pro/1000 quad port
OS: CentOS 5.4 (kernel 2.6.18-164.6.1.el5) x86_64
iscsi-initiator-utils: 6.2.0.872-13.el5
Multipath: configured over all 4 network ports, rr_min_io = 512
MTU: 9000
File system: OCFS2
EQL array: PS3900XV (firmware 5.2.2), attached via a pair of stacked Dell 6248 switches (firmware 3.3.3)
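For reference, the device stanza in our /etc/multipath.conf looks roughly like this (from memory, so treat the product string and checker values as approximate; they came from the EQL Linux docs):
devices {
    device {
        vendor               "EQLOGIC"
        product              "100E-00"
        path_grouping_policy multibus
        path_checker         tur
        rr_min_io            512
        failback             immediate
    }
}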
ethtool commands for the iSCSI network ports (run once per port; ethX stands for each iSCSI interface) are:
ethtool -A ethX rx on tx on
ethtool -G ethX rx 4096 tx 4096
ethtool -K ethX gso on
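(The lower-case forms of the same flags show the current state, which is how I've been verifying the settings stuck after boot:)
ethtool -a ethX   # pause parameters
ethtool -g ethX   # ring sizes
ethtool -k ethX   # offload settings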
What other information can I give that will help diagnose?
Thanks in advance...
Eric Raskin



eraskin
June 5th, 2012 09:00
The high RR_MIN_IO was based on the suggestion in the docs that large sequential access (both read and write) benefits from a larger RR_MIN_IO, while a smaller RR_MIN_IO is better for random I/O. Doesn't it also interact with the readahead settings? I have also entered the following in /etc/sysctl.conf (left out of my original post :-( ). Note that we run an Oracle database on this server, so some of these settings are based on Oracle recommendations:
# Network TCP Settings
net.core.rmem_max = 1073741824
net.core.wmem_max = 1073741824
# increase Linux autotuning TCP buffer limits
# min, default, and max number of bytes to use
# set max to at least 4MB, or higher if you use very high BDP paths
net.ipv4.tcp_rmem = 1048576 16777216 1073741824
net.ipv4.tcp_wmem = 1048576 16777216 1073741824
net.ipv4.tcp_mem = 1048576 16777216 1073741824
# don't cache ssthresh from previous connection
net.ipv4.tcp_no_metrics_save = 1
net.ipv4.tcp_moderate_rcvbuf = 1
# recommended to increase this for 1000 BT or higher
net.core.netdev_max_backlog = 2500
#Equallogic settings
net.ipv4.conf.all.arp_ignore=1
net.ipv4.conf.all.arp_announce=2
net.ipv4.netfilter.ip_conntrack_tcp_be_liberal=1
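These all live in /etc/sysctl.conf so they survive reboots; I apply them to the running system with:
sysctl -p /etc/sysctl.conf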
eraskin
June 5th, 2012 10:00
Yep -- yet more info I forgot to include. My iscsid.conf (and the records for all connected nodes) have been updated to:
node.session.cmds_max=1024
node.session.queue_depth=128
node.session.iscsi.FastAbort=No
node.session.iscsi.FirstBurstLength=16776192
node.session.iscsi.MaxBurstLength=16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength=16776192
node.conn[0].iscsi.FirstBurstLength=16776192
node.conn[0].iscsi.MaxBurstLength=16776192
node.conn[0].iscsi.MaxXmitDataSegmentLength=16776192
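One gotcha I hit: iscsid.conf only applies to newly discovered targets, so existing node records have to be updated and the sessions logged out and back in. Something like this per target (the IQN below is just a placeholder):
iscsiadm -m node -T iqn.2001-05.com.equallogic:example-vol -o update \
        -n node.session.cmds_max -v 1024
iscsiadm -m node -T iqn.2001-05.com.equallogic:example-vol --logout
iscsiadm -m node -T iqn.2001-05.com.equallogic:example-vol --login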
I also have this as part of rc.local:
# Change all Equallogic devices to noop scheduler -- let the EQL box do the scheduling
for i in /sys/block/sd*
do
        # the kernel pads the vendor string with spaces, so match the prefix
        if grep -q '^EQLOGIC' $i/device/vendor
        then
                echo noop > $i/queue/scheduler
        fi
done
So I have NOOP as the I/O scheduler for all Equallogic devices.
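(On a newer distro I'd probably do this with a udev rule instead of rc.local; something like the sketch below, though I haven't tested it and the udev shipped with CentOS 5 may be too old for this syntax:)
# /etc/udev/rules.d/99-eql-noop.rules -- untested sketch
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", ATTRS{vendor}=="EQLOGIC*", ATTR{queue/scheduler}="noop"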
I can reduce RR_MIN_IO, but how will that help the main problem - high CPU usage during large sequential writes?
Anything else we can consider?
eraskin
June 5th, 2012 13:00
OK, but it isn't an Oracle-related issue. I was sorting a file. It was NOT in the Oracle database. It was entirely iSCSI I/O to the Equallogic box that swamped the system.
However, I will explore other avenues.
Thanks for the attempt!
eraskin
June 5th, 2012 13:00
Thanks. I will try. Actually, I think it has to do with the InterruptThrottleRate setting on my Pro/1000 Controller. I think my CPU is getting swamped with Interrupts. I will be trying different values to see if that fixes it. I know that throttling the interrupts will increase latency, but I can live with that if my system doesn't slow to a crawl. ;-)
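For anyone following along, this is roughly how I'm checking and testing it (the driver name and rates are just what I'm trying on my box, not a recommendation):
# confirm interrupt load is actually the problem (watch the per-CPU counts climb):
watch -n1 'grep -i eth /proc/interrupts'
# then cap the rate in /etc/modprobe.conf, one value per port, and reload the driver:
options e1000 InterruptThrottleRate=8000,8000,8000,8000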
We are using a package called asksort (not Unix sort). We use OCFS2 because we expect to cluster this file system so that multiple systems can access our working data at the same time. I thought it would be better than NFS with no single point of failure. Silly me. :-)
On a slightly different tack, do you know why everyone seems to recommend turning off offload processing on iSCSI connections? We have tso (TCP segmentation offload) off and gso (generic segmentation offload) on. Is there any disadvantage to turning tso on? What about gro (generic receive offload), which is also currently off? I'm not 100% sure what they even do, but it seems that asking the hardware to do more work should be advantageous?
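In case it helps, this is how I'd flip them to experiment (eth2 is a placeholder for one of the iSCSI ports; gro may need a newer kernel/ethtool than stock CentOS 5):
ethtool -k eth2            # show current offload state
ethtool -K eth2 tso on     # try TCP segmentation offload
ethtool -K eth2 gro on     # try generic receive offload, if supported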
eraskin
June 6th, 2012 13:00
Maybe the Interrupt Throttle option should be added to your Linux install documents? I know it doesn't affect everyone, and it depends on the Ethernet driver in use, but I would expect there are a fair number of people using Intel cards out there. I wonder what percentage of people actually run into this? I haven't found it mentioned much on the web.