I haven't done much tuning with spinning disks, but I have worked with InfiniBand (QDR and FDR). I have found two things that affect write performance significantly.
First is the type of storage cards.
Are you using HBAs or RAID cards? (Please give the manufacturer and model.)
If RAID, do you have the battery backup units connected with write cache on, or are the cards in pass-through?
Second is latency between nodes.
Can you run the built-in latency tests in ScaleIO and post the results? (Please run them on each node.)
Anything over 100 μs on InfiniBand is pretty bad. I have gotten 30 μs after tuning a QDR fabric, with 26 Gbit/s throughput per port.
Mellanox has a good set of tools you can use in addition to iperf and the built-in ScaleIO tests.
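For a raw fabric latency number outside ScaleIO, the perftest suite that ships with the Mellanox OFED stack is handy. A minimal sketch (the IP below is a placeholder for your second node's IPoIB address):

```shell
# On node A, start the latency test in server mode (it waits for a peer):
ib_send_lat

# On node B, point the client at node A's IPoIB address
# (192.168.10.1 is a placeholder -- substitute your own):
ib_send_lat 192.168.10.1
# The typical-latency column of the output (in usec) is the number
# to compare against the ~30 us figure above.
```

Run it a few times at different message sizes before trusting a single number.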
What firmware version are you running on your HCAs?
Are your HCAs Dell/HP-branded cards or true Mellanox? If they are Dell/HP, the firmware is a pain to upgrade.
Are you using a managed InfiniBand switch? If not, how is your separate subnet manager configured?
If you are using unmanaged switches, it might be worthwhile to eBay them and invest in managed ones.
Below are some settings I have found to be very effective at increasing performance.
IPoIB (in datagram mode) can only handle a 4K MTU. You have to account for the 4-byte IPoIB header, so set it to 4092, NOT 4096.
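On a Linux node that looks like the following (ib0 is a placeholder for your IPoIB interface name):

```shell
# Datagram-mode IPoIB: 4096-byte IB MTU minus the 4-byte IPoIB header
ip link set dev ib0 mtu 4092

# Verify the change took effect
ip link show dev ib0
```

Remember to make it persistent in your distribution's network config, or it will revert on reboot.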
If you are hooking the nodes to an ESXi cluster, increase the queue depth on all ScaleIO volumes to 256:
esxcli storage core device set -d eui.XXXXXXXXXXXXXXXXX -O 256
Increase the queue depth on the ESXi HCAs:
esxcli system module parameters set -p "lpfc0_lun_queue_depth=254 lpfc1_lun_queue_depth=254" -m lpfc
If you really want to tune your SDSs, you should look into NUMA tuning: basically lining up each HCA's PCIe slot with its corresponding memory and CPU channels. I am not sure how CentOS handles CPU-to-PCIe assignment, but Windows is terrible at it. Windows nodes do round robin by default with their RSS profiles. (Hey Microsoft, it would be nice to set the RSS profile to ClosestStatic by default. Hint hint. :-)
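On a Linux node you can at least check the alignment from sysfs. A rough sketch — ib0, node 0, and the SDS binary path are all assumptions you'll need to adjust for your install:

```shell
# Which NUMA node owns the HCA's PCIe slot? (-1 means unknown or single-node)
cat /sys/class/net/ib0/device/numa_node

# Show which CPUs and memory belong to each node
numactl --hardware

# Pin a process to the HCA's node so its memory allocations stay local
# (node 0 and the binary path below are placeholders, not the official
# ScaleIO procedure -- check your version's install layout)
numactl --cpunodebind=0 --membind=0 /opt/emc/scaleio/sds/bin/sds
```

Steering the HCA's interrupt affinity to the same node is the other half of the exercise.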
Turn off C-states in the BIOS, and if there is a high-performance mode, set it to the highest values. Some servers ship in a "green" low-power mode.
I haven't messed with the read/write cache feature in ScaleIO, but I have heard nothing but praise. Load up your nodes with some more RAM and flip it on (32-64 GB should do the trick).
Unless you need to, don't throttle your nodes for rebuild/rebalance. Not sure how spinning disks will fare, but all-flash nodes can handle this without a significant drop in performance.
Hope this helps, hit me up if you need any additional info.
pawelw1
July 11th, 2016 11:00
Hi Jeff,
Can you check the messages files on the SDSs for any disk errors? Also, are all the devices healthy in the ScaleIO GUI?
Are you using some kind of RAID controller, or are the disks just attached to a standard onboard SATA controller?
Can you show us 'scli --query_all' output as well?
Thanks,
Pawel
darthlebowski
January 9th, 2017 23:00
Hey Jeff,
Below are some settings I have found to be very effective at increasing performance.
echo 'kernel.shmmax=16000000000' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_low_latency=0' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_slow_start_after_idle=0' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_timestamps=1' >> /etc/sysctl.conf
echo 'net.core.wmem_max=100000000' >> /etc/sysctl.conf
echo 'net.core.rmem_max=100000000' >> /etc/sysctl.conf
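After appending those lines, reload them so they take effect without a reboot:

```shell
# Re-read /etc/sysctl.conf and print each value as it is applied
sysctl -p

# Spot-check one of the new values
sysctl net.core.rmem_max
```

Note that repeatedly appending with >> will leave duplicate entries in /etc/sysctl.conf, so tidy the file if you rerun those echo commands.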
Hope this helps, hit me up if you need any additional info.
Tristan