HMA59
24 Posts
0
April 2nd, 2014 08:00
Yes, these are physical boxes with FC. And no anti-virus software...
In fact, I noticed this problem on a production server hosting a big SQL Server database (2 TB). We restored this database and were surprised by the low write rate (4 GB/min) where we used to get 6 GB/min... and when I enabled the write cache, surprise: the write rate was fine again...
I've also run some tests on a VMware ESXi host (VM clone operations), and with VMware everything is normal (actually better without the write cache)...
So this issue is Windows-dependent... but what is the trick?
mtanen
118 Posts
0
April 2nd, 2014 08:00
I am assuming these are physical boxes - there are some interesting things about the Windows I/O system that can cause weird issues. In general, the SSD should write as fast or faster without the write cache turned on. It should also reduce latency (which can be significant with certain applications).
You could try the same workload on a Linux server, I guess - or get Condusiv's V-locity demo and see if that makes a difference. I would also check for virus scanners and other things that can drag down I/O. I sometimes recommend people create a RAM disk on the Windows server in question and compare performance between the RAM disk and the disk. If the RAM disk is not faster, something is generally affecting the whole I/O system.
IOMeter not being slow could mean you have all sorts of trapped potential in those servers :)
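The RAM-disk comparison suggested above can be approximated with a quick timing script. This is a minimal sketch, not from the thread; the drive letters in the comment are placeholders, and the `sync_each_block` flag only imitates, at the application level, what disabling the write cache does for every write:

```python
import os
import time

def timed_write(path, size_mb=64, block_kb=64, sync_each_block=False):
    """Write size_mb of data to path and return throughput in MB/s.

    sync_each_block=True forces every block down to the device,
    roughly what running with the write cache disabled does
    for each individual write."""
    block = os.urandom(block_kb * 1024)
    n_blocks = size_mb * 1024 // block_kb
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
            if sync_each_block:
                f.flush()
                os.fsync(f.fileno())
        f.flush()
        os.fsync(f.fileno())  # flush at the end so both runs hit the media
    elapsed = time.perf_counter() - start
    os.remove(path)
    return size_mb / elapsed

# Compare a file on the suspect LUN against one on a RAM disk, e.g.:
#   timed_write(r"E:\test.bin")   # Compellent LUN
#   timed_write(r"R:\test.bin")   # RAM disk
```

If the RAM-disk number is dramatically higher even for the plain (non-synced) run, that points at something in the host I/O path rather than at the array.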
HMA59
24 Posts
0
April 2nd, 2014 08:00
The SC8000 firmware is 6.3.10.
And yes, more than 4 TB of RAID 10 SSD free space (this storage tier has only just entered production...).
To complete my first post: I don't see this problem with IOMeter.
With IOMeter, intensive write I/O on the SSD is better without cache than with it.
mtanen
118 Posts
0
April 2nd, 2014 08:00
I forgot to ask - is this FC or iSCSI, and at what speed?
mtanen
118 Posts
0
April 2nd, 2014 08:00
What version of the Compellent firmware are you running?
mtanen
118 Posts
0
April 2nd, 2014 08:00
Also - what does the space look like on your SSD? Do you have free space available in the RAID10 extent?
mtanen
118 Posts
0
April 2nd, 2014 09:00
What are your settings on the Windows device? Using the Windows cache or writing direct to the device? What's the MPIO profile set to?
HMA59
24 Posts
0
April 2nd, 2014 09:00
I use a Round Robin profile over 4 FC paths.
What do you mean by "Windows cache or direct to device"? I guess I use the default...
mtanen
118 Posts
0
April 2nd, 2014 09:00
Go into Windows Device Manager and check the properties of each of the LUNs. In particular, check the advanced settings to see whether you are caching all your writes through the OS.
Also - I would probably pin the MPIO setting to a single port for this type of test. That makes it easier to see whether there is an FC-related issue. Put each LUN on a different port.
HMA59
24 Posts
0
April 2nd, 2014 09:00
Disk write caching is disabled, so I assume the OS doesn't cache anything...
mtanen
118 Posts
0
April 3rd, 2014 07:00
Summary of the situation (just to keep it straight in my head):
You have two physical Windows machines - one Windows 2003 and the other Windows 2008 R2. Both are connected via Fibre Channel to a Compellent system that includes a tier of SSD. You turn off the write cache on the volume (which uses the recommended profile) so that writes are not cached before reaching the SSD.
You run IOMeter - it's faster with the write cache off.
You do file copies (between Compellent LUNs) - it's slower with the write cache off.
You have tried tuning parameters on the HBA and in the OS - nothing seems to change the outcome. You have verified that no antivirus or other storage-affecting product is installed on the hosts. The issue does NOT occur with Linux or VMware - only Windows. Compellent is at 6.3.10 (which should be fine) - note that I don't believe you get flash wear tracking until 6.4.3, so you might want to think about that.
Have you tried a copy from local storage to LUN? Does that make a difference?
Have you tried using something like FastCopy to do your copy test? Does that make a difference?
Are you seeing any errors on the FC switches? I would guess not, if IOMeter comes out fine.
Obviously not the expected behavior - unless someone else can think of something I have missed, you might want to raise a ticket with Copilot just so they can run through their checklist of common causes.
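One way to separate the Windows page cache from the array cache in the copy test is to time the same copy with and without forcing the data all the way to the device. A minimal sketch under that assumption; the paths in the comment are placeholders, not from this thread:

```python
import os
import shutil
import time

def timed_copy(src, dst, fsync=False):
    """Return seconds taken to copy src -> dst.

    With fsync=True the timing includes flushing the destination
    file to the device, so the result no longer depends on how much
    of the write merely landed in the OS cache."""
    start = time.perf_counter()
    with open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout, length=1024 * 1024)
        if fsync:
            fout.flush()
            os.fsync(fout.fileno())
    return time.perf_counter() - start

# If the fsync'd copy is much slower than the plain one, the "fast"
# copies were really landing in the Windows cache first:
#   timed_copy(r"E:\big.bak", r"F:\big.bak")
#   timed_copy(r"E:\big.bak", r"F:\big.bak", fsync=True)
```

That distinction matters here because IOMeter typically issues unbuffered I/O, while Explorer copies and backup agents normally go through the OS cache - which could explain why the two tests disagree about the write-cache setting.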
HMA59
24 Posts
0
April 3rd, 2014 08:00
"You turn off the write cache to the volume (that uses the recommended profile) so that writes would not be cached to the SSD."
--> I turn off the write cache on the destination volume (which uses the recommended profile) so that writes are not passed to the controller's NVRAM/cache and not replicated to the second controller, but are written directly to SSD.
--> Not tried with Linux.
"Have you tried a copy from local storage to LUN? Does that make a difference?"
--> Yes, that is exactly it: I copy the big file from one Compellent LUN and paste it to another Compellent LUN.
"Have you tried using something like FastCopy to do your copy test? Does that make a difference?"
--> Not tried with FastCopy. But, as mentioned, a full SQL restore through the Symantec Backup Exec SQL Server agent reaches the same conclusion: the restore rate is 40% slower without cache.
"Are you seeing any errors on the FC switches? I would guess not if the IOmeter comes out fine."
--> No errors on FC switches.
Yes, I will put Copilot support in charge of this problem.
Thanks for your help.
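For reference, the throughput numbers quoted earlier in the thread line up with this restore-time observation: dropping from 6 GB/min to 4 GB/min is a one-third loss of write rate, which makes each restore take 50% longer - in the same ballpark as the "40% slower" figure:

```python
cached, uncached = 6.0, 4.0  # GB/min, figures from the thread

throughput_drop = (cached - uncached) / cached  # fraction of write rate lost
extra_duration = cached / uncached - 1          # extra wall-clock time per restore

print(f"{throughput_drop:.0%} lower throughput")  # 33% lower throughput
print(f"{extra_duration:.0%} longer restore")     # 50% longer restore
```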
JBrunt
8 Posts
0
June 11th, 2014 11:00
Did you ever get an explanation for this problem from co-pilot? Could you post it here as we are beginning to deploy more and more flash, and I would be interested to see the explanation.
HMA59
24 Posts
0
June 13th, 2014 00:00
No, sorry - unfortunately I haven't had any time to open a case on this subject. But it is still relevant today, and as a workaround I keep the write cache active on all my Windows servers (but not on my ESXi servers, which don't seem to be affected).
I will open a case with Copilot and let you know the result. But I don't know when...
carsten anker
1 Message
0
July 12th, 2015 12:00
Hi,
did you ever find out? We are seeing the same thing you describe, with iSCSI.
Another thing we see is that 2008 R2 is faster than 2012 R2.
We have also tried everything around NIC tuning.
Thanks
carsten