One of our DMX arrays is experiencing a write pending (WP) issue; the system WP regularly reaches 90+%. After analysing it, we identified a few devices that are hitting their maximum device WP limit, which in turn drives the overall system WP high.
The device WP increases at a particular time for each device. These devices are presented to SQL servers, and the SQL backups (from one drive to another within the server) start at that time.
There are multiple SQL servers, and all of them perform SQL backups, which increases the system WP. I am looking for a solution to this issue.
One solution I identified is to reschedule the backup timing; currently the backups start at the same time on most of the servers.
Is there any other suggestion to reduce the WP issue?
WP issues are pretty simple. Either slow the writes into the system, or increase the speed of the destage. Adding cache generally does not help much.
You can limit the writes into the system by reducing the paths into the system, or the queue depth.
You can increase the destage rate by increasing the number of disks or DAs and/or the raid protection.
Can you explain a little about the point below?
"You can increase the destage rate by increasing the number of disks or DAs and/or the raid protection"
Write cache is simply a buffer between the host and the disks. If the disks can't keep up with the host write rate, the buffer will fill and the host will be slowed to the rate of the destage on the disks.
If the disks can destage faster than the host is writing, then the host will always be able to write to cache with no delay, because the buffer never fills. Fewer drives destage slower than more drives. RAID1 can destage faster than RAID5, which can destage faster than RAID6. Put enough drives behind a DA CPU, and the DA CPU can become the limiting factor for the destage rate; in that case, more DAs can increase the destage rate.
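The fill behaviour described above can be sketched with a toy model. All rates here are hypothetical MB/s figures chosen purely for illustration, not measurements from any real array:

```python
# Toy model of a write-pending (WP) cache: a buffer between host writes
# and disk destage. Rates and cache size are hypothetical illustrations.

def simulate_wp(host_mb_s, destage_mb_s, cache_mb, seconds):
    """Return WP cache occupancy (MB) after the last second."""
    wp = 0.0
    for _ in range(seconds):
        wp += host_mb_s               # host writes land in cache
        wp -= min(wp, destage_mb_s)   # disks drain what they can
        wp = min(wp, cache_mb)        # once full, host is throttled to destage rate
    return wp

# Disks can't keep up: cache fills and stays pinned at its limit.
print(simulate_wp(host_mb_s=500, destage_mb_s=300, cache_mb=2000, seconds=15))  # → 2000.0
# Disks keep up: cache stays empty and the host never sees a delay.
print(simulate_wp(host_mb_s=300, destage_mb_s=500, cache_mb=2000, seconds=15))  # → 0.0
```

Note that adding cache only changes how long it takes the buffer to fill, not whether it fills, which is why more cache rarely cures a sustained WP problem.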
Need more explanation?
Currently we are using a RAID 5 device for our backup drive (the one to which we dump the SQL backups). So if we configure it as RAID 1, we can increase the destage rate, which in turn reduces WP, right?
RAID1 would probably help, but if the workload is 100% sequential on the disks, RAID5 is more efficient. This is because of optimized writes which can be performed when the whole raid stripe is in cache. Then we don't need to perform any reads to calculate parity.
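A rough back-of-envelope makes the trade-off concrete. This uses the standard RAID write penalties (RAID1 = 2 backend I/Os per host write, RAID5 random = 4, RAID6 random = 6) and the optimized full-stripe case for RAID5; these are generic RAID arithmetic, not DMX-specific figures:

```python
# Backend disk I/Os per batch of host writes under different RAID schemes,
# using the standard write penalties. Generic arithmetic, not DMX-specific.

def backend_ios(host_writes, raid, full_stripe=False, data_disks=3):
    if raid == "RAID1":
        return host_writes * 2        # write both mirrors
    if raid == "RAID5":
        if full_stripe:
            # Whole stripe in cache: write data + parity, no parity reads.
            return host_writes * (data_disks + 1) / data_disks
        return host_writes * 4        # read data, read parity, write both back
    if raid == "RAID6":
        return host_writes * 6        # two parity members to read and update
    raise ValueError(raid)

for raid, fs in [("RAID1", False), ("RAID5", False), ("RAID5", True), ("RAID6", False)]:
    print(raid, "full-stripe" if fs else "random", backend_ios(3000, raid, fs))
```

For 3000 host writes on a 3+1 group, RAID5 full-stripe costs 4000 backend I/Os versus 6000 for RAID1, which is why fully sequential backup streams can favour RAID5; random writes reverse that (12000 vs 6000).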
The other option is to spread the workload across more drives. I don't think you said if this was VP or traditional thick. If thick, meta volumes can help spread the load over multiple raid groups.
It is a thick device, and I am using a meta volume.
I am working with my SQL team to rearrange the backup timings so that fewer writes hit the storage at any given time.
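Rearranging the timings amounts to spreading the start times evenly across the backup window. A minimal sketch, where the server names, window, and date are all hypothetical placeholders:

```python
# Hypothetical helper: spread SQL backup start times evenly across a
# window so the array never sees every server dumping at once.
from datetime import datetime, timedelta

def staggered_starts(servers, window_start, window_minutes):
    step = window_minutes / len(servers)
    return {s: window_start + timedelta(minutes=i * step)
            for i, s in enumerate(servers)}

servers = ["SQL01", "SQL02", "SQL03", "SQL04"]   # hypothetical names
schedule = staggered_starts(servers, datetime(2024, 1, 1, 22, 0), 120)
for server, start in schedule.items():
    print(server, start.strftime("%H:%M"))
# SQL01 22:00, SQL02 22:30, SQL03 23:00, SQL04 23:30
```

Each server still writes the same total amount; staggering just caps the peak write rate the array sees at once, which is what matters for WP.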
With thick you need to make sure your meta is the same width as the disks you are writing to. For example if you had 16 total drives with 3+1 protection, your meta volume should have 4 members so that you have one member on each raid group. If it wraps around, it will increase the seek time. If it doesn't touch all the drives, you won't get the advantage of all the disks.
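The arithmetic from that example, spelled out (drive counts here are just the ones from the example above):

```python
# 16 total drives with 3+1 (RAID5) protection = 4 drives per raid group,
# so 16 / 4 = 4 raid groups, and the meta should have 4 members --
# one member on each raid group, touching every drive exactly once.

def meta_members(total_drives, drives_per_raid_group):
    assert total_drives % drives_per_raid_group == 0
    return total_drives // drives_per_raid_group

print(meta_members(16, 4))   # → 4 members
```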
That's the main reason I refuse to let my DBA team use the vMax as a backup device. Another solution, and in my opinion a better one, is to purchase a small disk array or a tray of disks for this purpose.
On a side note, one would think a storage company would add DBA and vMax to the site dictionary.