arjunsingh1
August 7th, 2013 07:00
I would suggest opening a support ticket with EMC. The remote support team will check the host-side configuration, run other checks, and provide recommendations. If those don't solve the issue, the performance team will need to be engaged after the required data has been collected from the Symm.
I recently saw a similar issue with a similar error on tempdb with SRDF/S; when SRDF/S for that tempdb device was halted, the issue stopped occurring.
Again, I would suggest getting the performance team involved.
- Arjun
sauravrohilla
August 7th, 2013 20:00
Since you mentioned that the devices are bound to RAID 6, I assume this is the SATA tier? I would start by looking at these two things:
1) The write response time of that particular device (not of the entire SG).
2) Is it possible to test with SRDF suspended or in ACP mode?
Regards,
Saurabh
PedalHarder
August 7th, 2013 21:00
SQL is very good at doing large bursts of writes. They can easily queue up on the FA processors and elongate response times. How many FA CPUs are you using?
SRDF/S
The number of in-flight writes in SRDF/S mode is limited to (up to) one per symdev per FA port. So a 4-way meta presented to 4 FAs can have a maximum of 16 in-flight writes. Are you using PowerPath? You want your writes spread over all the FAs.
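To make that arithmetic concrete, here is a minimal sketch of the limit as described above (one in-flight write per symdev per FA port). The function name is hypothetical and not part of any EMC tool; it only multiplies the two counts.

```python
def max_inflight_srdfs_writes(meta_members: int, fa_ports: int) -> int:
    """Upper bound on concurrent SRDF/S writes for one meta device.

    Assumes the rule stated in the post: at most one in-flight write
    per symdev (meta member) per FA port.
    """
    if meta_members < 1 or fa_ports < 1:
        raise ValueError("meta_members and fa_ports must be at least 1")
    return meta_members * fa_ports


# The example from the post: a 4-way meta presented to 4 FAs.
print(max_inflight_srdfs_writes(4, 4))  # 16
```

Doubling either the meta members or the FA ports doubles the ceiling, which is why both options come up when sizing for write bursts.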
TEMPDB
The data in TempDB is not required for system recovery. If it is the TempDB LUN that has the response time issues, set it to adaptive copy instead. You still get the structure of the LUN at the R2 end with adaptive copy, so disaster restart is not compromised.
How many writes per second in the problem period?
Run Perfmon at a one-second interval during the problem period to assess the size of the write bursts. Use this information along with the SRDF/S rules to determine whether your meta needs more members or you need more FA ports.
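One rough way to combine the Perfmon numbers with the SRDF/S rule is sketched below. It is an assumption on my part, not an EMC sizing method: it uses Little's Law (concurrency = arrival rate x response time) to estimate how many writes must be in flight during a burst, and compares that with the one-per-symdev-per-FA-port ceiling. The function names and the sample figures are hypothetical.

```python
def required_concurrency(peak_writes_per_sec: float, write_response_ms: float) -> float:
    """Estimate in-flight writes needed to sustain a burst (Little's Law)."""
    return peak_writes_per_sec * write_response_ms / 1000.0


def fits_srdfs_limit(peak_writes_per_sec: float, write_response_ms: float,
                     meta_members: int, fa_ports: int) -> bool:
    """True if the estimated concurrency stays within the SRDF/S ceiling.

    Ceiling assumed from the rule above: one in-flight write per
    symdev (meta member) per FA port.
    """
    limit = meta_members * fa_ports
    return required_concurrency(peak_writes_per_sec, write_response_ms) <= limit


# Hypothetical one-second Perfmon samples against a 4-way meta on 4 FAs.
print(required_concurrency(5000, 2.0))      # 10.0 in-flight writes needed
print(fits_srdfs_limit(5000, 2.0, 4, 4))    # True: 10 <= 16
print(fits_srdfs_limit(12000, 2.0, 4, 4))   # False: 24 > 16
```

If the estimate exceeds the ceiling, writes queue and response time climbs, which points toward more meta members or more FA ports.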
Fenglin1
August 7th, 2013 23:00
Have you asked your DBA to check whether any job was running at that time? There might have been some abnormal space allocation (new writes) by temp tables during a reindex, a big join operation, etc., which caused the peak in disk response time. The increase in response time might be caused by SRDF/S. However, why is your tempdb set up with SRDF/S? SQL Server TEMPDB normally doesn't need any backup or restore operation.
In addition, RAID 6 is not a suitable RAID protection for a write-sensitive application like SQL Server TEMPDB.
I agree with Saurabh's advice: you can try switching SRDF to ACP mode to see whether it is an SRDF/S issue.
mumshad
August 21st, 2013 04:00
These are not SATA. They are FC disks. I have mentioned that in the question.
Quincy561
August 21st, 2013 05:00
Why do you have RAID 6 with synchronous RDF? It seems like overkill. RAID 1 would reduce the load on the FC disks significantly.