VMAX write response times

almc77
October 24th, 2013 02:00
Hi
I have a question on VMAX write response times. When we run a specific job, the write times on all the FAs increase to around 30 ms or more, measured using the "average write time (ms)" metric. The read time on the FAs is much smaller, at around 4 ms on average.
The FA % busy figure shows less than 35% for all FAs, and we are not sharing any ports with other hosts on these FAs. We do see an increase in slot collisions to around 30, which can be caused by read misses, before these drop off again; however, looking at the devices, I don't see any large numbers of write misses either.
I am writing around 650 "write reqs per sec" with writes totalling around 7,500 KB per second, and the average request size in KB peaks between 8 and 34 while the issue is seen (Oracle database).
What I have noticed is that we have data in the "average queue depth range 8" bucket, which ties in with the times of the run, and I also see an increase in the "average write response time range 7" bucket to around 85, which remains constant.
Is there any way to mitigate this queuing at the FA, even though we are not pushing the FA % busy above 40%? I am also still looking at the host to determine whether it is bursting on I/O (hard to tell, as it's Linux), e.g. by sampling per-LUN queue depth and latency as sketched below.
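A minimal sketch, assuming the sysstat package is installed (a bursting host shows avgqu-sz and await spiking in step with the job while %util stays low):

    # extended per-device stats, sampled every second
    iostat -x 1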
Please see the attached; the main job starts at 14:45.
Cheers
Al



Quincy561
October 24th, 2013 06:00
How many devices are active? Are you using striped meta volumes?
More devices can help in these cases.
A trace may be the only way to know what is really going on.
almc77
October 24th, 2013 08:00
Hi
There are around 11 striped 8-way meta devices per filesystem, set up as a concat at the OS level. We had a theory that the updates were targeting specific tracks, and possibly individual meta devices, and that we were queuing at that level / bursting on I/O. However, we are now in the process of changing the I/O scheduler, as there is a mix between the Veritas layer (deadline) and the lower SCSI layer (cfq); aligning them has been shown in a test environment to drop the write times. A quick way to see the mix per device is shown below.
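The active elevator for each path shows in brackets when read from sysfs (sd* stands for the SCSI device names):

    cat /sys/block/sd*/queue/scheduler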
Will update soon once we know more.
Thanks
Al
Quincy561
October 24th, 2013 08:00
Another option is to ditch the metas and go with host striping only. One advantage is that you can make the stripe size smaller.
In your first message you implied that you might be seeing very large I/Os (7 MB). Large writes might do a lot better with a host stripe at 64K, along the lines sketched below.
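For illustration only, a VxVM sketch; the disk group, volume name, size, and column count here are made up and would need sizing for the actual workload:

    # host-striped volume, 8 columns with a 64K stripe unit
    vxassist -g mydg make oravol 500g layout=stripe ncol=8 stripeunit=64k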
almc77
October 31st, 2013 09:00
Hi
Just an update: we decreased nr_requests for all LUNs on the Linux nodes from the default of 128 down to 8, as the I/O was shown to be bursting a large number of requests in three-second intervals, and we also changed the scheduler to deadline for all devices. This has dropped the average write time from over 40 ms to around 4 ms, which is far better, and the queue buckets on the FAs are now in the 4, 3, 2 range. The changes were applied roughly as below.
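Roughly what was set through sysfs (sdX stands for each LUN path; note these settings do not persist across reboots, so they also need to go into a boot script or udev rule):

    # per device, as root
    echo deadline > /sys/block/sdX/queue/scheduler
    echo 8 > /sys/block/sdX/queue/nr_requests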
Thanks
Al