October 7th, 2013 14:00
Periodic slow NFS client write performance
Hi all,
I've noticed intermittent but frequent slow write performance on an NFS v3 TCP client, as measured over a 10-second nfs-iostat interval sample:
write: ops/s kB/s kB/op retrans avg RTT (ms) avg exe (ms)
97.500 291.442 2.989 0 (0.0%) 457.206 515.703
121.500 470.055 3.869 0 (0.0%) 265.747 268.599
124.800 470.765 3.772 0 (0.0%) 154.299 155.938
It sometimes takes 30+ seconds for the NFS writes to complete, as reported by the Linux client. As you can see, it's not writing a lot of data, but there is a lot of flushing to NFS since it is a MySQL DB. Here are my mount options:
rw,relatime,vers=3,rsize=4096,wsize=4096,namlen=255,hard,nolock,proto=tcp,retrans=3,sec=sys,mountvers=1,mountproto=udp,local_lock=all
I know wsize and rsize should be larger, but I am getting similar though less frequent behaviour from other clients that use the Isilon's much larger server defaults of rsize=131072,wsize=524288.
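For reference, testing the larger server defaults from the client would be a remount along these lines (the export path and mount point here are placeholders, not the actual ones from this environment):

```shell
# Remount the export with the larger Isilon default transfer sizes
# (server name, export path, and mount point are placeholders)
mount -o remount,rsize=131072,wsize=524288 isilon:/ifs/data /mnt/data

# Verify the effective options the kernel negotiated
grep /mnt/data /proc/mounts
```

Note that the server can cap rsize/wsize during mount negotiation, so checking /proc/mounts afterwards shows what is actually in effect.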
On looking at isi statistics I see the following:
# isi version
Isilon OneFS v6.5.4.4 B_6_5_4_76(RELEASE): 0x60504000040004C:Tue Oct 11 18:21:38 PDT 2011 root@fastbuild-05.west.isilon.com:/build/mnt/obj.RELEASE/build/mnt/src/sys/IQ.amd64.release
# isi status
Cluster Name: XXXXX
Cluster Health: [ OK ]
Cluster Storage: HDD SSD
Size: 39T (43T Raw) 0
VHS Size: 4.3T
Used: 13T (33%) 0 (n/a)
Avail: 26T (67%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR| In Out Total| Used / Size |Used / Size
---+---------------+----+-----+-----+-----+------------------+-----------------
1|XX.XX.XX.XX | OK | 328K| 0| 328K| 3.2T/ 9.7T( 33%)| (No SSDs)
2|XX.XX.XX.XX | OK | 0| 1.6M| 1.6M| 3.2T/ 9.7T( 33%)| (No SSDs)
3|XX.XX.XX.XX | OK | 478K| 577K| 1.1M| 3.2T/ 9.7T( 33%)| (No SSDs)
4|XX.XX.XX.XX | OK | 478K| 24K| 502K| 3.3T/ 9.7T( 34%)| (No SSDs)
------------------------+-----+-----+-----+------------------+-----------------
Cluster Totals: | 1.3M| 2.2M| 3.5M| 13T/ 39T( 33%)| (No SSDs)
It doesn't really seem that busy... however in a 10 second interval sample I see:
# isi statistics client
 Ops      In     Out    TimeAvg  Node  Proto  Class   UserName  LocalName    RemoteName
 N/s     B/s     B/s         us
0.4 1.7K 54.4 3620423.0 4 nfs3 write UNKNOWN XX.XX.XX.XX host1
0.2 830.4 27.2 2515569.0 4 nfs3 write UNKNOWN XX.XX.XX.XX host1
0.6 40.8 72.0 221096.3 2 nfs3 delete root XX.XX.XX.XX host2
0.6 40.0 48.8 206670.0 2 nfs3 delete root XX.XX.XX.XX host3
0.2 13.6 24.0 101271.0 2 nfs3 delete root XX.XX.XX.XX host3
0.4 27.2 48.0 64634.0 2 nfs3 delete root XX.XX.XX.XX host4
0.4 27.2 48.0 64576.0 2 nfs3 delete root XX.XX.XX.XX host3
Interestingly, the DB host above is on node 3.
In addition, when I run "isi statistics heat --nodes=all" around the same time it looks like I have 10K+ LIN locks active.
So, clearly 3+ seconds for an NFS write is an issue. My question is whether 10K LIN locks is considered high, and if so, what could be causing so many locks?
Regards and thanks for your time,
Gary


Peter_Sero
October 8th, 2013 07:00
Hello Gary,
it seems the writes are already smaller than 4K, and I wouldn't restrict the wsize
to a value lower than what the client likes to send in one op.
(Divide the In rate by the Ops rate in your example, or use the --long option
to see InAvg/InMin/InMax, i.e. the distribution of actual write sizes in B or KB).
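To illustrate the arithmetic with the first row of Gary's `isi statistics client` sample (Ops=0.4 N/s, In=1.7K B/s; I'm assuming the K suffix is 1024-based here):

```shell
# Average bytes per write op = In (B/s) / Ops (N/s)
# Input is the first sample row: 0.4 ops/s, 1.7K B/s in
avg=$(echo "0.4 1.7K" | awk '{v=$2; sub(/K/,"",v); print int(v*1024/$1 + 0.5)}')
echo "average write size: ${avg} bytes/op"
```

That works out to roughly 4.3 KB per op, i.e. the client is already sending writes of about one wsize each, so lowering wsize further would only split them.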
Unless you put the real DB data files on SSD, the flash will (only) help
in navigating to the data blocks in the file. As you do many updates
to existing blocks, the accessed file layout information might be held largely in
the RAM cache anyway, so you probably wouldn't see much improvement.
But it could help in principle, and would never harm in any case.
Snapshots can hurt with random updates, as copy-on-write
might be chosen here. If you have snapshots, delete them, or
run tests on fresh copies of the DB files which are not covered by snapshots.
Forgot to mention that the coalescer in 7.0 was improved for latency in 2012.
But whether it can do wonders where there is nothing to coalesce in the end,
due to heavy scattering? It will most likely behave differently, and probably not worse than
the 6.5 coalescer for your case. In 7.1 more changes have been made, as Jim just
explained in the context of many small files within the ongoing Ask The Expert discussion on 7.1.
Cheers
-- Peter
Peter_Sero
October 8th, 2013 01:00
The locks shown by "isi statistics heat" are OneFS-internal locks taken when a node updates a file,
rather than application/protocol lock operations. You'll see plenty of them
with random IO and small write blocks.
It's difficult to get the full picture from the statistics excerpts,
but I would guess the NFS client is simply filling the node's NVRAM,
while the OneFS write coalescer hopes it can do good work
in the end (i.e. to coalesce many small writes into fewer and larger
physical writes). However, when the NVRAM is full and it is time to
write the data to the disks, the write chunks may be too fragmented
overall, and a large number of small disk transfers need to be made at once.
At that point the NVRAM cannot sustain a high rate of new writes.
So the intermittent phases of slow writes would correlate
with filling up and flushing the NVRAM.
In mixed loads with most of the clients doing streaming writes,
this effect doesn't become so prominent. The random IO clients
would just see so-so performance all the time.
A simple test can be to add some streaming write load
to the node from another client...
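A minimal sketch of such a streaming load from a second client (the mount point and file name are placeholders; size the file to comfortably exceed NVRAM):

```shell
# Sustained streaming write from another NFS client against the same node,
# to see whether the random-IO client's latency improves while this runs.
# /mnt/isilon is a placeholder for your mount point.
dd if=/dev/zero of=/mnt/isilon/streamtest.bin bs=1M count=4096 conv=fsync

# Clean up afterwards
rm /mnt/isilon/streamtest.bin
```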
You can also try to observe, in 2-second intervals, and all simultaneously:
- isi statistics client (for exactly the MySQL connection)
- isi statistics drive --long (check out sorting by OpsIn or TimeInQ or Queued)
- sysctl efs.bam.coalescer_stats (many things going on here; you will see patterns in time for sure)
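One way to capture all three simultaneously at 2-second intervals is to background the first two and loop the sysctl (a sketch; the exact `isi statistics` options vary between OneFS versions, so check `isi statistics --help` on your cluster):

```shell
# Sample all three views every 2 seconds into log files for later correlation.
# --interval/--repeat availability depends on the OneFS version.
isi statistics client --interval 2 --repeat 150 > client.log 2>&1 &
isi statistics drive --long --interval 2 --repeat 150 > drive.log 2>&1 &
while true; do
    date
    sysctl efs.bam.coalescer_stats
    sleep 2
done > coalescer.log 2>&1
```

Lining the three logs up by timestamp should show whether the latency spikes coincide with NVRAM flush activity on the drives.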
A couple of further thoughts:
- Try switching off the coalescer: either disable SmartCaching in SmartPools,
or check whether DIRECTIO can be used by MySQL
- The access pattern for the MySQL file should be set to RANDOM
- Different database/table engines available for MySQL
can show different write patterns and, hence, different coalescing behavior in OneFS
- The same will be true for the acclaimed drop-in MySQL replacement MariaDB,
with its further options for tables and (application-side) caching.
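For the access-pattern point above, something like the following should do it (a sketch; the directory path is a placeholder, and the exact flags may differ between OneFS versions, so verify with `isi set --help`):

```shell
# Set the on-disk access pattern hint to random, recursively,
# for the directory holding the MySQL data files (path is a placeholder)
isi set -R -a random /ifs/data/mysql

# Inspect the resulting per-file settings
isi get /ifs/data/mysql
```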
Let us know what you find, good luck!
-- Peter
flyingkiwiguy
October 8th, 2013 03:00
Thanks Peter for the fast and informative reply.
Would small (i.e. 4K) wsize NFSv3 client options make the coalescer work harder? I'm assuming adding SSDs to the nodes would allow the NVRAM cache to be flushed faster and thus better buffer random NFS writes?
This is an NFS environment I have inherited, and I'm in the process of reverse engineering how it was (mis)constructed. I've determined that some Citrix virtual disks mounted off the Isilon cluster are generating as many if not more LIN locks than the MySQL DB is.
Regards,
Gary