Peter_Sero
4 Operator
•
1.2K Posts
0
October 8th, 2013 07:00
Hello Gary,
It seems the writes are smaller than 4K already, and I wouldn't restrict the wsize
to a value lower than what the client likes to send in one op.
(Divide the In rate by the Ops rate in your example, or use the --long option
to see InAvg/InMin/InMax, i.e. the distribution of actual write sizes in B or KB).
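As a quick back-of-the-envelope check, the division looks like this (the numbers below are hypothetical; substitute the In and Ops values from your own "isi statistics" output for the MySQL connection):

```shell
# Hypothetical sample values taken from an "isi statistics" interval:
IN_RATE=2500000                # inbound bytes per second for the connection
OPS_RATE=1200                  # write ops per second

echo $((IN_RATE / OPS_RATE))   # average bytes per write op; prints 2083
```

Anything well under 4096 here means a smaller wsize would not change what the coalescer sees.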
Unless you put the real DB data files on SSD, the flash will (only) help
in navigating to the data blocks in the file. As you do many updates
to existing blocks, the accessed file layout information might be held largely in
the RAM cache anyway, and you probably wouldn't see much improvement.
But it could help in principle, and would never hurt in any case.
Snapshots can hurt with random updates, as copy-on-write
might be chosen here. If you have Snapshots, delete them, or
run the test on fresh copies of the DB files which are not covered by Snapshots.
Forgot to mention that the coalescer in 7.0 had been improved for latency in 2012.
But whether it can do wonders where there is nothing to coalesce in the end
due to heavy scattering? It will most likely behave differently, and probably not worse than
the 6.5 coalescer for your case. In 7.1 more changes have been made, as Jim just
explained in the context of many-small-files within the ongoing Ask The Expert discussion on 7.1.
Cheers
-- Peter
Peter_Sero
4 Operator
•
1.2K Posts
2
October 8th, 2013 01:00
The locks shown by "isi statistics heat" are OneFS-internal locks taken when a node updates a file,
rather than application/protocol lock operations. You'll see plenty of them
with random IO and small write blocks.
It's difficult to get the full picture from the statistics excerpts,
but I would guess the NFS client is simply filling the node's NVRAM,
while the OneFS write coalescer hopes it can do good work
in the end (i.e. to coalesce many small writes into fewer and larger
physical writes). However, when the NVRAM is full and it is time to
write the data to the disks, the write chunks are too fragmented
overall, and a large number of small disk transfers need to be made at once.
At that point the NVRAM cannot sustain a high rate of new writes.
So the intermittent phases of slow writes would correlate
with filling up and flushing the NVRAM.
In mixed loads with most of the clients doing streaming writes,
this effect doesn't become so prominent. The random IO clients
would just see so-so performance all the time.
A simple test can be to add some streaming write load
to the node from another client...
You can also try to observe the following, in 2-second intervals and all simultaneously:
- isi statistics client (for exactly the MySQL connection)
- isi statistics drive --long (check out sorting by OpsIn or TimeInQ or Queued)
- sysctl efs.bam.coalescer_stats (many things going on here; you will see patterns in time for sure)
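A small helper can drive the sampling; run one instance per command, each in its own terminal on the node (the counts and interval below are illustrative; the isi/sysctl commands are the ones listed above):

```shell
# sample_every COUNT INTERVAL CMD...: run CMD COUNT times, INTERVAL seconds apart
sample_every() {
    local count=$1 interval=$2 i
    shift 2
    for i in $(seq 1 "$count"); do
        date                       # timestamp so the outputs can be correlated
        "$@"
        sleep "$interval"
    done
}

# On the cluster node, e.g. (one per terminal):
#   sample_every 30 2 isi statistics client
#   sample_every 30 2 isi statistics drive --long
#   sample_every 30 2 sysctl efs.bam.coalescer_stats
```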
A couple of further thoughts:
- Try switching off the coalescer: either disable SmartCaching in SmartPools
or check whether DIRECTIO can be used by MySQL
- The access pattern for the MySQL file should be set to RANDOM
- Different database/table engines available for MySQL
can show different write patterns, and hence different coalescing behavior in OneFS
- The same will be true for the acclaimed drop-in MySQL replacement MariaDB
with its further options for tables and (application-side) caching.
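If DIRECTIO turns out to be usable, the application side would look roughly like this for InnoDB tables (a sketch, assuming the tables are InnoDB; the option is standard MySQL, but verify that it is honored over NFS on your version):

```ini
# my.cnf excerpt (hypothetical): open InnoDB data files with O_DIRECT,
# so data writes bypass the client page cache and reach the filer directly
[mysqld]
innodb_flush_method = O_DIRECT
```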
Let us know what you find, good luck!
-- Peter
flyingkiwiguy
2 Posts
0
October 8th, 2013 03:00
Thanks Peter for the fast and informative reply.
Would a small (e.g. 4K) wsize NFS3 client option make the coalescer work harder? I'm assuming adding SSDs to the nodes will allow the NVRAM cache to be flushed faster and thus better buffer random NFS writes?
This is an NFS environment I have inherited, and I'm in the process of reverse engineering how it was (mis)constructed. I've determined there are some Citrix virtual disks mounted off the Isilon cluster that are generating as many LIN locks as the MySQL DB is, if not more.
Regards,
Gary