This article summarizes the 2012 Chinese Ask the Expert (ATE) activity "NAS System Performance Optimization". The original thread is https://community.emc.com/thread/146428.
What factors affect NAS performance?
In general, there are five basic factors that can influence NAS performance.
Could you please introduce the steps for troubleshooting a NAS performance issue?
NAS performance is a complicated topic. First, you should understand the issue before troubleshooting. Find out which file systems are affected: is it one or many? Are they all on the same Data Mover (DM), RAID group (RG), LUN, and so on? Use the performance checklist as a good source of questions to ask.
After that, collect the relevant logs, such as support materials, SPCOLLECT output, NAR files, network packet captures, tcpdumps, and so on.
Finally, analyze the logs to look for bottlenecks. This requires strong fundamentals, such as an understanding of CIFS, NFS, TCP, and other protocols. Once the analysis is complete, you can identify the solution. For example, finding the root cause of network congestion may require the user to upgrade their network equipment, or to force the sender to decrease its TCP send window.
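Why would shrinking the TCP send window relieve congestion? Single-stream TCP throughput is bounded by window size divided by round-trip time, so a smaller window directly caps the sender's rate. A minimal sketch of that bound (the window size and RTT values below are illustrative, not from the original thread):

```python
# Estimate the upper bound on single-stream TCP throughput:
# throughput <= window_size / round_trip_time.

def max_tcp_throughput_mbps(window_bytes: int, rtt_ms: float) -> float:
    """Upper bound on one TCP stream's throughput, in megabits per second."""
    rtt_s = rtt_ms / 1000.0
    return window_bytes * 8 / rtt_s / 1_000_000

# Illustrative values: a 64 KiB window over a 5 ms RTT link.
full_window = max_tcp_throughput_mbps(65536, 5.0)
# Halving the sender's window halves the achievable rate,
# which is why it can relieve a congested link.
half_window = max_tcp_throughput_mbps(32768, 5.0)

print(f"64 KiB window: {full_window:.1f} Mb/s")
print(f"32 KiB window: {half_window:.1f} Mb/s")
```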
What kind of tools can monitor NAS performance?
Selecting the proper management tool is crucial to NAS monitoring, since it can facilitate management tasks, reduce NAS device downtime, and make the NAS device easier to use, maintain, and expand. Here are some NAS monitoring tools.
• Back-end storage: collect and analyze NAR files.
• Network packets: capture and analyze with the Wireshark tool.
NAS performance case sharing (1)
This case is from an issue at a network TV company. The NAS write performance was very good, but the read performance was poor.
I captured the network packets with Wireshark on the Windows client and found that the NAS itself responded quickly and the network was not congested. However, packets were frequently retransmitted because they arrived out of order as the client passed them from the network layer to the TCP layer. Packet retransmission can have a large impact on NAS performance; even a 0.1% retransmission rate can seriously degrade it.
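To put the 0.1% figure in perspective, here is a quick sketch for turning Wireshark packet counts into a retransmission rate (the packet counts below are made-up examples):

```python
def retransmission_rate(retransmitted: int, total: int) -> float:
    """Fraction of captured packets that were TCP retransmissions."""
    if total == 0:
        return 0.0
    return retransmitted / total

# Hypothetical capture: 120 retransmissions out of 100,000 packets.
rate = retransmission_rate(120, 100_000)
print(f"retransmission rate: {rate:.3%}")

if rate >= 0.001:  # the 0.1% threshold mentioned above
    print("retransmission rate is high enough to hurt NAS performance")
```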
In the test environment, the customer chose to reinstall the Windows client. The network retransmissions disappeared after reinstallation, and NAS performance improved greatly.
NAS performance case sharing (2)
NAS performance with a single client was good, but performance declined rapidly when multiple clients read from and wrote to the NAS device simultaneously.
I captured the network packets with Wireshark on the Windows client and saw that both the client and the network were healthy, yet NAS read performance declined rapidly as the client count increased. After analyzing the logs, I found that the file system was built on only a single LUN, which caused the poor performance.
Build LUNs on different RAID groups to make a striped volume, then build the file system on the created striped volume.
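The benefit of striping across RAID groups can be estimated with simple IOPS arithmetic: aggregate capacity scales with the number of spindles behind the volume. A rough sketch, using the 180-IOPS-per-15K-FC-disk rule of thumb cited later in this article (group and disk counts are illustrative):

```python
IOPS_PER_FC_15K_DISK = 180  # rule-of-thumb ceiling for a 15K RPM FC disk

def aggregate_iops(raid_groups: int, disks_per_group: int,
                   iops_per_disk: int = IOPS_PER_FC_15K_DISK) -> int:
    """Rough IOPS capacity of a volume striped over several RAID groups."""
    return raid_groups * disks_per_group * iops_per_disk

single = aggregate_iops(1, 5)   # one RAID 5 (4+1) group behind a single LUN
striped = aggregate_iops(4, 5)  # striped volume over four such groups
print(f"single LUN:      {single} IOPS")
print(f"striped volume:  {striped} IOPS")
```

This ignores RAID write penalties and cache effects, but it shows why a file system confined to one LUN on one RAID group becomes the bottleneck as clients multiply.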
NAS performance case sharing (3)
There were two file systems on the NAS, but the performance of file system laserCT500test was better than that of file system LaserCT500.
File system LaserCT500:
LaserCT500 was built on four RAID 1/0 LUNs. The four LUNs were spread across two RAID groups, two LUNs in each, and all of them belonged to SPA.
The user hand-picked the four LUNs to build the file system instead of using the AVM tool.
File system LaserCT500test:
The LaserCT500test volume and file system were created with the AVM tool. AVM picked four LUNs from four different RAID 5 (4+1) groups for LaserCT500test, so the file system spans 5 × 4 = 20 disks.
The conclusion from the analysis is that the LaserCT500 volumes and file system were created on an irrational volume structure. The utilization of its four disks was 100%, and single-disk I/O exceeded 320 IOPS; generally speaking, a 15K RPM FC disk should not exceed 180 IOPS. I strongly recommended that the user create volumes and file systems on a rational volume structure.
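The overload in this case can be quantified directly from the numbers above, and the same arithmetic suggests how many disks a rational layout would need (a back-of-the-envelope sketch, not a sizing tool):

```python
import math

RECOMMENDED_MAX_IOPS = 180   # 15K RPM FC disk guideline from the case above

def disks_needed(total_iops: int, max_per_disk: int = RECOMMENDED_MAX_IOPS) -> int:
    """Minimum disk count to keep per-disk I/O under the guideline."""
    return math.ceil(total_iops / max_per_disk)

observed_per_disk = 320             # measured on LaserCT500's four disks
total = observed_per_disk * 4       # total workload across the file system
overload = observed_per_disk / RECOMMENDED_MAX_IOPS

print(f"overload factor: {overload:.1f}x")
print(f"disks needed for this workload: {disks_needed(total)}")
```

In other words, each disk was driven nearly twice as hard as recommended; spreading the same workload over roughly twice as many spindles (as AVM did for LaserCT500test) keeps each disk within its comfort zone.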
NAS performance case sharing (4)
At a cloud computing company, VNX NFS file systems were used as VMware Datastores. Deploying a VM from a VM template previously took 10 minutes, but in recent deployments the time suddenly increased to 20 minutes.
File system layout performance:
The NFS file system was built on 8 LUNs from different RAID 1/0 groups, each RAID group containing two SAS disks. The file system layout looked good.
Data Mover Performance:
DM CPU and memory were largely idle, but some NFSv3 read/write response times were very long, sometimes taking several seconds. During a VM deployment, NFSv3 write I/O was about 5,000–8,000 IOPS and NFSv3 read I/O was 9,000–15,000 IOPS. Where did all those read I/Os come from? I found that the user's VM template also resided on an NFS Datastore. After I deleted a relatively large VMDK, read I/O dropped to 5,000 IOPS.
There was no packet loss, and the retransmission rate was very low.
VNX back-end storage performance:
The NAR file data showed that the NFS file system was built on 8 LUNs and their response times were fast. However, the dirty cache on SPA had reached 100% while SPB's status was healthy: SPA had to handle double the I/O of SPB. Also:
• The LUNs were evenly distributed between SPA and SPB, but the LUNs on SPA received more I/O than those on SPB.
• The major operation on SPA was writes.
• Most of the NL-SAS disks hosting these LUNs were overloaded.
In conclusion, the issue was caused by overloaded disks on the back-end storage. This made write-cache flushing slow on SPA, which in turn made NFS write operations slow.
Solution A: Transfer the LUNs used by the NFS file system on SPA to SPB.
Solution B: (1) Analyze the workload of each LUN, then distribute the workload evenly between SPA and SPB.
(2) Add more disks to the storage, then migrate the busy LUNs to the new LUNs to improve NAS performance.
(3) Add additional disks to the pool and evenly distribute data across all disks to improve performance.
(4) Replace RAID 6 with RAID 5.
(5) Install the NFS VAAI plug-in.
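Step (1) of Solution B — distributing LUN workload evenly between SPA and SPB — can be sketched as a simple greedy assignment: sort LUNs by measured IOPS, heaviest first, and always place the next LUN on the lighter SP. The LUN names and IOPS figures below are hypothetical, standing in for numbers you would read out of the NAR files:

```python
def balance_luns(lun_iops: dict) -> tuple:
    """Greedily split LUNs between the two SPs: heaviest LUN first,
    each LUN goes to whichever SP currently carries less total IOPS."""
    spa, spb = {}, {}
    for lun, iops in sorted(lun_iops.items(), key=lambda kv: -kv[1]):
        target = spa if sum(spa.values()) <= sum(spb.values()) else spb
        target[lun] = iops
    return spa, spb

# Hypothetical per-LUN workload measured from NAR files:
workload = {"LUN0": 1200, "LUN1": 900, "LUN2": 800, "LUN3": 500, "LUN4": 400}
spa, spb = balance_luns(workload)
print("SPA:", spa, "total:", sum(spa.values()))
print("SPB:", spb, "total:", sum(spb.values()))
```

Greedy balancing is not optimal in general, but for a handful of LUNs it lands close to an even split and mirrors what an administrator would do by hand when trespassing LUNs between SPs.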
Author: Jeffey Liu
Please click here for all content shared by us.