VNX: Clients Disconnected from CIFS server during internal checkpoint refresh
Symptoms
Large Directories
Nasdirtool confirms that the impacted production file systems contain multiple directories with more than 500,000 files each.
From nasdirtool output:
.....
/root_vdm_5/Applications/Appstorage/Images,95616,1458761 <=== 95MB in size and 1.4 million files
/root_vdm_6/Production/SubDirectory2/REP,150731,2104554 <=== 150MB in size and 2.1 million files
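The nasdirtool lines above appear to use a `path,size,file_count` layout, where the size field looks like KB (95616 is annotated as 95MB). Assuming that layout, a minimal Python sketch for flagging directories over the 500,000-file guideline might look like this. The function name `oversized_directories` and the constant are hypothetical helpers, not part of nasdirtool.

```python
import csv
import io

# Guideline from the VNX OE for File release notes: 500,000 files per directory.
MAX_FILES_PER_DIR = 500_000


def oversized_directories(nasdirtool_output: str, limit: int = MAX_FILES_PER_DIR):
    """Return (path, size, file_count) for every line whose file count exceeds limit.

    Assumes each data line is "path,size,file_count" (size assumed to be KB).
    Lines that do not fit that shape, such as "....." separators, are skipped.
    """
    results = []
    for row in csv.reader(io.StringIO(nasdirtool_output)):
        if len(row) != 3:
            continue
        path, size, count = (field.strip() for field in row)
        # Drop a trailing annotation such as "<=== 95MB in size ..." after the count.
        count = count.split()[0] if count else count
        if not (size.isdigit() and count.isdigit()):
            continue
        if int(count) > limit:
            results.append((path, int(size), int(count)))
    return results


if __name__ == "__main__":
    sample = """.....
/root_vdm_5/Applications/Appstorage/Images,95616,1458761
/root_vdm_6/Production/SubDirectory2/REP,150731,2104554
"""
    for path, size_kb, files in oversized_directories(sample):
        print(f"{path}: {files:,} files, about {size_kb} KB")
```

Because the threshold is a parameter, the same check can be rerun with a lower value to find directories that are approaching the guideline.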
Some CIFS clients are disconnected from the VNX CIFS server while the internal checkpoints used for replication are being refreshed on the source-side array.
Other CIFS clients, and NFS clients on other shares, continue to operate normally.
High CPU utilization is frequently seen on the Data Mover. Depending on the size of the directory contents, Data Mover CPU utilization may reach 100%.
[nasadmin@VNX-CS0 tmp]$ server_stats server_2 -i 60
server_2   CPU    Network   Network   dVol      dVol
Timestamp  Util   In        Out       Read      Write
           %      KiB/s     KiB/s     KiB/s     KiB/s
10:41:25   99     16123     62578     61912     28048
10:42:25   98     4242      63170     62433     9793
10:43:25   99     2935      46987     48618     8918
10:44:25   99     7499      45901     46373     13019
10:45:25   99     4564      47836     48018     9625
10:46:25   98     3973      52316     52167     9035
10:47:25   98     9777      60167     55127     16238
10:48:25   97     18513     76583     70269     26258
10:49:25   98     11885     43789     43595     17238
10:50:25   99     17868     55491     52966     21029
10:51:25   99     8171      43491     43013     11961
10:52:25   99     8835      50947     50328     13369
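When collecting server_stats output over a longer window, it can help to summarize how much of the time the Data Mover CPU was saturated. The following is a minimal Python sketch that parses the data rows in the format shown above; `parse_server_stats`, `high_cpu_ratio` and the 95% threshold are hypothetical choices for illustration, not part of the VNX tooling.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    timestamp: str
    cpu_util: int        # CPU Util, percent
    net_in_kib: int      # Network In, KiB/s
    net_out_kib: int     # Network Out, KiB/s
    dvol_read_kib: int   # dVol Read, KiB/s
    dvol_write_kib: int  # dVol Write, KiB/s


def parse_server_stats(text: str) -> list[Sample]:
    """Parse data rows of 'server_stats server_2 -i 60' style output.

    Keeps only rows of an HH:MM:SS timestamp followed by five integers;
    the three header rows are ignored.
    """
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 6 or fields[0].count(":") != 2:
            continue
        if not all(f.isdigit() for f in fields[1:]):
            continue
        samples.append(Sample(fields[0], *map(int, fields[1:])))
    return samples


def high_cpu_ratio(samples: list[Sample], threshold: int = 95) -> float:
    """Fraction of samples at or above the CPU threshold (threshold is an arbitrary choice)."""
    if not samples:
        return 0.0
    return sum(s.cpu_util >= threshold for s in samples) / len(samples)
```

Applied to the twelve samples above, every row is at or above 97%, so any threshold up to 97 gives a ratio of 1.0.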
A network capture taken during the incident showed that TCP communication between client and server was working, but the CIFS server did not respond at the SMB protocol level to the affected client, which resulted in a client timeout.
Cause
The source-side file system used for replication contains directories that each hold more than 500,000 files. As documented in the EMC VNX OE for File Release Notes, exceeding 500,000 files in a single directory causes performance problems.
The following events are recorded in the Data Mover log during the issue:
2016-08-12 12:58:40: SMB: 6:[VDM2] Quota:getFsAndLock for Thread 1SMB415 aborted (client WINCLIENT01 disconnected)
2016-08-12 12:58:49: SMB: 6:[VDM2] Quota:getFsAndLock for Thread 1SMB034 aborted (client WINCLIENT02 disconnected)
2016-08-12 13:09:29: SMB: 6:[VDM2] Quota:getFsAndLock for Thread 1SMB356 aborted (client WINCLIENT03 disconnected)
2016-08-12 13:09:29: SMB: 6:[VDM2] Quota:getFsAndLock for Thread 1SMB358 aborted (client WINCLIENT04 disconnected)
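To see which clients were affected and when, the disconnect messages can be pulled out of a saved copy of the Data Mover log. This is a minimal Python sketch; the regular expression is inferred only from the sample lines in this article, and `find_disconnects` is a hypothetical helper, so the pattern may need adjusting for other log variants.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Inferred from the sample log lines; handles both the short form and the form
# with an extra numeric field after the timestamp.
DISCONNECT_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}):.*"
    r"\[(?P<vdm>[^\]]+)\] Quota:getFsAndLock for Thread (?P<thread>\S+) "
    r"aborted \(client (?P<client>\S+) disconnected\)"
)


class Disconnect(NamedTuple):
    when: datetime
    vdm: str
    thread: str
    client: str


def find_disconnects(log_text: str) -> list[Disconnect]:
    """Return one Disconnect per 'getFsAndLock ... aborted (client X disconnected)' line."""
    events = []
    for line in log_text.splitlines():
        m = DISCONNECT_RE.search(line)
        if m:
            events.append(Disconnect(
                datetime.strptime(m["ts"], "%Y-%m-%d %H:%M:%S"),
                m["vdm"], m["thread"], m["client"],
            ))
    return events
```

The returned timestamps can then be compared against the checkpoint refresh pause window to confirm that the disconnects fall inside it.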
The Data Mover log shows that the issue coincides with an internal replication checkpoint refresh.
Example of a normal, quick file system pause for a checkpoint refresh on this source-side array:
2016-08-19 12:33:39: 26042826752: SVFS: 6: pause() requested on fsid:1103
2016-08-19 12:33:39: 26042826752: SVFS: 6: pause done on fsid:1103
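In this case, some operation delays the pause. The pause completes roughly 18 minutes after it was requested, and clients are disconnected in the meantime: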
2016-08-19 12:42:36: 26042826752: SVFS: 6: pause() requested on fsid:1103
...
2016-08-19 12:45:17: 26041909248: SMB: 6:[VDM2] Quota:getFsAndLock for Thread 1SMB396 aborted (client WINCLIENT01 disconnected)
2016-08-19 12:45:26: 26041909248: SMB: 6:[VDM2] Quota:getFsAndLock for Thread 1SMB478 aborted (client WINCLIENT02 disconnected)
...
2016-08-19 13:00:47: 26041909248: SMB: 6:[VDM2] Quota:getFsAndLock for Thread 1SMB298 aborted (client WINCLIENT03 disconnected)
2016-08-19 13:00:52: 26042826752: SVFS: 6: pause done on fsid:1103
The source-side internal checkpoint refresh pause above shows abnormal behavior. A forced panic was performed to determine what was causing the pause to take so long, and analysis of the panic dump file confirmed that the file system contains directories with millions of files each.
Resolution
A new subdirectory structure should be put in place on the production file system. The files in the problematic directories must be distributed across the new directories so that no single directory exceeds 500,000 files. The original problematic directories should then be deleted by the VNX Administrator.
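The article does not prescribe a particular layout. One common approach is to hash each file name into a fixed number of subdirectories, sized so that the average load stays well under the limit. This is a minimal Python sketch of that technique under stated assumptions: the helper names, the 100,000-files-per-directory target, and the four-digit subdirectory names are all hypothetical. It only plans the layout and does not move any files.

```python
import hashlib
import math
from collections import Counter

MAX_FILES_PER_DIR = 500_000  # guideline from the VNX OE for File release notes


def bucket_count(total_files: int, target_per_dir: int = 100_000) -> int:
    """Subdirectories needed so the *average* load is target_per_dir.

    The 100,000 default is an assumed value chosen to leave headroom below
    MAX_FILES_PER_DIR for uneven hashing and future growth.
    """
    return max(1, math.ceil(total_files / target_per_dir))


def bucket_for(filename: str, buckets: int) -> str:
    """Stable subdirectory name (e.g. '0007') derived from a hash of the file name."""
    digest = hashlib.blake2b(filename.encode("utf-8"), digest_size=8).digest()
    return f"{int.from_bytes(digest, 'big') % buckets:04d}"


def plan(filenames: list[str], target_per_dir: int = 100_000) -> dict[str, str]:
    """Map each file name to a subdirectory; raise if any bucket would exceed the limit."""
    buckets = bucket_count(len(filenames), target_per_dir)
    mapping = {name: bucket_for(name, buckets) for name in filenames}
    fullest = max(Counter(mapping.values()).values(), default=0)
    if fullest > MAX_FILES_PER_DIR:
        raise ValueError(f"a bucket would hold {fullest} files, over {MAX_FILES_PER_DIR}")
    return mapping
```

With this target, the 2.1 million-file REP directory would be split into 22 subdirectories. Because the subdirectory is derived from the file name, an application can compute where a new file belongs without listing the directory. Note that changing the bucket count later changes the mapping for existing files.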
Additional Information
EMC VNX Operating Environment for File Version 7.1.79.8 Release Notes
| Guideline/Specification | Maximum tested value | Comment |
|---|---|---|
| Number of files per directory | 500,000 | Exceeding this number will cause performance problems. |
Affected Products
VNX1 Series
Products
VNX1 Series, VNX2 Series
Article Properties
Article Number: 000052074
Article Type: Solution
Last Modified: 06 Nov 2025
Version: 3