Data Domain: DDFS segfaults in the IDmapper when using NFSv4 (especially if NIS/LDAP is not configured)
Summary: The FS process may segfault when using NFSv4 on some DDOS 6.1.x releases, particularly when NIS or LDAP times out or hangs.
Symptoms
DDOS 6.1.x includes support for NFSv4. Due to an internal code issue in earlier DDOS 6.1.x releases, the FS process may segfault (the process tries to access memory it is not allowed to access, and the kernel terminates it with signal 11, SIGSEGV), dump core, and restart. Depending on configuration and workload, this can happen occasionally or even back to back, causing significant customer downtime.
The segfault occurs when the NFSv4 IDmapper fails while performing a reverse lookup (UID -> username). This can happen either when no mapping exists for the UID or when the mapper times out or hangs (an NIS/LDAP issue).
Although conclusively identifying this issue as the culprit may require a full support upload bundle (SUB) and at least one FS process core file, the kern.info log often contains sufficient evidence:
Apr 30 15:33:18 localhost kernel: (E6)[ 365116.728458] dd_guts_0[18807]: segfault at deadbef7 ip 0000000000fb2xxx sp 00007f42e6cdxxxx error 6 in ddfs[400000+3584000]
Apr 30 15:33:18 localhost kernel: (E6)[ 365116.728549] Signal 11 posted to dd_guts_0(pid=18807) by dd_guts_0(pid=18807)
Apr 30 15:39:39 localhost kernel: (E6)[ 365497.863252] dd_guts_0[20930]: segfault at deadbef7 ip 0000000000fbxxxx sp 00007fed554fxxxx error 6 in ddfs[400000+3584000]
Apr 30 15:39:39 localhost kernel: (E6)[ 365497.863337] Signal 11 posted to dd_guts_0(pid=20930) by dd_guts_0(pid=20930)
Apr 30 15:46:03 localhost kernel: (E6)[ 365879.939951] dd_guts_1[24054]: segfault at deadbef7 ip 0000000000fbxxxx sp 00007fc8ca3fxxxx error 6 in ddfs[400000+3584000]
Apr 30 15:46:04 localhost kernel: (E6)[ 365879.940082] Signal 11 posted to dd_guts_1(pid=24054) by dd_guts_1(pid=24054)
Apr 30 15:52:40 localhost kernel: (E6)[ 366273.499171] dd_guts_1[25214]: segfault at deadbef7 ip 0000000000fbxxxx sp 00007f78988exxxx error 6 in ddfs[400000+3584000]
Apr 30 15:52:40 localhost kernel: (E6)[ 366273.499299] Signal 11 posted to dd_guts_1(pid=25214) by dd_guts_1(pid=25214)
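When many kern.info files need to be checked, the signature above can be searched for with a small script. The following is a minimal sketch, not a Dell tool. It assumes the log lines use the same format as the excerpt and treats a dd_guts_N segfault inside the ddfs binary at the fault address deadbef7 (taken from the sample) as a probable match. The function name find_idmap_segfaults is hypothetical. A match is only a hint. Confirmation still requires a SUB and a core file.

```python
import re
from typing import Iterable, List, Tuple

# Hypothetical pattern derived from the sample kern.info excerpt:
# "<Mon> <day> <hh:mm:ss> ... dd_guts_N[pid]: segfault at <addr> ... in ddfs[...]"
SEGFAULT_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s.*?"
    r"(?P<proc>dd_guts_\d+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+)\b"
    r".*\bin ddfs\["
)


def find_idmap_segfaults(lines: Iterable[str],
                         addr: str = "deadbef7") -> List[Tuple[str, str, int]]:
    """Return (timestamp, thread, pid) for each ddfs segfault at `addr`.

    The companion "Signal 11 posted to ..." lines are ignored because they
    carry no fault address; each crash is reported once.
    """
    hits = []
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if m and m.group("addr") == addr:
            hits.append((m.group("stamp"), m.group("proc"), int(m.group("pid"))))
    return hits
```

To scan a kern.info file extracted from a SUB, pass it the open file object, for example `find_idmap_segfaults(open(path, errors="replace"))`. Several hits a few minutes apart, as in the excerpt, match the back-to-back crash pattern described above.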
Cause
Resolution
Customers who experience this issue and cannot immediately upgrade to a fixed release must apply a workaround: change the NFSv4 ID mapping setting from its default, "map-first", to "always". This forces the IDmapper to use numeric UIDs and so avoids the reverse lookups.
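The difference between the two settings can be pictured with the conceptual model below. It is not DDFS code, and it assumes that "map-first" means "try the UID -> username reverse lookup and fall back to the numeric UID if it fails". The names nfs4_owner, IdmapOutNumeric, MapperTimeout and the domain "example.com" are hypothetical. In the affected releases, DDFS crashes on the failure path of that lookup. The point the model makes is that with "always" the lookup is never attempted, so that path cannot be reached.

```python
from enum import Enum
from typing import Callable, Optional


class IdmapOutNumeric(Enum):
    """The two values of nfs4-idmap-out-numeric discussed in this article."""
    MAP_FIRST = "map-first"  # default: attempt a UID -> username reverse lookup
    ALWAYS = "always"        # workaround: always send the numeric UID


class MapperTimeout(Exception):
    """Hypothetical stand-in for an NIS/LDAP timeout or hang."""


def nfs4_owner(uid: int,
               mode: IdmapOutNumeric,
               reverse_lookup: Callable[[int], Optional[str]],
               domain: str = "example.com") -> str:
    """Conceptual model: build the NFSv4 owner string sent for `uid`.

    `reverse_lookup` returns a username, returns None when no mapping exists,
    or raises MapperTimeout when the name service hangs.
    """
    if mode is IdmapOutNumeric.ALWAYS:
        # No reverse lookup at all, so the failing code path is never entered.
        return str(uid)
    try:
        name = reverse_lookup(uid)
    except MapperTimeout:
        name = None
    # Assumed map-first behaviour: use user@domain if the lookup succeeded,
    # otherwise fall back to the numeric UID.
    return f"{name}@{domain}" if name else str(uid)
```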
If the FS process is up and running, run the following command to change the setting:
# nfs option set nfs4-idmap-out-numeric always
NFS option 'nfs4-idmap-out-numeric' set to 'always'.
If, on the other hand, the FS process is down due to repeated crashes, this command will not work, and the following alternative (but otherwise equivalent) procedure is necessary:
- Log in to the DD as "sysadmin" or a similar user and enter SE mode.
NOTE: SE commands have been deprecated in DDOS versions 7.7.5.25, 7.10.1.15, 7.13.0.15, 6.2.1.110 and above and are accessible only to Dell employees.
- Run the command below to change the setting:
# reg set protocol.nfs.option.nfs4_idmap_out_numeric = always
- Finally, bring the FS process up by running "filesys enable" from the CLI.
To return the setting to its default ("map-first"), run:
# nfs option reset nfs4-idmap-out-numeric
NFS option 'nfs4-idmap-out-numeric' reset to default (map-first)
If in doubt, collect a SUB, open a new SR with DD Support, and reference this KB number in the problem description.