OneFS: SMB Connection and Protocol Audit Stops Working After SMB Service Disabled or Enabled
Summary: Server Message Block (SMB) connections and protocol auditing may stop working after the SMB service is disabled or enabled.
Symptoms
Protocol audit is enabled on the cluster:
# isi audit settings global view |grep "Protocol Auditing"
Protocol Auditing Enabled: Yes <<<<<<<<<<<
SMB service is disabled and re-enabled with the command:
# isi services -a smb disable
The service 'smb' has been disabled.
# isi services -a smb enable
The service 'smb' has been enabled.
Nodes show a high number of closed connections on TCP port 445:
For example:
# echo ">>> Any buildup of closed sockets against SMB? <<<"; isi_for_array -X 'netstat -an | grep "\.445" | grep CLOSED | wc -l'
PowerScale-1: 7668
PowerScale-2: 7022
PowerScale-3: 7773
PowerScale-4: 7378
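The per-node counts above can be screened with a small filter. The following is a minimal sketch, not part of the original procedure: the function name `flag_closed_sockets` and the default threshold of 1000 are illustrative assumptions, so pick a threshold that reflects the normal baseline of the cluster. It reads lines in the `node: count` form shown above and prints only the nodes whose count exceeds the threshold.

```shell
#!/bin/sh
# Hypothetical helper (not a OneFS command): reads "node: count" lines on
# stdin and prints the nodes whose CLOSED-socket count exceeds a threshold.
# The default threshold (1000) is an arbitrary value chosen for illustration.
flag_closed_sockets() {
    threshold="${1:-1000}"
    awk -F': *' -v limit="$threshold" '
        # Only consider lines whose second field is a plain number.
        $2 ~ /^[0-9]+$/ && $2 + 0 > limit + 0 {
            printf "%s has %d closed SMB sockets\n", $1, $2
        }
    '
}
```

On a cluster, the output of the `isi_for_array` command shown above could be piped into `flag_closed_sockets 1000`.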
The audit log shows no new SMB audit events after the SMB service is disabled or enabled:
For example, with the SMB service disabled and enabled at around 11:30:
# isi_audit_viewer -t protocol -s "2025-01-15 11:30:00" | tail
...
...
[88: Wed Jan 15 11:32:29 2025] {"id":"6bb81e75-a932-11ef-8b5b-0050569b863c","timestamp":1732321949246224,"payloadType":"bbce6a72-a92d-4330-a1f3-e9fd5aed8152","payload":"Shutting down audit driver: flt_audit"}
[89: Wed Jan 15 11:32:29 2025] {"id":"6bb8a404-a932-11ef-8b5b-0050569b863c","timestamp":1732321949249642,"payloadType":"7afb8d54-0aa7-4ed4-9691-341313ee37e3","payload":"Audit Driver: flt_audit Loaded"}
done
The lwio process has no socket to the audit service (the command returns no output):
# procstat -f $(pgrep lwio) | grep -i "audit_service.sock"
#
Cause
This is a product issue in OneFS 9.7.1.x and OneFS 9.8.
After the SMB service is disabled or enabled, the socket to the audit service is not properly restored in the lwio process. As a result, SMB audit events are not pushed to the audit service, and the audit queue inside lwio eventually fills up. lwio then blocks, waiting until the SMB operations can be audited.
Resolution
The code issue is fixed in OneFS 9.7.1.8, 9.10.1.0, 9.11, and later OneFS releases.
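As a quick aid for checking a release string against the versions listed in this article, the following is a minimal sketch. The function name `classify_onefs_version` is hypothetical, and the logic encodes only the releases named here (9.7.1.x before 9.7.1.8 and 9.8 as affected; 9.7.1.8, 9.10.1.0, and 9.11 onward as fixed). Any release the article does not mention is reported as `unknown` rather than guessed.

```shell
#!/bin/sh
# Hypothetical helper: classifies a OneFS version string using ONLY the
# releases named in this article. Anything else is reported as "unknown".
classify_onefs_version() {
    # Reject empty input or anything other than digits and dots.
    case "$1" in
        ''|*[!0-9.]*) echo unknown; return ;;
    esac
    # Split "A.B.C.D" into fields; missing fields default to 0.
    old_ifs=$IFS; IFS=.
    set -- $1
    IFS=$old_ifs
    major=${1:-0}; minor=${2:-0}; maint=${3:-0}; patch=${4:-0}

    # Releases outside 9.x are not classified here (conservative assumption).
    if [ "$major" -ne 9 ]; then echo unknown; return; fi
    case "$minor" in
        7)  # 9.7.1.x is affected until 9.7.1.8
            if [ "$maint" -eq 1 ]; then
                if [ "$patch" -ge 8 ]; then echo fixed; else echo affected; fi
            else
                echo unknown
            fi ;;
        8)  echo affected ;;
        10) if [ "$maint" -ge 1 ]; then echo fixed; else echo unknown; fi ;;
        *)  if [ "$minor" -ge 11 ]; then echo fixed; else echo unknown; fi ;;
    esac
}
```

For example, `classify_onefs_version 9.7.1.3` prints `affected`, and `classify_onefs_version 9.10.1.0` prints `fixed`. An `unknown` result means only that this article does not cover the release.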
If the cluster cannot be upgraded to a release that contains the fix, apply the following workaround: restart lwio on the impacted node to restore the socket to the audit service.
- Verify the running lwio PID
# ps auxw|grep 'lw-container lwio'
root 83816 0.0 1.4 123100 56184 - I< 7Jan25 0:06.95 lw-container lwio (lwio)
- Restart lwio
# killall lwio
#
- Confirm the lwio PID changes
# ps auxw|grep 'lw-container lwio'
root 62370 0.0 0.9 84240 36200 - S< 04:14 0:00.19 lw-container lwio (lwio)
- Confirm the socket to the audit service is back
# procstat -f $(pgrep lwio)|grep -i audit
62370 lwio 21 s - rw------ 2 0 UDS 0 0 /var/run/audit_service.sock
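The two confirmation steps above (PID change and socket presence) can be combined into one check. The following is a minimal sketch under stated assumptions: the function names `has_audit_socket` and `verify_lwio_restart` are hypothetical, and the functions only inspect values and text passed to them, so they do not restart anything themselves.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative, not OneFS commands).

# Succeeds when the `procstat -f` output on stdin lists the audit service socket.
has_audit_socket() {
    grep -q '/var/run/audit_service\.sock'
}

# Compares the lwio PID before and after the restart and checks the socket.
# $1 = PID recorded before the restart, $2 = PID after the restart,
# stdin = output of `procstat -f` for the new PID.
verify_lwio_restart() {
    old_pid=$1; new_pid=$2
    if [ -z "$new_pid" ]; then
        echo "lwio is not running"; return 1
    fi
    if [ "$old_pid" = "$new_pid" ]; then
        echo "lwio PID unchanged ($old_pid)"; return 1
    fi
    if has_audit_socket; then
        echo "lwio restarted ($old_pid -> $new_pid); audit socket present"
    else
        echo "lwio restarted ($old_pid -> $new_pid); audit socket missing"
        return 1
    fi
}
```

On the impacted node, one way to use this is to record `old=$(pgrep lwio)` before running `killall lwio`, and then, once lwio is running again, run `new=$(pgrep lwio); procstat -f "$new" | verify_lwio_restart "$old" "$new"`.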