Connectrix Brocade B-Series: SCN-1001 SCN Queue Overflow for snmpd Process
Summary: An SCN-1001 state change notification (SCN) queue overflow for the snmpd process results in snmpd termination.
Symptoms
Simple Network Management Protocol (SNMP) queries for the following can stop responding due to contention with other snmpd threads:
- Bootup
- Firmware install date or time
- Boot Programmable Read-Only Memory (PROM) installation date or time
Fabric OS (FOS) continues to retry these queries. The resulting deadlock condition causes SCN-1001 queue overflow alerts to be logged and leads to snmpd termination.
snmpd terminates during swBootPromLastUpdated, swFlashLastUpdated, or swBootProminstallDate queries because of a stuck RPM call, as seen in the following output:
ps exfcl /fabos/cliexec/errdump -a:
2023/12/01-21:16:29, [SCN-1001], 86271, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.
2023/12/01-21:16:29, [RAS-1001], 86272, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, First failure data capture (FFDC) event occurred.
2023/12/01-21:16:29, [SCN-1001], 86273, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.
2023/12/01-21:16:37, [LOG-1000], 86280, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, Previous message repeated 7 time(s).
2023/12/01-21:16:37, [SCN-1001], 86281, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.
2023/12/01-21:16:38, [LOG-1000], 86282, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, Previous message repeated 1 time(s).
2023/12/01-21:16:38, [KSWD-1002], 6908, FFDC | CHASSIS, WARNING, Dell_Brcd_X6-4, Detected termination of process snmpd:2648.
/fabos/cliexec/hadump:
---------------------------------------
TIME_STAMP: Dec 1 22:43:03.131548
---------------------------------------
Local CP (Slot 2, CP1): Active, Warm Recovered
Remote CP (Slot 1, CP0): Standby, Healthy
HA enabled, Heartbeat Up, HA State not in sync
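The errdump signature above can be checked for in a saved log with a short script. The following is a minimal Python sketch, not a Dell or Brocade tool. It assumes each entry sits on its own line in the comma-separated layout shown above. The names ENTRY and summarize are hypothetical and chosen only for illustration.

```python
import re

# One errdump entry per line, in the layout shown in this article:
# timestamp, [message ID], sequence number, flags, severity, switch name, text
ENTRY = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2}-\d{2}:\d{2}:\d{2}), "
    r"\[(?P<msg_id>[A-Z]+-\d+)\], "
    r"(?P<seq>\d+), "
    r"(?P<flags>[^,]*), "
    r"(?P<severity>[A-Z]+), "
    r"(?P<switch>[^,]+), "
    r"(?P<text>.*)$"
)


def summarize(lines):
    """Return (overflow_count, terminated) for snmpd in errdump lines.

    overflow_count counts only explicit SCN-1001 entries; entries folded
    into "Previous message repeated N time(s)" lines are not added.
    """
    overflow_count = 0
    terminated = False
    for line in lines:
        match = ENTRY.match(line.strip())
        if match is None:
            continue  # skip headers and lines in any other layout
        if "process snmpd" not in match["text"]:
            continue
        if match["msg_id"] == "SCN-1001":
            overflow_count += 1
        elif match["msg_id"] == "KSWD-1002":
            terminated = True
    return overflow_count, terminated


# Entries copied from the output above.
SAMPLE = [
    "2023/12/01-21:16:29, [SCN-1001], 86271, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.",
    "2023/12/01-21:16:29, [RAS-1001], 86272, SLOT 1 | CHASSIS, INFO, Dell_Brcd_X6-4, First failure data capture (FFDC) event occurred.",
    "2023/12/01-21:16:29, [SCN-1001], 86273, SLOT 1 | FFDC | CHASSIS, CRITICAL, Dell_Brcd_X6-4, SCN queue overflow for process snmpd.",
    "2023/12/01-21:16:38, [KSWD-1002], 6908, FFDC | CHASSIS, WARNING, Dell_Brcd_X6-4, Detected termination of process snmpd:2648.",
]

if __name__ == "__main__":
    print(summarize(SAMPLE))
```

A nonzero overflow count together with a KSWD-1002 termination entry for snmpd matches the symptom described in this article. Other causes are not ruled out by this check.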
The output of the following command indicates a stuck RPM thread:
ps excfl /bin/ps exfcl:
0 0 29270 2413 20 0 0 0 exit Z ? 0:00 \_ snmpd <defunct>
0 0 23760 1 20 0 5144 3304 - R ? 5531:10 rpm. <<<<stuck RPM thread called by snmpd
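The same check can be scripted against saved ps output. The sketch below is illustrative only. It assumes the columns follow the procps long format (F, UID, PID, PPID, PRI, NI, VSZ, RSS, WCHAN, STAT, TTY, TIME, COMMAND), that TIME is minutes:seconds as in the output above, and that 60 minutes of CPU time is a reasonable threshold for "stuck". The function names and the threshold are hypothetical.

```python
def parse_ps_line(line):
    """Split one long-format ps line into the fields used below, or None."""
    parts = line.split(None, 12)  # COMMAND (field 13) may contain spaces
    if len(parts) < 13 or not parts[2].isdigit():
        return None  # header line or unexpected layout
    return {
        "pid": int(parts[2]),
        "stat": parts[9],
        "time": parts[11],
        "command": parts[12].strip(),
    }


def cpu_minutes(time_field):
    """Convert a minutes:seconds TIME field to minutes (assumed format)."""
    minutes, _, seconds = time_field.partition(":")
    return int(minutes) + int(seconds) / 60


def find_suspects(lines, min_cpu_minutes=60):
    """Return (pid, reason) pairs for a defunct snmpd or a long-running rpm."""
    suspects = []
    for line in lines:
        proc = parse_ps_line(line)
        if proc is None:
            continue
        # Drop the forest prefix ("\_ ") and keep the first word as the name.
        words = proc["command"].lstrip("\\_ ").split()
        name = words[0].rstrip(".") if words else ""
        if name == "snmpd" and proc["stat"].startswith("Z"):
            suspects.append((proc["pid"], "defunct snmpd"))
        elif name == "rpm" and cpu_minutes(proc["time"]) >= min_cpu_minutes:
            suspects.append((proc["pid"], "long-running rpm"))
    return suspects


# Lines copied from the output above.
SAMPLE_PS = [
    r"0 0 29270 2413 20 0 0 0 exit Z ? 0:00 \_ snmpd <defunct>",
    "0 0 23760 1 20 0 5144 3304 - R ? 5531:10 rpm.",
]

if __name__ == "__main__":
    for pid, reason in find_suspects(SAMPLE_PS):
        print(pid, reason)
```

A defunct snmpd entry alongside an rpm process with a very large accumulated CPU time is the pattern this article associates with the stuck RPM thread.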
Cause
This issue is tracked as FOS defect FOS-851141 under FOS release v9.1.1c.
The swBootDate value is retrieved with an application programming interface (API) that uses file operations. Similarly, the swFlashLastUpdated and swBootProminstallDate values are retrieved with another API that uses RPM queries. These I/O operations typically take time. When the SNMP agent is processing many requests simultaneously, the SNMP GET request for these parameters is retried. These retries add overhead for the agent, which creates a queue overflow condition that leads to snmpd termination.
Resolution
Workaround: Avoid issuing SNMP queries.
Resolution: The SNMP code in v9.1.1c was optimized to cache data such as the bootup date or time and the firmware install date or time. An additional enhancement checked into v9.1.1d also caches the boot PROM installation date or time during SNMP activation. The cached data are used for these queries to prevent contention between threads within snmpd.
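The caching approach described above can be illustrated with a generic sketch. This is not the FOS implementation. It only shows the idea of running slow lookups once at activation and answering later queries from memory. The StartupCache class and the loader functions are hypothetical stand-ins, and the returned values are placeholders rather than real switch data.

```python
class StartupCache:
    """Run each slow loader once at activation and serve reads from memory."""

    def __init__(self, loaders):
        # loaders maps an object name to a zero-argument function that
        # performs the slow lookup (file read, RPM query, and so on).
        self._values = {name: load() for name, load in loaders.items()}

    def get(self, name):
        # No I/O and no retries here: a GET handler only reads the dict.
        return self._values[name]


# Hypothetical slow lookups; counters show how often each one really runs.
call_counts = {"swBootDate": 0, "swFlashLastUpdated": 0}


def load_boot_date():
    call_counts["swBootDate"] += 1
    return "placeholder boot date"  # stand-in for a file operation


def load_flash_last_updated():
    call_counts["swFlashLastUpdated"] += 1
    return "placeholder flash date"  # stand-in for an RPM query


cache = StartupCache(
    {
        "swBootDate": load_boot_date,
        "swFlashLastUpdated": load_flash_last_updated,
    }
)

# Simulate a burst of SNMP GET requests for the same objects.
for _ in range(1000):
    cache.get("swBootDate")
    cache.get("swFlashLastUpdated")

if __name__ == "__main__":
    print(call_counts)
```

Because the values in question change only on a reboot or a firmware update, reading them once at activation removes the slow I/O from the request path, which is the contention the resolution aims to avoid.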
Brocade DEFECT FOS-851141