Dell Unity: CAVA timeout can cause unexpected SP reboots (User Correctable)
Summary: Under certain conditions in all Unity OE code versions below 5.1.x, a CAVA request timeout can exceed the timeout of another operation, causing that operation to fail and the SP to panic. ...
Symptoms
SP panic with an error code like 12dc501 for internal system service (pool-SYSTEM_POOL) offline.
In some cases, this can occur on both SPs and cause data unavailability (DU).
Cause
The CAVA timeouts occurred because the CAVA configuration listed several CAVA servers that were no longer present, and CAVA was waiting for heartbeat responses from those servers. The default CAVA request timeout in all code versions before 5.1.x was 300 seconds.
In one customer case, a filesystem checkpoint unmount thread was waiting for the CAVA request thread to complete, but the CAVA timeout was set at 300 seconds and the checkpoint unmount timeout was set at 210 seconds. The unmount failed to complete in time, which caused the SP panic.
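The timeout relationship in that case can be sketched as a simple comparison. This is an illustration only: `timeout_is_safe` is a hypothetical helper, not a Unity OE command, and it assumes the only thing that matters is whether the CAVA request times out before the operation waiting on it. The 300 and 210 second values come from the customer case above, and 120 seconds is the 5.1.x default described under Resolution.

```shell
#!/bin/sh
# Hypothetical helper (not part of Unity OE): succeeds when the inner
# operation's timeout is shorter than the timeout of the operation
# that is waiting on it.
timeout_is_safe() {
    inner_timeout_s=$1   # e.g. CAVA request timeout, in seconds
    outer_timeout_s=$2   # e.g. checkpoint unmount timeout, in seconds
    [ "$inner_timeout_s" -lt "$outer_timeout_s" ]
}

# Default before 5.1.x: the CAVA request (300 s) can outlast the unmount (210 s).
if timeout_is_safe 300 210; then
    echo "300 s: CAVA request times out before the unmount"
else
    echo "300 s: CAVA request can outlast the unmount"
fi

# 5.1.x default: the CAVA request (120 s) gives up before the unmount does.
if timeout_is_safe 120 210; then
    echo "120 s: CAVA request times out before the unmount"
else
    echo "120 s: CAVA request can outlast the unmount"
fi
```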
Resolution
- Check which IP addresses are listed in your CAVA configuration. Ensure that only the IP addresses of active CAVA servers are listed, to avoid unnecessary communication timeouts.
- Dell Unity OE version 5.1.x changes the default CAVA request timeout from 300 seconds to 120 seconds. Upgrading to 5.1.x will prevent this issue from occurring.
- Short of upgrading, customers can manually change the parameter which controls the CAVA request timeout. Log on to the SSH console and enter the following command:
svc_nas ALL -param -f viruschk -m requestTimeout -v 120000
This change takes effect immediately and persists across SP reboots. However, any newly created NAS server receives the default timeout, so run this command each time you create a new NAS server until the code is upgraded to 5.1.x.
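Because the command must be repeated after each new NAS server is created, a small wrapper that builds the command line from a value in seconds may help avoid unit mistakes. This is a minimal sketch: `cava_timeout_cmd` is a hypothetical helper name, and it assumes that `requestTimeout` is expressed in milliseconds, which is inferred from the article pairing 120000 with the 120-second default. It only prints the command for review and does not run it.

```shell
#!/bin/sh
# Hypothetical helper; only the svc_nas command line comes from this article.
# Assumption: requestTimeout is in milliseconds (120000 for 120 seconds).
cava_timeout_cmd() {
    seconds=$1
    # Accept only a non-empty string of decimal digits.
    case $seconds in
        ''|*[!0-9]*)
            echo "usage: cava_timeout_cmd <seconds>" >&2
            return 1
            ;;
    esac
    echo "svc_nas ALL -param -f viruschk -m requestTimeout -v $((seconds * 1000))"
}

# Print the command for review instead of executing it.
cava_timeout_cmd 120
```

Run the printed command on the SSH console only after confirming it matches the command given above.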
Affected Products
Dell EMC Unity
Article Properties
Article Number: 000192245
Article Type: Solution
Last Modified: 29 Aug 2022
Version: 2