Data Domain: FS process PANIC in the inode cache when running out of memory in cache element pool
Summary: A defect has been found in some recent DDOS versions (confirmed in 7.7.4, 7.9.0.10 and 7.10.0; not fully confirmed for DDOS 7.7.3) whereby an FS process PANIC may occur in the inode cache code when, depending on the workload, a cache element pool runs out of memory for further allocations. ...
Symptoms
There is no degradation or advance warning for this issue. It manifests as an FS process failure (PANIC), after which the process restarts automatically and comes back up normally.
Due to the code path being exercised, the FS process may PANIC in several different ways, including the following:
PANIC: ddr/sm/ddfs/ddfs_mtree.c: ddfs_mtree_list: 829: !((dd_errno(e) == ENOENT) || (dd_errno(e) == DD_ERR_FM_EATTRNOENT) || (dd_errno(e) == DD_ERR_STALE))
PANIC: ddr/fv/file_verify.c: file_verify_update_marker_attrs: 4872: Fatal Error
PANIC: ddr/fv/file_verify.c: file_verify_update_snap_attr: 4446: Fatal Error
PANIC: ddr/fv/file_verify.c: file_verify_update_marker_attrs: 4860: Fatal Error
In the FS process log files (ddfs.info) the following messages will be found prior to each process crash:
01/17 20:21:59.292947 [7fbbf4f98f50] dd_cache_elem_reclaim: Evict count=256, Visited count=257, Skipped elem count=0, Skipped bucket count=0, Time threshold=1539816333626910. (99% full) Complete=True
01/17 20:22:04.662303 [7fbb031ad4f0] ERROR: FM fm_iget:355 - fm_iget failed to allocate elem in dd_cache 5001
These messages indicate that the internal cache element pool was 99% full and was then unable to allocate any further elements, which led to the process crash.
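To check a collected ddfs.info file for this signature, a small script along the following lines could be used. This is a minimal sketch, not a Dell-provided tool: the function name `scan_ddfs_log` is hypothetical, and the match strings are taken only from the two sample log lines above, so other DDOS versions may word these messages differently.

```python
import re

# Marker text copied from the ddfs.info excerpt above.
ALLOC_FAILURE = "fm_iget failed to allocate elem in dd_cache"
# Matches the "(NN% full)" field of a dd_cache_elem_reclaim line.
RECLAIM_FULL = re.compile(r"dd_cache_elem_reclaim:.*\((\d+)% full\)")


def scan_ddfs_log(lines):
    """Return (allocation_failure_lines, highest_fill_percent_seen).

    highest_fill_percent_seen is None when no reclaim line is present.
    """
    failures = []
    max_full = None
    for line in lines:
        if ALLOC_FAILURE in line:
            failures.append(line.rstrip("\n"))
        match = RECLAIM_FULL.search(line)
        if match:
            pct = int(match.group(1))
            max_full = pct if max_full is None else max(max_full, pct)
    return failures, max_full


if __name__ == "__main__":
    import sys

    # Usage (path is whatever ddfs.info copy you have at hand):
    #   python scan_ddfs_log.py ddfs.info
    with open(sys.argv[1], errors="replace") as handle:
        found, fill = scan_ddfs_log(handle)
    print(f"allocation failures: {len(found)}, highest reclaim fill: {fill}")
    for entry in found:
        print(entry)
```

A non-zero count of allocation failures shortly before an FS restart is consistent with the symptoms in this article, but it does not by itself confirm the defect.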
NOTE: This issue is known to only affect the following versions:
- DDOS 7.7.3.x : Not fully confirmed
- DDOS 7.7.4.x
- DDOS 7.9.0.10
- DDOS 7.10.0.x
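The version list above can be turned into a quick check. The sketch below is illustrative only: `classify_ddos_version` is a hypothetical helper, it encodes nothing beyond the versions named in this article, and it assumes a dotted version string optionally followed by a `-build` suffix.

```python
# Versions named in this article. 7.7.3.x is listed as not fully confirmed.
AFFECTED_PREFIXES = {
    (7, 7, 4): "affected",
    (7, 10, 0): "affected",
    (7, 7, 3): "possibly affected (not fully confirmed)",
}
# 7.9.0.10 is named as a single exact release, not a whole branch.
AFFECTED_EXACT = {
    (7, 9, 0, 10): "affected",
}
NOT_LISTED = "not listed as affected in this article"


def classify_ddos_version(version):
    """Classify a DDOS version string against the list in this article."""
    # Assumption: any build suffix is separated by "-", e.g. "7.7.4.0-123456".
    dotted = version.strip().split("-")[0]
    parts = tuple(int(piece) for piece in dotted.split("."))
    if parts[:4] in AFFECTED_EXACT:
        return AFFECTED_EXACT[parts[:4]]
    if parts[:3] in AFFECTED_PREFIXES:
        return AFFECTED_PREFIXES[parts[:3]]
    return NOT_LISTED


if __name__ == "__main__":
    for candidate in ("7.7.4.0", "7.9.0.10", "7.10.0.5", "7.7.3.0", "7.11.0"):
        print(candidate, "->", classify_ddos_version(candidate))
```

A result of "not listed" only means the version is not named here; it is not a statement that the version is free of the defect.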
Cause
For any file operation like read/write, an inode structure is allocated from the dd_cache element pool.
If this cache is full and a new request comes in, then an element is evicted from this cache and the new request is fulfilled.
This eviction is based on a time policy (an element is evicted if it has not been accessed in the last 'x' seconds).
If this cache becomes too hot (all elements have been accessed within the last 'x' seconds) and no element can be evicted even after multiple retries, fm_iget returns DD_ERR_NOMEM.
Some callers of this element pool allocation are unable to handle the error gracefully, and so cause the FS process to PANIC and dump core if the function "fm_iget" returns any error. That is why there are several different PANIC signatures for the same underlying code defect.
Resolution
The underlying code defect that causes these FS process crashes is fixed under DDOS-168410 in the following versions (and all later ones in the same code branches):
- DDOS 7.7.5.1
- DDOS 7.10.1.0
- DDOS 7.11.0
If you are running an affected version (those listed above) but have not yet experienced an unexpected FS process crash matching the symptoms in this KB, we recommend that you do not proactively apply the workaround and instead upgrade to one of the fixed releases above (or any of their successors) to benefit from the latest updates and code fixes.
Affected Products
Data Domain
Article Properties
Article Number: 000207919
Article Type: Solution
Last Modified: 21 Dec 2023
Version: 17