Data Domain: FS process PANIC in the inode cache when running out of memory in cache element pool

Summary: A defect has been found in some recent DDOS versions (confirmed in 7.7.4, 7.9.0.10 and 7.10.0; not yet confirmed whether DDOS 7.7.3 is also affected) by which an FS process PANIC may occur in the inode cache code when, depending on the workload, a cache element pool runs out of memory for further allocations. ...


Symptoms

This issue gives no advance warning and causes no prior degradation. It manifests as an FS process failure (PANIC), after which the process restarts automatically and comes back up normally.
Due to the code path being exercised, the FS process may PANIC in several different ways, including the following:
PANIC: ddr/sm/ddfs/ddfs_mtree.c: ddfs_mtree_list: 829: !((dd_errno(e) == ENOENT) || (dd_errno(e) == DD_ERR_FM_EATTRNOENT) || (dd_errno(e) == DD_ERR_STALE))
PANIC: ddr/fv/file_verify.c: file_verify_update_marker_attrs: 4872: Fatal Error
PANIC: ddr/fv/file_verify.c: file_verify_update_snap_attr: 4446: Fatal Error
PANIC: ddr/fv/file_verify.c: file_verify_update_marker_attrs: 4860: Fatal Error
In the FS process log files (ddfs.info) the following messages will be found prior to each process crash:
01/17 20:21:59.292947 [7fbbf4f98f50] dd_cache_elem_reclaim: Evict count=256, Visited count=257, Skipped elem count=0, Skipped bucket count=0, Time threshold=1539816333626910. (99% full) Complete=True
01/17 20:22:04.662303 [7fbb031ad4f0] ERROR: FM fm_iget:355 - fm_iget failed to allocate elem in dd_cache 5001

These messages indicate that the internal cache element pool was 99% full and then unable to allocate any further elements, which led to the process crash.
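
As a minimal sketch of how the two log signatures above could be searched for in a copy of ddfs.info, the following Python helper matches them with regular expressions. It is not a Dell-provided tool; the function name scan_ddfs_log and the full_threshold parameter are hypothetical, and the patterns are derived only from the two sample lines quoted in this article.

```python
import re

# Patterns derived from the sample ddfs.info lines quoted in this article.
RECLAIM_RE = re.compile(r"dd_cache_elem_reclaim: .*\((\d+)% full\)")
ALLOC_FAIL_RE = re.compile(r"fm_iget failed to allocate elem in dd_cache")


def scan_ddfs_log(lines, full_threshold=99):
    """Return (reclaim_hits, alloc_failures) as lists of matching lines.

    Hypothetical helper: full_threshold is an assumed cut-off chosen to
    match the "(99% full)" value shown in the sample message.
    """
    reclaim_hits = []
    alloc_failures = []
    for line in lines:
        match = RECLAIM_RE.search(line)
        if match and int(match.group(1)) >= full_threshold:
            reclaim_hits.append(line.rstrip("\n"))
        elif ALLOC_FAIL_RE.search(line):
            alloc_failures.append(line.rstrip("\n"))
    return reclaim_hits, alloc_failures


# The two sample lines from this article, used as input.
sample = [
    "01/17 20:21:59.292947 [7fbbf4f98f50] dd_cache_elem_reclaim: Evict count=256, "
    "Visited count=257, Skipped elem count=0, Skipped bucket count=0, "
    "Time threshold=1539816333626910. (99% full) Complete=True",
    "01/17 20:22:04.662303 [7fbb031ad4f0] ERROR: FM fm_iget:355 - "
    "fm_iget failed to allocate elem in dd_cache 5001",
]
reclaim, failures = scan_ddfs_log(sample)
print(len(reclaim), "reclaim message(s),", len(failures), "allocation failure(s)")
```

To scan a real log, the same function can be given an open file object, for example `scan_ddfs_log(open(path))`, where path points to a local copy of the log. A match only suggests that the symptoms in this article may apply; it does not replace the PANIC signatures listed above.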

NOTE: This issue is known to only affect the following versions:
  • DDOS 7.7.3.x : Not fully confirmed
  • DDOS 7.7.4.x
  • DDOS 7.9.0.10
  • DDOS 7.10.0.x

Cause

For any file operation like read/write, an inode structure is allocated from the dd_cache element pool.
If this cache is full and a new request comes in, then an element is evicted from this cache and the new request is fulfilled.
This eviction is based on a time policy (an element is evicted only if it has not been accessed in the last 'x' seconds).
If the cache becomes too hot (all elements have been accessed within the last 'x' seconds) and no element can be evicted even after multiple retries, fm_iget returns DD_ERR_NOMEM.
Some callers of this element pool allocation cannot handle the error gracefully, so the FS process PANICs and dumps core whenever the function "fm_iget" returns any error. This is why several different PANIC signatures correspond to the same underlying code defect.
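
The behaviour described above can be illustrated with a toy model. This is a sketch under stated assumptions, not DDFS code: the class ElemPool, its parameters, and the two caller functions are hypothetical, and errno.ENOMEM stands in for DD_ERR_NOMEM.

```python
import errno


class ElemPool:
    """Toy fixed-size pool with time-based eviction (hypothetical, not DDFS code)."""

    def __init__(self, capacity, idle_threshold):
        self.capacity = capacity
        self.idle_threshold = idle_threshold  # the 'x' seconds from the text
        self.last_access = {}  # inode id -> time of last access

    def get(self, inode_id, now, retries=3):
        """Return 0 on success, or errno.ENOMEM if no element can be evicted."""
        if inode_id in self.last_access or len(self.last_access) < self.capacity:
            self.last_access[inode_id] = now
            return 0
        # Pool is full: look for an element idle for at least idle_threshold.
        # In this toy model time does not advance between retries, so every
        # retry sees the same state; the loop only mirrors the description.
        for _ in range(retries):
            cold = [k for k, t in self.last_access.items()
                    if now - t >= self.idle_threshold]
            if cold:
                victim = min(cold, key=self.last_access.get)  # least recently used
                del self.last_access[victim]
                self.last_access[inode_id] = now
                return 0
        return errno.ENOMEM  # stand-in for DD_ERR_NOMEM


def fatal_caller(pool, inode_id, now):
    """Models a caller that treats any allocation error as fatal."""
    if pool.get(inode_id, now) != 0:
        raise RuntimeError("PANIC: Fatal Error")


def tolerant_caller(pool, inode_id, now):
    """Models a caller that reports the error instead of crashing."""
    return "ok" if pool.get(inode_id, now) == 0 else "retry later"


pool = ElemPool(capacity=2, idle_threshold=10)
pool.get(1, now=0)
pool.get(2, now=1)
# Both elements were accessed less than 10 seconds ago, so the pool is "hot".
print(tolerant_caller(pool, 3, now=5))
```

In this model the same allocation failure is harmless for tolerant_caller but raises an exception in fatal_caller, which parallels why one defect can surface as several different PANIC signatures.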

Resolution

The underlying code issue that causes these FS process crashes is fixed under DDOS-168410 in the following versions (and in all later versions in the same code branches):
  • DDOS 7.7.5.1
  • DDOS 7.10.1.0
  • DDOS 7.11.0
Customers impacted by this problem who cannot immediately upgrade to any of the releases above can try a workaround, for which they need to contact Dell Support.
If you are running one of the affected versions listed above but have not yet experienced an unexpected FS process crash matching the symptoms in this KB, we recommend that you do not apply the workaround proactively. Instead, upgrade to any of the fixed releases above (or any of their successors) to benefit from the latest updates and code fixes.

Affected Products

Data Domain
Article Properties
Article Number: 000207919
Article Type: Solution
Last Modified: 21 Dec 2023
Version: 17