Data Domain: BoostFS Panics or Mount Point Becomes Unresponsive

Summary: BoostFS panics or crashes, or the mount point becomes unresponsive, when the backup application performs I/O operations on the BoostFS mount point.

Symptoms

  • BoostFS crashes or panics intermittently, typically under high workload.
  • The backup application performs delayed READ and WRITE operations on the BoostFS mount point.
  • When READ and WRITE operations from the backup application are delayed, the BoostFS logs show error 5057 (File handle is stale) returned by the Data Domain server.

BoostFS Logs:

###### WRITE operation FAILED due to STALE FILE HANDLE error ######
Sep 11 00:36:29.635 7884 10740 [E] [ddp log] [1ECC:29F4] ddcl_ddcp_send_file_loop: Call to recv refs2 failed. [ERR=5057]
Sep 11 00:36:29.635 7884 10740 [E] [ddp log] [1ECC:29F4] ddcl_ddcp_pwrite: Call to send file loop2 failed. [ERR=5057]
Sep 11 00:36:29.635 7884 10740 [E] [ddp log] [1ECC:29F4] ddcl_ddcp_pwrite: Error in ddcl ddcp pwrite. [ERR=5057]
Sep 11 00:36:29.635 7884 10740 [E] [ddp log] [1ECC:29F4] ddp_write() failed Offset 11010048, BytesToWrite 1048576, BytesWritten 0 Err: 5057-File handle is stale
Sep 11 00:36:29.635 7884 10740 [E] bfs_cache_flush: failed: 5057 File handle is stale (0 bytes written)
Sep 11 00:36:29.651 7884 11292 [E] [ddp log] [1ECC:2C1C] ddcl_ddcp_send_file_loop: Call to recv refs2 failed. [ERR=5057]
####### PANIC occurred HERE #######
Sep 11 00:36:29.667 7884 11292 [E] [ddp log] [1ECC:2C1C] PANIC: ..\ddcl\ddcl_ddcp.c: ddcl_ddcp_commit: 4541: !(c->send_offset == c->write_offset)
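
To spot this pattern quickly in a large BoostFS log, the lines can be filtered for error 5057 and for PANIC entries. The following is a minimal sketch, not a Dell tool: the function names classify and summarize are hypothetical, and the sample lines are copied from the excerpt above with wrapped lines rejoined.

```python
import re

# Sample lines taken from the log excerpt in this article (wrapped lines rejoined).
SAMPLE_LOG = [
    r"Sep 11 00:36:29.635 7884 10740 [E] [ddp log] [1ECC:29F4] ddcl_ddcp_pwrite: Call to send file loop2 failed. [ERR=5057]",
    r"Sep 11 00:36:29.635 7884 10740 [E] bfs_cache_flush: failed: 5057 File handle is stale (0 bytes written)",
    r"Sep 11 00:36:29.667 7884 11292 [E] [ddp log] [1ECC:2C1C] PANIC: ..\ddcl\ddcl_ddcp.c: ddcl_ddcp_commit: 4541: !(c->send_offset == c->write_offset)",
]

ERR_CODE = re.compile(r"\[ERR=(\d+)\]")
STALE_TEXT = "File handle is stale"


def classify(line):
    """Return 'panic', 'stale', or None for a single log line."""
    if "PANIC:" in line:
        return "panic"
    match = ERR_CODE.search(line)
    if (match and match.group(1) == "5057") or STALE_TEXT in line:
        return "stale"
    return None


def summarize(lines):
    """Count stale-handle and panic lines in an iterable of log lines."""
    counts = {"stale": 0, "panic": 0}
    for line in lines:
        kind = classify(line)
        if kind is not None:
            counts[kind] += 1
    return counts


if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

A nonzero stale count followed by a panic entry on the same or a neighboring thread matches the sequence shown in the excerpt.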

Triaging:

  1. Check the BoostFS logs and the server-side DDFS logs for the errors shown above.
  2. Look for delays in READ and WRITE operations from the backup application, using the BoostFS API entry and exit log messages.
  3. Check the value set for OST_ABANDON_TIMEOUT (default: three hours).
  4. Check that the RPC timeout is greater than the value set for OST_ABANDON_TIMEOUT.
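
For step 2, one way to look for long pauses is to compare the timestamps of consecutive log entries against the abandon timeout. The sketch below assumes the timestamp layout seen in the excerpt above (month, day, time with milliseconds, and no year, so the year is supplied as an assumption). The first sample line is hypothetical and only stands in for an earlier API exit message; the helper names are illustrative.

```python
from datetime import datetime, timedelta

OST_ABANDON_TIMEOUT_SEC = 10800  # default stated in this article (three hours)


def parse_stamp(line, year=2024):
    """Parse the leading 'Mon DD HH:MM:SS.mmm' stamp; the year is an assumption."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")


def idle_gaps(lines, threshold_sec=OST_ABANDON_TIMEOUT_SEC, year=2024):
    """Return (previous, current, gap) tuples for gaps longer than the threshold."""
    stamps = [parse_stamp(line, year) for line in lines if line.strip()]
    limit = timedelta(seconds=threshold_sec)
    gaps = []
    for prev, cur in zip(stamps, stamps[1:]):
        gap = cur - prev
        if gap > limit:
            gaps.append((prev, cur, gap))
    return gaps


SAMPLE = [
    # Hypothetical earlier entry, not from the real log.
    "Sep 10 21:00:00.000 7884 10740 [I] hypothetical write exit message",
    # Real entry from the excerpt in this article.
    "Sep 11 00:36:29.635 7884 10740 [E] bfs_cache_flush: failed: 5057 File handle is stale (0 bytes written)",
]

if __name__ == "__main__":
    for prev, cur, gap in idle_gaps(SAMPLE):
        print(f"idle for {gap} between {prev} and {cur}")
```

In practice the lines would first be filtered to one file or thread, since unrelated activity in the same log can hide an idle period on a single handle.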

Cause

  1. The issue is caused by delayed READ and WRITE operations from the backup application. When a file handle stays idle for longer than the DDFS discard timeout (three hours by default), DDFS closes that file handle.
  2. The discard timeout is the period DDFS uses to identify inactive file handles. New writes issued against a handle that has already been closed can then cause a panic on the client side.
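
The mechanism can be illustrated with a toy model. This is not DDFS code: the HandleTable class is hypothetical, and details such as whether expiry happens exactly at the timeout are assumptions. It only shows why a write that arrives after a long pause receives error 5057, and why a larger timeout avoids it.

```python
STALE_HANDLE = 5057  # error code reported in the BoostFS log
OK = 0


class HandleTable:
    """Toy model of a server that discards handles idle past a timeout."""

    def __init__(self, abandon_timeout_sec=10800):
        self.abandon_timeout_sec = abandon_timeout_sec
        self.last_activity = {}  # handle -> time of last I/O, in seconds

    def open(self, handle, now):
        self.last_activity[handle] = now

    def expire(self, now):
        """Drop every handle whose idle time exceeds the timeout."""
        idle = [h for h, t in self.last_activity.items()
                if now - t > self.abandon_timeout_sec]
        for handle in idle:
            del self.last_activity[handle]

    def write(self, handle, now):
        """Return OK and refresh the handle, or STALE_HANDLE if it was discarded."""
        self.expire(now)
        if handle not in self.last_activity:
            return STALE_HANDLE
        self.last_activity[handle] = now
        return OK


if __name__ == "__main__":
    table = HandleTable()                 # default three-hour timeout
    table.open("backup.img", now=0)
    print(table.write("backup.img", now=3600))          # within the timeout
    print(table.write("backup.img", now=3600 + 10801))  # idle too long
```

With the same access pattern and abandon_timeout_sec=43200, the second write succeeds, which is the effect the resolution below aims for.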

Resolution

Contact Dell Support to increase the timeout. The change requires bash access.

Change the OST_ABANDON_TIMEOUT parameter on the server side using the steps below. The value can be increased up to a maximum of 12 hours.

Note: The file system must be disabled and enabled as part of applying the solution.

  1. Log in to the Data Domain with admin role access. Support enters bash mode and then enters SE mode with ddsh -s.
  2. Check the original OST_ABANDON_TIMEOUT value.
SE@dd## se sysparam show OST_ABANDON_TIMEOUT
  3. Increase the OST_ABANDON_TIMEOUT value. By default, the value is 10800 seconds (three hours).
SE@dd## se sysparam set OST_ABANDON_TIMEOUT=129600
SE@dd## se sysparam show OST_ABANDON_TIMEOUT

Name                Description                       Current Default Override
------------------- --------------------------------- ------- ------- --------
OST_ABANDON_TIMEOUT DDCP abandon context timeout(sec) 129600  10800   rpc
------------------- --------------------------------- ------- ------- --------

SE@dd## priv set admin
  4. Restart the file system after the parameter change. Confirm with the customer that the restart is permitted. If it is not, schedule a maintenance window to run the following commands:
SE@dd## filesys disable
SE@dd## filesys enable
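
OST_ABANDON_TIMEOUT is expressed in seconds, so it helps to convert a candidate value to hours before setting it. The sketch below is plain arithmetic with hypothetical helper names. It shows that the 12-hour maximum stated above corresponds to 43200 seconds, while the example value 129600 corresponds to 36 hours, so confirm the intended value with Dell Support before applying it.

```python
DEFAULT_SEC = 10800  # default from this article (three hours)
MAX_HOURS = 12       # maximum stated in this article


def hours_to_seconds(hours):
    """Convert hours to whole seconds."""
    return int(hours * 3600)


def check_timeout(seconds):
    """Describe a candidate OST_ABANDON_TIMEOUT value given in seconds."""
    hours = seconds / 3600
    return {
        "hours": hours,
        "above_default": seconds > DEFAULT_SEC,
        "within_max": hours <= MAX_HOURS,
    }


if __name__ == "__main__":
    print(hours_to_seconds(MAX_HOURS))
    print(check_timeout(129600))
```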

Affected Products

Data Domain
Article Properties
Article Number: 000215706
Article Type: Solution
Last Modified: 14 Jan 2026
Version: 4