Data Domain: BoostFS Panics or Mount Point becomes Unresponsive
Summary: BoostFS panics or crashes, or the mount point becomes unresponsive, when the backup application performs I/O operations on the BoostFS mount point.
Symptoms
- BoostFS crashes and panics randomly. This happens when the workload is high.
- The backup application performs delayed READ and WRITE operations on the BoostFS mount point.
- When READ and WRITE operations from the backup application are delayed, the BoostFS logs show the error 5057 File handle is stale from the Data Domain server.
BoostFS Logs:
###### WRITE operation FAILED due to STALE FILE HANDLE error ######
Sep 11 00:36:29.635 7884 10740 [E] [ddp log] [1ECC:29F4] ddcl_ddcp_send_file_loop: Call to recv refs2 failed. [ERR=5057]
Sep 11 00:36:29.635 7884 10740 [E] [ddp log] [1ECC:29F4] ddcl_ddcp_pwrite: Call to send file loop2 failed. [ERR=5057]
Sep 11 00:36:29.635 7884 10740 [E] [ddp log] [1ECC:29F4] ddcl_ddcp_pwrite: Error in ddcl ddcp pwrite. [ERR=5057]
Sep 11 00:36:29.635 7884 10740 [E] [ddp log] [1ECC:29F4] ddp_write() failed Offset 11010048, BytesToWrite 1048576, BytesWritten 0 Err: 5057-File handle is stale
Sep 11 00:36:29.635 7884 10740 [E] bfs_cache_flush: failed: 5057 File handle is stale (0 bytes written)
Sep 11 00:36:29.651 7884 11292 [E] [ddp log] [1ECC:2C1C] ddcl_ddcp_send_file_loop: Call to recv refs2 failed. [ERR=5057]
####### PANIC occurred HERE #######
Sep 11 00:36:29.667 7884 11292 [E] [ddp log] [1ECC:2C1C] PANIC: ..\ddcl\ddcl_ddcp.c: ddcl_ddcp_commit: 4541: !(c->send_offset == c->write_offset)
Triaging:
- Check the BoostFS logs and the server-side DDFS logs for the error reported above.
- Observe the delay in READ and WRITE operations from the backup application. Review the BoostFS API entry and exit logs.
- Check the value set for OST_ABANDON_TIMEOUT (default three hours).
- Check that the RPC timeout is greater than the value set for OST_ABANDON_TIMEOUT.
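The first triage step can be partly automated. The following is a minimal sketch, not a Dell-provided tool, that scans BoostFS log lines for the stale file handle error (5057) and for PANIC entries. It assumes the line layout seen in the excerpt above (timestamp, two numeric IDs, a level in brackets, then the message); the function names classify and summarize are hypothetical.

```python
import re

# Assumed layout, taken from the log excerpt in this article:
# "Sep 11 00:36:29.635 7884 10740 [E] <message>"
LINE_RE = re.compile(
    r"^(?P<ts>[A-Z][a-z]{2} +\d{1,2} \d{2}:\d{2}:\d{2}\.\d{3}) "
    r"(?P<pid>\d+) (?P<tid>\d+) \[(?P<level>[A-Z])\] (?P<msg>.*)$"
)

STALE_MARKERS = ("ERR=5057", "5057 File handle is stale", "5057-File handle is stale")


def classify(line):
    """Return 'panic', 'stale', or None for a single BoostFS log line."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None  # annotation lines or unrelated text
    msg = match.group("msg")
    if "PANIC:" in msg:
        return "panic"
    if any(marker in msg for marker in STALE_MARKERS):
        return "stale"
    return None


def summarize(lines):
    """Count stale-handle and panic entries and keep the first timestamp of each."""
    summary = {"stale": 0, "panic": 0, "first_stale": None, "first_panic": None}
    for line in lines:
        kind = classify(line)
        if kind is None:
            continue
        summary[kind] += 1
        key = "first_" + kind
        if summary[key] is None:
            summary[key] = LINE_RE.match(line.strip()).group("ts")
    return summary
```

A run of stale entries followed shortly by a panic entry, as in the excerpt above, matches the pattern this article describes.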
Cause
- The issue is caused by a delay in READ and WRITE operations by the backup application. The delay triggers the DDFS discard timeout, which closes the corresponding file handle after three hours (the default value).
- This is the default discard timeout period that DDFS uses to identify inactive file handles. New writes against the closed handle may then cause a panic on the client side.
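To illustrate the mechanism, the sketch below flags idle gaps between consecutive I/O operations on one file that reach the discard timeout. It is an illustration only: the timestamps are hypothetical input that would have to be extracted from the backup application or BoostFS API logs, the function name idle_gaps is invented here, and treating a gap equal to the timeout as expired is an assumption.

```python
from datetime import datetime, timedelta

# Default stated in this article: three hours, expressed in seconds.
DEFAULT_ABANDON_TIMEOUT_SEC = 10800


def idle_gaps(timestamps, timeout_sec=DEFAULT_ABANDON_TIMEOUT_SEC):
    """Return (start, end, gap_seconds) for each pair of consecutive I/O
    timestamps spaced at least timeout_sec apart."""
    ordered = sorted(timestamps)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        gap = (later - earlier).total_seconds()
        if gap >= timeout_sec:  # assumption: the boundary counts as expired
            gaps.append((earlier, later, gap))
    return gaps


if __name__ == "__main__":
    # Hypothetical I/O times for a single file handle.
    start = datetime(2024, 1, 1, 20, 0, 0)
    ops = [start, start + timedelta(minutes=30), start + timedelta(hours=4)]
    for earlier, later, gap in idle_gaps(ops):
        print(f"idle for {gap / 3600:.1f} h between {earlier} and {later}")
```

Any gap reported this way marks a window in which DDFS could have treated the file handle as inactive and closed it.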
Resolution
Contact Dell Support to increase the timeout. The change requires bash access.
Change the OST_ABANDON_TIMEOUT parameter on the server side using the steps below. The value can be increased up to a maximum of 12 hours.
Note: The file system must be disabled and enabled as part of applying the solution.
- Log in to the Data Domain with admin role access. Support enters bash mode and then enters SE mode with ddsh -s.
- Check the original OST_ABANDON_TIMEOUT value.
SE@dd## se sysparam show OST_ABANDON_TIMEOUT
- Increase the OST_ABANDON_TIMEOUT value. By default, the value is 10800 seconds (three hours).
SE@dd## se sysparam set OST_ABANDON_TIMEOUT=129600
SE@dd## se sysparam show OST_ABANDON_TIMEOUT
Name                Description                       Current Default Override
------------------- --------------------------------- ------- ------- ----- ---
OST_ABANDON_TIMEOUT DDCP abandon context timeout(sec) 129600  10800   rpc
------------------- --------------------------------- ------- ------- ----- ---
SE@dd## priv set admin
- Restart the file system after the parameter change. Confirm with the customer whether this restart is permitted. If it is not, schedule a maintenance window to run the following commands:
SE@dd## filesys disable
SE@dd## filesys enable
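Because OST_ABANDON_TIMEOUT is set in seconds while this article states the limits in hours, a small conversion check can help before a value is proposed. The sketch below is a hypothetical helper, not part of the Data Domain CLI. It uses only figures from this article: the 10800-second default and the stated 12-hour maximum. Note that 129600 seconds converts to 36 hours, which is above the stated 12-hour maximum (43200 seconds), so confirm the intended value with Dell Support.

```python
DEFAULT_SEC = 10800      # default from this article (three hours)
MAX_HOURS_STATED = 12    # maximum stated in this article


def hours_to_seconds(hours):
    """Convert hours to whole seconds."""
    return int(hours * 3600)


def seconds_to_hours(seconds):
    """Convert seconds to hours as a float."""
    return seconds / 3600


def check_timeout(abandon_sec, rpc_timeout_sec=None):
    """Return a list of findings for a proposed value; empty means none found."""
    findings = []
    if abandon_sec <= DEFAULT_SEC:
        findings.append("value does not exceed the default of 10800 seconds")
    if abandon_sec > hours_to_seconds(MAX_HOURS_STATED):
        findings.append(
            f"value is {seconds_to_hours(abandon_sec):g} h, above the stated 12 h maximum"
        )
    # Triage guidance in this article: the RPC timeout should be the greater value.
    if rpc_timeout_sec is not None and rpc_timeout_sec <= abandon_sec:
        findings.append("RPC timeout is not greater than OST_ABANDON_TIMEOUT")
    return findings


if __name__ == "__main__":
    print(hours_to_seconds(12))      # seconds in the stated maximum
    print(check_timeout(129600))     # value used in the example output above
```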
Affected Products
Data Domain
Article Properties
Article Number: 000215706
Article Type: Solution
Last Modified: 14 Jan 2026
Version: 4