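Isilon: OneFS - How to Interpret Software Watchdog Errors
Summary: The software watchdog is a process that monitors the kernel and either prints a stack or reboots the node when the node becomes unresponsive. This protects the cluster from the symptoms of severe CPU starvation and helps Dell Technical Support identify the problem so that it can be corrected.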
Instructions
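Introduction
This knowledge article describes how to read and interpret the stacks created by the swatchdog process. The software watchdog is also referred to as the watchdog or swatchdog.
Details
Occasionally, a node writes a stack to the /var/log/messages file, or reboots itself, and displays an error similar to the following: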
**********************************************
Software Watchdog failed (userspace is starved!)
**********************************************
**********************************************
Software Watchdog failed on CPU 0 (6353: kt: gmp-split [-])
0x80bda7b9 -> 0x80bda5dc (fp=0xf734bb78): lk_fail_create_entry_and_owner
0x80bbe950 -> 0x80bbe7e0 (fp=0xf734bbf0): lkf_group_change_save_locks
0x80aa251c -> 0x80aa2268 (fp=0xf734bc2c): rtxn_sync_locks_prepare
0x80aa447d -> 0x80aa4304 (fp=0xf734bcdc): rtxn_split
0x80aac9cf -> 0x80aac8ec (fp=0xf734bcfc): kt_main
0x802a9d43 -> 0x802a9ca8 (fp=0xf734bd14): fork_exit
intr counts: irq3: 1382 irq4: 1164845 irq14: 19331 irq17: 10672321 irq18: 11 stray: 1 irq24: 22011026 irq48: 46902637
**********************************************
panic @ time 1257444527.664: Software watchdog timed out
Stack: -------------------------------------------------
0x802e24f0 -> 0x802e24e4 (fp=0xf734ba78): isi_swatchdog_panic
0x802e27d7 -> 0x802e26ac (fp=0xf734ba8c): isi_swatchdog_hardclock
0x80295187 -> 0x80295068 (fp=0xf734bab0): hardclock_process
0x802951ba -> 0x802951a8 (fp=0xf734bac4): hardclock
0x8041d608 -> 0x8041d5b8 (fp=0xf734bad4): lapic_handle_timer
0x804281c3 -> 0x804281a4 (fp=0xf734bb78): bcmp
0x80bbe950 -> 0x80bbe7e0 (fp=0xf734bbf0): lkf_group_change_save_locks
0x80aa251c -> 0x80aa2268 (fp=0xf734bc2c): rtxn_sync_locks_prepare
0x80aa447d -> 0x80aa4304 (fp=0xf734bcdc): rtxn_split
0x80aac9cf -> 0x80aac8ec (fp=0xf734bcfc): kt_main
0x802a9d43 -> 0x802a9ca8 (fp=0xf734bd14): fork_exit
---------------------------------------------------------
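When reviewing a long log, it can help to pull out only the function names from the frame lines in the stack above. The following is a minimal sketch, not a Dell tool: the helper name parse_frames and the regular expression are assumptions made for illustration. This article does not state what the two addresses in each frame represent, so the sketch extracts them without interpreting them.

```python
import re

# Matches one frame of the form "0xADDR -> 0xADDR (fp=0xADDR): symbol".
# The pattern is an assumption derived from the sample output in this article.
FRAME_RE = re.compile(
    r"(0x[0-9a-f]+) -> (0x[0-9a-f]+) \(fp=(0x[0-9a-f]+)\): (\w+)"
)


def parse_frames(text):
    """Return (first_addr, second_addr, frame_pointer, symbol) tuples
    in the order the frames appear in the text."""
    return FRAME_RE.findall(text)


# Three frames copied from the sample error in this article.
sample = (
    "0x80bda7b9 -> 0x80bda5dc (fp=0xf734bb78): lk_fail_create_entry_and_owner "
    "0x80bbe950 -> 0x80bbe7e0 (fp=0xf734bbf0): lkf_group_change_save_locks "
    "0x80aac9cf -> 0x80aac8ec (fp=0xf734bcfc): kt_main"
)

for _, _, frame_pointer, symbol in parse_frames(sample):
    print(frame_pointer, symbol)
```

Because the pattern searches anywhere in the text, it works whether the frames appear one per line or run together on a single line.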
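The watchdog is constructed as follows:
- A low-level timer interrupt fires every 10 seconds.
- High-level userspace code attempts to set a mailbox for the timer interrupt every 5 seconds.
When the low-level timer interrupt fails to find the mailbox note from userspace, it takes action and dumps a stack. After four consecutive failures, the node is rebooted.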
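The timing described above can be illustrated with a small simulation. This is a sketch under stated assumptions, not the OneFS implementation: the function simulate and its parameters are hypothetical, time advances in whole simulated seconds, the interrupt is assumed to clear the mailbox each time it finds a note, and a successful check is assumed to reset the failure count. Only the 10-second check interval, the 5-second post interval, and the four-failure limit come from this article.

```python
CHECK_INTERVAL = 10  # seconds between timer-interrupt checks (from the article)
POST_INTERVAL = 5    # seconds between userspace mailbox posts (from the article)
MAX_FAILURES = 4     # consecutive misses before the reboot (from the article)


def simulate(starved_from, duration):
    """Simulate the watchdog for `duration` seconds.

    Userspace stops posting to the mailbox at time `starved_from`.
    Returns a list of (time, event) tuples.
    """
    mailbox = False
    failures = 0
    events = []
    for t in range(1, duration + 1):
        # Userspace posts a note while it still gets CPU time.
        if t % POST_INTERVAL == 0 and t < starved_from:
            mailbox = True
        # The timer interrupt checks the mailbox.
        if t % CHECK_INTERVAL == 0:
            if mailbox:
                mailbox = False  # assumption: the note is consumed
                failures = 0     # assumption: a success resets the count
            else:
                failures += 1
                events.append((t, "stack dump"))
                if failures == MAX_FAILURES:
                    events.append((t, "reboot"))
                    break
    return events


# Userspace is starved from t=12 onward.
for time_s, event in simulate(starved_from=12, duration=60):
    print(time_s, event)
```

In this model, userspace posts twice per check interval, so a single delayed post does not trigger a stack dump. A dump is logged only when a full check interval passes without any post.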
For assistance with interpreting an error stack or a reboot triggered by swatchdog, contact Dell Technical Support.
Affected Products
Isilon
Products
Isilon, PowerScale OneFS
Article Properties
Article Number: 000018976
Article Type: How To
Last Modified: 10 Jun 2025
Version: 6