PowerScale (Isilon): Child or Parent isi_hangdump process not running on a single or multiple nodes. (Gen5, Gen6, Gen6.5)

Summary: This article explains how to resolve isi_hangdump messages flooding /var/log/messages. The cause is that the child or parent isi_hangdump process is not running on one or more nodes. For isi_hangdump to work properly, both the parent and the child process need to be running. ...

Symptoms

Multiple nodes report ping timeouts, possibly to one specific node.
NOTE: This does not apply to RBM ping timeouts.

The problematic node shows symptoms of a continual isi_hangdump loop.
Major hangdumps occur at roughly the same time every hour.

This can also cause performance issues.

Messages similar to the following appear in /var/log/messages:

2021-04-04T01:30:50-04:00 <1.5> CLUSTER-24 isi_hangdump: Triggering clusterwide hangdump
2021-04-04T01:30:50-04:00 <1.5> CLUSTER-24 isi_hangdump: LOCK TIMEOUT AT 1617514250 UTC
2021-04-04T01:30:50-04:00 <1.5> CLUSTER-24 isi_hangdump: Hangdump after 752602 seconds: Ping timeout
2021-04-04T01:31:00-04:00 <1.5> CLUSTER-24 isi_hangdump: END OF DUMP AT 1617514250 UTC
2021-04-04T01:31:00-04:00 <1.5> CLUSTER-24 isi_hangdump: Initiating hangdump on 26 nodes...
2021-04-04T01:31:09-04:00 <1.5> CLUSTER-24 isi_hangdump: Skipping requested dump(Ping timeout)
2021-04-04T01:32:09-04:00 <1.5> CLUSTER-24 isi_hangdump: Skipping requested dump(Ping timeout)
2021-04-04T01:35:12-04:00 <1.5> CLUSTER-24 isi_hangdump: Skipping requested dump(Ping timeout)
2021-04-04T01:36:13-04:00 <1.5> CLUSTER-24 isi_hangdump: Skipping requested dump(Ping timeout)
2021-04-04T01:52:27-04:00 <1.5> CLUSTER-24 isi_hangdump: Skipping requested dump(Ping timeout)
2021-04-04T01:53:28-04:00 <1.5> CLUSTER-24 isi_hangdump: Skipping requested dump(Ping timeout)

Node 2 is triggering the hangdumps, and they are one hour apart:
2020-08-20T00:53:48-07:00 <1.5> CLUSTER-2 isi_hangdump: Triggering clusterwide hangdump
2020-08-20T01:53:49-07:00 <1.5> CLUSTER-2 isi_hangdump: Triggering clusterwide hangdump <-- 1 hour difference between the hangdumps: 1:53 and 0:53
2020-08-20T02:53:49-07:00 <1.5> CLUSTER-2 isi_hangdump: Triggering clusterwide hangdump

or

Only node 24 is triggering the hangdumps, and they occur once per hour:

CLUSTER-24# isi_for_array "grep -i triggering /var/log/messages | grep 2021-04"
CLUSTER-24:2021-04-01T00:30:12-04:00 <1.5> CLUSTER-24 isi_hangdump: Triggering clusterwide hangdump
CLUSTER-24:2021-04-01T01:30:12-04:00 <1.5> CLUSTER-24 isi_hangdump: Triggering clusterwide hangdump <-- 01:30:12 and 00:30:12: one hour difference from the previous instance
CLUSTER-24:2021-04-01T02:30:12-04:00 <1.5> CLUSTER-24 isi_hangdump: Triggering clusterwide hangdump
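
The hourly pattern described above can be confirmed by computing the gap between consecutive trigger messages. The following is a minimal sketch, not a supported tool: hangdump_trigger_gaps is a hypothetical helper name, it assumes the input comes from a single node, and it ignores the UTC offset in the timestamp (it assumes every line carries the same offset).

```shell
# Hypothetical helper: read log lines on stdin and print the number of
# seconds between consecutive "Triggering clusterwide hangdump" messages.
# Assumption: all lines come from one node and share the same UTC offset.
hangdump_trigger_gaps() {
  awk '
    # Convert YYYY-MM-DDTHH:MM:SS to a second count (only differences matter).
    function to_seconds(ts,    y, m, d, days) {
      y = substr(ts, 1, 4) + 0
      m = substr(ts, 6, 2) + 0
      d = substr(ts, 9, 2) + 0
      if (m <= 2) { y--; m += 12 }   # treat Jan/Feb as months 13/14 of the previous year
      days = 365 * y + int(y / 4) - int(y / 100) + int(y / 400) \
             + int((153 * (m - 3) + 2) / 5) + d
      return days * 86400 + substr(ts, 12, 2) * 3600 \
             + substr(ts, 15, 2) * 60 + substr(ts, 18, 2)
    }
    /isi_hangdump: Triggering clusterwide hangdump/ {
      # Use the first timestamp on the line; an isi_for_array prefix is skipped.
      if (!match($0, /[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/)) next
      now = to_seconds(substr($0, RSTART, RLENGTH))
      if (seen) printf "%d\n", now - prev
      prev = now
      seen = 1
    }'
}
```

For example, running "grep -i triggering /var/log/messages | hangdump_trigger_gaps" on the suspect node should print values close to 3600 if the node is in the hourly loop.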

On an affected node, the number of isi_hangdump processes can be 4 or 1. The expected number of isi_hangdump processes is 2. To see how many isi_hangdump processes are running on each node:

# isi_for_array -s "ps awux | grep '[h]angdump'"
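
To avoid counting by eye on a large cluster, the output of the command above can be piped through a small filter. This is a rough sketch under stated assumptions: flag_hangdump_counts is a hypothetical helper name, and it assumes each output line starts with the node name followed by a colon, as in the isi_for_array output shown earlier. A node with no isi_hangdump process at all produces no lines and therefore will not be listed.

```shell
# Hypothetical helper: read "NODE: <ps line>" records on stdin and print
# every node whose isi_hangdump process count differs from the expected
# value (default 2), as "NODE COUNT", sorted by node name.
flag_hangdump_counts() {
  awk -v expected="${1:-2}" '
    {
      i = index($0, ":")          # the first colon ends the node-name prefix
      if (i == 0) next            # skip lines without a prefix
      count[substr($0, 1, i - 1)]++
    }
    END {
      for (node in count)
        if (count[node] != expected)
          printf "%s %d\n", node, count[node]
    }' | sort
}
```

For example: isi_for_array -s "ps awux | grep '[h]angdump'" | flag_hangdump_counts. No output means every node that reported processes has exactly two.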

The resolution is to restart the isi_hangdump service and then check the number of isi_hangdump processes.
If the number is not 2, restart the node itself.

Cause

The parent or child isi_hangdump process is not running. If the child (ping) process is not running, that node does not send the internal ping messages, which results in hangdumps being triggered. The continuous generation of hangdumps can lead to performance issues.

Resolution

Currently, the resolution is to run "isi_hangdump restart" (as shown in the example below).

If that fails, panic-reboot the node to capture the cores and restart the isi_hangdump process.

CLUSTER-1# ps -auwx | grep -i isi_hangdump
root 1015 0.0 0.6 437876 38928 - S 25Mar21 0:57.01 /usr/libexec/isilon/isi_hangdump /usr/bin/isi_hangdump start
root 1016 0.0 0.5 398676 32200 - S 25Mar21 20:05.60 /usr/libexec/isilon/isi_hangdump /usr/bin/isi_hangdump start
root 32228 0.0 0.0 12344 2616 0 S+ 20:41 0:00.00 grep -i isi_hangdump
CLUSTER-1# isi_hangdump restart
CLUSTER-1# ps -auwx | grep -i isi_hangdump
root 32253 3.9 0.6 398808 35976 - S 20:41 0:00.01 /usr/libexec/isilon/isi_hangdump /usr/bin/isi_hangdump restart
root 1016 0.0 0.5 398676 32200 - S 25Mar21 20:05.61 /usr/libexec/isilon/isi_hangdump /usr/bin/isi_hangdump start
root 32260 0.0 0.0 12344 2616 0 S+ 20:41 0:00.00 grep -i isi_hangdump
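
The restart and the follow-up process count can be combined into one step. The following is only a sketch of that sequence: restart_and_check_hangdump and the HANGDUMP_SETTLE_SECS variable are hypothetical names introduced here, the 5-second settle delay is an arbitrary assumption, and the sketch assumes "isi_hangdump restart" returns a nonzero exit status on failure, which this article does not confirm.

```shell
# Hypothetical helper: restart isi_hangdump on the local node, wait briefly,
# then verify that exactly two isi_hangdump processes are running.
# Returns 0 if the count is 2, 1 if it is not, 2 if the restart command failed.
restart_and_check_hangdump() {
  # Assumption: a failed restart is reported through the exit status.
  isi_hangdump restart || return 2

  # Assumption: a short pause is enough for both processes to come up.
  sleep "${HANGDUMP_SETTLE_SECS:-5}"

  # The [h] bracket keeps grep from matching its own command line.
  count=$(ps awux | grep -c '[h]angdump' || true)

  if [ "$count" -eq 2 ]; then
    echo "OK: 2 isi_hangdump processes running"
    return 0
  fi
  echo "WARNING: $count isi_hangdump processes running, expected 2"
  return 1
}
```

A return value of 1 corresponds to the case described above where the restart did not help and the node itself needs attention.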

In the meantime, engineering is working on a permanent resolution.

Affected Products

PowerScale OneFS

Article Number: 000185607

Article Type: Solution

Last Modified: 12 Jan 2023

Version: 6
