All scheduled jobs in state Waiting
Summary: Jobs are not running. All scheduled jobs are in a Waiting state.
Symptoms
No jobs are running. Job status shows that all jobs are in a Waiting state.
lifs010-13# isi job jobs list
ID Type State Impact Pri Phase Running Time
-----------------------------------------------------------------
1500 AutoBalanceLin Waiting Low 4 1/3 38d 21h 51m
1662 ShadowStoreProtect Waiting Low 6 1/1 -
1712 Collect Waiting Low 5 1/2 2d 6h 46m
1724 SnapshotDelete Waiting Low 2 1/2 -
1725 WormQueue Waiting Low 6 1/1 -
1726 ShadowStoreDelete Waiting Low 2 1/1 -
1727 QuotaScan Waiting Low 6 1/2 -
-----------------------------------------------------------------
Total: 7
Cause
This can occur when a node is disconnected from the Job Engine Coordinator. In the following example, Connected is False and node 8 is listed under Disconnected Nodes:
lifs010-102# isi job status --verbose
The job engine may temporarily delay running jobs.
Coordinator: 10
Connected: False
Disconnected Nodes: 8
Down or Read-Only Nodes: False
Statistics Ready: True
Cluster Is Degraded: False
Run Jobs When Degraded: False
Running and queued jobs:
ID Type State Impact Pri Phase Running Time
-----------------------------------------------------------------
1500 AutoBalanceLin Waiting Low 4 1/3 38d 21h 51m
1662 ShadowStoreProtect Waiting Low 6 1/1 -
1712 Collect Waiting Low 5 1/2 2d 6h 46m
1724 SnapshotDelete Waiting Low 2 1/2 -
1725 WormQueue Waiting Low 6 1/1 -
1726 ShadowStoreDelete Waiting Low 2 1/1 -
1727 QuotaScan Waiting Low 6 1/2 -
-----------------------------------------------------------------
Total: 7
Recent finished jobs:
ID Type State Time
------------------------------------------------------
1721 SnapshotDelete Succeeded 2016-04-21T11:00:20
1663 MultiScan User Cancelled 2016-04-22T15:35:08
1722 SnapshotDelete Succeeded 2016-04-22T17:25:29
1723 WormQueue Succeeded 2016-04-22T17:25:55
------------------------------------------------------
Total: 4
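The check above can be scripted. The following is a minimal sketch, assuming the "Disconnected Nodes:" line appears exactly as in the example output; the function name disconnected_nodes is hypothetical and is not part of OneFS.

```shell
#!/bin/sh
# Hypothetical helper (not an OneFS command): read the output of
# `isi job status --verbose` on stdin and print the value of the
# "Disconnected Nodes" field. Assumes the field label matches the
# example output in this article.
disconnected_nodes() {
    awk -F': *' '/^[[:space:]]*Disconnected Nodes:/ { print $2 }'
}

# Usage on a cluster node (not executed here):
#   isi job status --verbose | disconnected_nodes
```

A value of "-" indicates that no nodes are reported as disconnected; any other value lists the affected nodes.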
Resolution
Confirm the Logical Node Number (LNN) of the disconnected node. The node LNN may not always match the node ID.
# isi_nodes %{id} %{node} %{lnn} %{address}
Example output:
lifs010-2# isi_nodes %{id} %{node} %{lnn} %{address}
1 lifs010-1 1 192.168.41.101
2 lifs010-2 2 192.168.41.102
3 lifs010-3 3 192.168.41.103
4 lifs010-4 4 192.168.41.104
5 lifs010-5 5 192.168.41.105
6 lifs010-6 6 192.168.41.106
7 lifs010-7 7 192.168.41.107
8 lifs010-8 8 192.168.41.108
9 lifs010-9 9 192.168.41.109
10 lifs010-10 10 192.168.41.110
11 lifs010-11 11 192.168.41.111
12 lifs010-13 12 192.168.41.112
Check if all nodes have the isi_mcp process running:
# isi_for_array -s ps auxw | grep mcp | grep -v grep
Example output (note that node 8 is not listed):
lifs010-2# isi_for_array -s ps auxw | grep mcp | grep -v grep
lifs010-1: root 1690 0.0 0.1 48708 18248 - Is Sat09 0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-1: root 1692 0.0 0.1 59968 18212 - Is Sat09 0:00.40 isi_mcp: forker (isi_mcp)
lifs010-1: root 1910 0.0 0.3 101728 31272 - Ss Sat09 44:23.35 isi_mcp: master (isi_mcp)
lifs010-2: root 1751 0.0 0.1 53060 18228 - Is 12Jun25 0:00.11 isi_mcp: failsafe (isi_mcp)
lifs010-2: root 1816 0.0 0.1 72896 18160 - Is 12Jun25 0:00.58 isi_mcp: forker (isi_mcp)
lifs010-2: root 1901 0.0 0.3 86140 31368 - Ss 12Jun25 148:00.09 isi_mcp: master (isi_mcp)
lifs010-3: root 1681 0.0 0.1 78532 18228 - Is Sat09 0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-3: root 1683 0.0 0.1 55616 18172 - Is Sat09 0:05.67 isi_mcp: forker (isi_mcp)
lifs010-3: root 1678 0.0 0.3 104324 31652 - Ss Sat09 46:12.73 isi_mcp: master (isi_mcp)
lifs010-4: root 1691 0.0 0.1 48708 18248 - Is Sat09 0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-4: root 1643 0.0 0.1 59968 18212 - Is Sat09 0:00.40 isi_mcp: forker (isi_mcp)
lifs010-4: root 1312 0.0 0.3 101728 31272 - Ss Sat09 44:23.35 isi_mcp: master (isi_mcp)
lifs010-5: root 1755 0.0 0.1 53060 18228 - Is 12Jun25 0:00.12 isi_mcp: failsafe (isi_mcp)
lifs010-5: root 1256 0.0 0.1 72896 18160 - Is 12Jun25 0:00.58 isi_mcp: forker (isi_mcp)
lifs010-5: root 1967 0.0 0.3 86140 31368 - Ss 12Jun25 148:00.09 isi_mcp: master (isi_mcp)
lifs010-6: root 3456 0.0 0.1 78532 18228 - Is Sat09 0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-6: root 2754 0.0 0.1 55616 18172 - Is Sat09 0:05.67 isi_mcp: forker (isi_mcp)
lifs010-6: root 1923 0.0 0.3 104324 31652 - Ss Sat09 46:12.73 isi_mcp: master (isi_mcp)
lifs010-7: root 1888 0.0 0.1 48708 18248 - Is Sat09 0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-7: root 3654 0.0 0.1 59968 18212 - Is Sat09 0:00.40 isi_mcp: forker (isi_mcp)
lifs010-7: root 1236 0.0 0.3 101728 31272 - Ss Sat09 44:23.35 isi_mcp: master (isi_mcp)
lifs010-9: root 1030 0.0 0.1 78532 18228 - Is Sat09 0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-9: root 1601 0.0 0.1 55616 18172 - Is Sat09 0:05.67 isi_mcp: forker (isi_mcp)
lifs010-9: root 1922 0.0 0.3 104324 31652 - Ss Sat09 46:12.73 isi_mcp: master (isi_mcp)
lifs010-10: root 1599 0.0 0.1 48708 18248 - Is Sat09 0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-10: root 1633 0.0 0.1 59968 18212 - Is Sat09 0:00.40 isi_mcp: forker (isi_mcp)
lifs010-10: root 1933 0.0 0.3 101728 31272 - Ss Sat09 44:23.35 isi_mcp: master (isi_mcp)
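On a large cluster, spotting a missing node by eye is error-prone. The following is a rough sketch that compares the node list with the process listing and prints the LNN and name of each node that has no isi_mcp line. It assumes both outputs have been saved to files and that their column layout matches the examples in this article; the function name nodes_missing_mcp and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not an OneFS command).
#   $1: file holding the output of `isi_nodes %{id} %{node} %{lnn} %{address}`
#       (assumed columns: id, node name, LNN, address)
#   $2: file holding the output of `isi_for_array -s ps auxw | grep mcp | grep -v grep`
#       (assumed to prefix each line with "<node name>:")
# Prints "<LNN> <node name>" for every node with no isi_mcp line.
# Note: the NR == FNR idiom assumes the node list file is not empty.
nodes_missing_mcp() {
    awk '
        NR == FNR {                      # first file: remember name and LNN
            if (NF >= 3) {
                name[++n] = $2
                lnn[n] = $3
            }
            next
        }
        /isi_mcp/ {                      # second file: record nodes running isi_mcp
            node = $1
            sub(/:$/, "", node)          # strip the trailing colon from the prefix
            running[node] = 1
        }
        END {
            for (i = 1; i <= n; i++)
                if (!(name[i] in running))
                    print lnn[i], name[i]
        }
    ' "$1" "$2"
}

# Usage on a cluster node (not executed here; file names are placeholders):
#   isi_nodes %{id} %{node} %{lnn} %{address} > /tmp/nodes.txt
#   isi_for_array -s ps auxw | grep mcp | grep -v grep > /tmp/mcp.txt
#   nodes_missing_mcp /tmp/nodes.txt /tmp/mcp.txt
```

Printing the LNN first is deliberate, since the LNN is the value confirmed in the first step of this procedure.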
Start isi_mcp on each node where it is not running, specifying the node by its LNN (node 8 in this example):
# isi_for_array -n 8 isi_mcp
Verify the state of scheduled jobs:
# isi job status --verbose
The job engine is running.
Coordinator: 2
Connected: True
Disconnected Nodes: -
Down or Read-Only Nodes: False
Statistics Ready: True
Cluster Is Degraded: False
Run Jobs When Degraded: False
Running and queued jobs:
ID Type State Impact Pri Phase Running Time
-----------------------------------------------------------------
1500 AutoBalanceLin Running Low 4 1/3 38d 21h 51m
1662 ShadowStoreProtect Waiting Low 6 1/1 -
1712 Collect Waiting Low 5 1/2 2d 6h 46m
1724 SnapshotDelete Running Low 2 1/2 3s
1725 WormQueue Waiting Low 6 1/1 -
1726 ShadowStoreDelete Running Low 2 1/1 2s
1727 QuotaScan Waiting Low 6 1/2 -
-----------------------------------------------------------------
Total: 7
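To confirm at a glance that jobs have left the Waiting state, the job table can be tallied by state. This is a minimal sketch, assuming the State column is the third whitespace-separated field, as in the `isi job jobs list` output shown in this article; the function name job_state_counts is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not an OneFS command): read `isi job jobs list`
# output on stdin and print a count of jobs per state.
# Only rows whose first field is a numeric job ID are counted, so the
# header, separator, and "Total:" lines are ignored.
job_state_counts() {
    awk '$1 ~ /^[0-9]+$/ { count[$3]++ }
         END { for (s in count) print s, count[s] }' | sort
}

# Usage on a cluster node (not executed here):
#   isi job jobs list | job_state_counts
```

Some jobs can legitimately remain in Waiting after the fix, as in the example output above, so a nonzero Running count is the signal to look for rather than a zero Waiting count.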
This issue can also occur if a node is split, offline, panicked, read-only, or unresponsive, which causes nodes to appear disconnected from the Job Engine Coordinator. Further troubleshooting may be required to return the node to a healthy state. If assistance is needed, contact Dell Technical Support.
Affected Products
Isilon
Article Properties
Article Number: 000017115
Article Type: Solution
Last Modified: 10 Sep 2025
Version: 5