All scheduled jobs in state Waiting

Summary: Jobs are not running. All scheduled jobs are in a Waiting state.

Symptoms

No jobs are running. Job status shows that all jobs are in a Waiting state.
 

lifs010-13# isi job jobs list
ID   Type               State   Impact  Pri  Phase  Running Time
-----------------------------------------------------------------
1500 AutoBalanceLin     Waiting Low     4    1/3    38d 21h 51m
1662 ShadowStoreProtect Waiting Low     6    1/1    -
1712 Collect            Waiting Low     5    1/2    2d 6h 46m
1724 SnapshotDelete     Waiting Low     2    1/2    -
1725 WormQueue          Waiting Low     6    1/1    -
1726 ShadowStoreDelete  Waiting Low     2    1/1    -
1727 QuotaScan          Waiting Low     6    1/2    -
-----------------------------------------------------------------
Total: 7

Cause

This can occur when a node is disconnected from the Job Engine Coordinator. In the output below, Connected is False and Disconnected Nodes reports node 8:

lifs010-102# isi job status --verbose
The job engine may temporarily delay running jobs.
            Coordinator: 10
              Connected: False
     Disconnected Nodes: 8
Down or Read-Only Nodes: False
       Statistics Ready: True
    Cluster Is Degraded: False
 Run Jobs When Degraded: False
 
Running and queued jobs:
ID   Type               State   Impact  Pri  Phase  Running Time
-----------------------------------------------------------------
1500 AutoBalanceLin     Waiting Low     4    1/3    38d 21h 51m
1662 ShadowStoreProtect Waiting Low     6    1/1    -
1712 Collect            Waiting Low     5    1/2    2d 6h 46m
1724 SnapshotDelete     Waiting Low     2    1/2    -
1725 WormQueue          Waiting Low     6    1/1    -
1726 ShadowStoreDelete  Waiting Low     2    1/1    -
1727 QuotaScan          Waiting Low     6    1/2    -
-----------------------------------------------------------------
Total: 7
 
Recent finished jobs:
ID   Type           State          Time
------------------------------------------------------
1721 SnapshotDelete Succeeded      2016-04-21T11:00:20
1663 MultiScan      User Cancelled 2016-04-22T15:35:08
1722 SnapshotDelete Succeeded      2016-04-22T17:25:29
1723 WormQueue      Succeeded      2016-04-22T17:25:55
------------------------------------------------------
Total: 4
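
On a busy cluster the job tables can push the header fields off the screen, so it can help to pull out only the Disconnected Nodes field. The following is a minimal sketch, assuming the field layout shown in the output above. The function name disconnected_nodes is hypothetical and not part of OneFS.

```shell
# Hypothetical helper (not an OneFS command).
# Reads `isi job status --verbose` output on stdin and prints the value of
# the "Disconnected Nodes" field with surrounding whitespace removed.
disconnected_nodes() {
    awk '/Disconnected Nodes:/ {
        sub(/^.*Disconnected Nodes:[ \t]*/, "")   # drop the label
        sub(/[ \t]+$/, "")                        # drop trailing blanks
        print
    }'
}

# Usage on a cluster node:
#   isi job status --verbose | disconnected_nodes
```

With the output shown above, the helper would print the value of that field only. A value of "-" means no node is reported as disconnected.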

 

Resolution

Confirm the Logical Node Number (LNN) of the disconnected node. The node LNN may not always match the node ID.

# isi_nodes %{id} %{node} %{lnn} %{address}


Example output: 

lifs010-2# isi_nodes %{id} %{node} %{lnn} %{address}
1 lifs010-1 1 192.168.41.101
2 lifs010-2 2 192.168.41.102
3 lifs010-3 3 192.168.41.103
4 lifs010-4 4 192.168.41.104
5 lifs010-5 5 192.168.41.105
6 lifs010-6 6 192.168.41.106
7 lifs010-7 7 192.168.41.107
8 lifs010-8 8 192.168.41.108
9 lifs010-9 9 192.168.41.109
10 lifs010-10 10 192.168.41.110
11 lifs010-11 11 192.168.41.111
12 lifs010-13 12 192.168.41.112
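
To avoid reading the table by eye, the lookup can be scripted. This is a sketch only, assuming the four-column output produced by the command above (ID, node name, LNN, address). The function name lnn_for_id is hypothetical.

```shell
# Hypothetical helper (not an OneFS command).
# Reads `isi_nodes %{id} %{node} %{lnn} %{address}` output on stdin and
# prints "<node name> <LNN>" for the node whose ID matches the first argument.
lnn_for_id() {
    awk -v id="$1" '$1 == id { print $2, $3 }'
}

# Usage on a cluster node, for node ID 8:
#   isi_nodes %{id} %{node} %{lnn} %{address} | lnn_for_id 8
```

If nothing is printed, no node with that ID appears in the list.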

 

Check if all nodes have the isi_mcp process running:

# isi_for_array -s ps auxw | grep mcp | grep -v grep


Example output (note that node 8, lifs010-8, is not listed):

lifs010-2# isi_for_array -s ps auxw | grep mcp | grep -v grep
lifs010-1: root    1690   0.0  0.1  48708  18248  -  Is   Sat09       0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-1: root    1692   0.0  0.1  59968  18212  -  Is   Sat09       0:00.40 isi_mcp: forker (isi_mcp)
lifs010-1: root    1910   0.0  0.3 101728  31272  -  Ss   Sat09      44:23.35 isi_mcp: master (isi_mcp)
lifs010-2: root    1751   0.0  0.1  53060  18228  -  Is   12Jun25      0:00.11 isi_mcp: failsafe (isi_mcp)
lifs010-2: root    1816   0.0  0.1  72896  18160  -  Is   12Jun25      0:00.58 isi_mcp: forker (isi_mcp)
lifs010-2: root    1901   0.0  0.3  86140  31368  -  Ss   12Jun25    148:00.09 isi_mcp: master (isi_mcp)
lifs010-3: root    1681   0.0  0.1  78532  18228  -  Is   Sat09       0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-3: root    1683   0.0  0.1  55616  18172  -  Is   Sat09       0:05.67 isi_mcp: forker (isi_mcp)
lifs010-3: root    1678   0.0  0.3 104324  31652  -  Ss   Sat09      46:12.73 isi_mcp: master (isi_mcp)
lifs010-4: root    1691   0.0  0.1  48708  18248  -  Is   Sat09       0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-4: root    1643   0.0  0.1  59968  18212  -  Is   Sat09       0:00.40 isi_mcp: forker (isi_mcp)
lifs010-4: root    1312   0.0  0.3 101728  31272  -  Ss   Sat09      44:23.35 isi_mcp: master (isi_mcp)
lifs010-5: root    1755   0.0  0.1  53060  18228  -  Is   12Jun25      0:00.12 isi_mcp: failsafe (isi_mcp)
lifs010-5: root    1256   0.0  0.1  72896  18160  -  Is   12Jun25      0:00.58 isi_mcp: forker (isi_mcp)
lifs010-5: root    1967   0.0  0.3  86140  31368  -  Ss   12Jun25    148:00.09 isi_mcp: master (isi_mcp)
lifs010-6: root    3456   0.0  0.1  78532  18228  -  Is   Sat09       0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-6: root    2754   0.0  0.1  55616  18172  -  Is   Sat09       0:05.67 isi_mcp: forker (isi_mcp)
lifs010-6: root    1923   0.0  0.3 104324  31652  -  Ss   Sat09      46:12.73 isi_mcp: master (isi_mcp)
lifs010-7: root    1888   0.0  0.1  48708  18248  -  Is   Sat09       0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-7: root    3654   0.0  0.1  59968  18212  -  Is   Sat09       0:00.40 isi_mcp: forker (isi_mcp)
lifs010-7: root    1236   0.0  0.3 101728  31272  -  Ss   Sat09      44:23.35 isi_mcp: master (isi_mcp)
lifs010-9: root    1030   0.0  0.1  78532  18228  -  Is   Sat09       0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-9: root    1601   0.0  0.1  55616  18172  -  Is   Sat09       0:05.67 isi_mcp: forker (isi_mcp)
lifs010-9: root    1922   0.0  0.3 104324  31652  -  Ss   Sat09      46:12.73 isi_mcp: master (isi_mcp)
lifs010-10: root    1599   0.0  0.1  48708  18248  -  Is   Sat09       0:00.01 isi_mcp: failsafe (isi_mcp)
lifs010-10: root    1633   0.0  0.1  59968  18212  -  Is   Sat09       0:00.40 isi_mcp: forker (isi_mcp)
lifs010-10: root    1933   0.0  0.3 101728  31272  -  Ss   Sat09      44:23.35 isi_mcp: master (isi_mcp)
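
Spotting a missing node in a long process listing is error-prone, so the comparison can be automated. The sketch below assumes the two output formats shown in this article and that each healthy node shows an "isi_mcp: master" line. The function name nodes_without_mcp and the /tmp file names are hypothetical.

```shell
# Hypothetical helper (not an OneFS command).
# $1: file holding `isi_nodes %{id} %{node} %{lnn} %{address}` output
# $2: file holding `isi_for_array -s ps auxw` output
# Prints the name of every listed node that has no "isi_mcp: master" line.
nodes_without_mcp() {
    awk '
        # First file: remember node names (column 2) in listed order.
        FILENAME == ARGV[1] { order[++n] = $2; next }
        # Second file: record nodes that report an isi_mcp master process.
        /isi_mcp: master/ { name = $1; sub(/:$/, "", name); seen[name] = 1 }
        END {
            for (i = 1; i <= n; i++)
                if (!(order[i] in seen)) print order[i]
        }
    ' "$1" "$2"
}

# Usage on a cluster node:
#   isi_nodes %{id} %{node} %{lnn} %{address} > /tmp/nodes.txt
#   isi_for_array -s ps auxw > /tmp/ps.txt
#   nodes_without_mcp /tmp/nodes.txt /tmp/ps.txt
```

A node that is down or unreachable also produces no process lines, so a name printed here indicates only that isi_mcp was not seen on that node, not why.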

 

Start isi_mcp on each node where it is not running (node 8 in this example):

# isi_for_array -n 8 isi_mcp

 

Verify that the Job Engine reports Connected as True and that scheduled jobs have started running:

# isi job status --verbose
The job engine is running.
            Coordinator: 2
              Connected: True
     Disconnected Nodes: -
Down or Read-Only Nodes: False
       Statistics Ready: True
    Cluster Is Degraded: False
 Run Jobs When Degraded: False
 
Running and queued jobs:
ID   Type               State   Impact  Pri  Phase  Running Time
-----------------------------------------------------------------
1500 AutoBalanceLin     Running Low     4    1/3    38d 21h 51m
1662 ShadowStoreProtect Waiting Low     6    1/1    -
1712 Collect            Waiting Low     5    1/2    2d 6h 46m
1724 SnapshotDelete     Running Low     2    1/2    3s
1725 WormQueue          Waiting Low     6    1/1    -
1726 ShadowStoreDelete  Running Low     2    1/1    2s
1727 QuotaScan          Waiting Low     6    1/2    -
-----------------------------------------------------------------
Total: 7
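
The same check can be made from a script, for example when rechecking after a short wait. This is a minimal sketch, assuming the "Connected" field appears exactly as in the output above. The function name job_engine_connected is hypothetical.

```shell
# Hypothetical helper (not an OneFS command).
# Reads `isi job status --verbose` output on stdin.
# Exits 0 if the "Connected" field is True, and 1 otherwise.
job_engine_connected() {
    awk '
        /^[ \t]*Connected:[ \t]*True[ \t]*$/ { ok = 1 }
        END { exit !ok }
    '
}

# Usage on a cluster node:
#   if isi job status --verbose | job_engine_connected; then
#       echo "Job Engine is connected"
#   else
#       echo "Job Engine still reports a disconnected node"
#   fi
```

The pattern is anchored to the start of the line so that the "Disconnected Nodes" field is never mistaken for the "Connected" field.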


This issue can also occur if a node is split, offline, panicked, read-only, or unresponsive, any of which can cause nodes to appear disconnected from the Job Engine Coordinator. Further troubleshooting may be required to return the node to a healthy state. If you need assistance, contact Dell Technical Support.

Affected Products

Isilon

Article Properties

Article Number: 000017115
Article Type: Solution
Last Modified: 10 Sep 2025
Version: 5