Unsolved

November 27th, 2015 00:00

Isilon OneFS v6.0.3.10: MultiScan does not complete and ends as "System Cancelled"

Hi,

We are running Isilon OneFS v6.0.3.10 (B_6_0_3_10) on our cluster. For the past couple of weeks, MultiScan jobs have been ending as "System Cancelled" every 8 or 12 hours. Has anyone faced the same issue, or can anyone suggest a remediation?

Best, Surendra

6 Operator

1.2K Posts

November 27th, 2015 06:00

Check the system log for group changes, caused by drive stalls.

The safest method is to smartfail (and replace, if possible) the frequently stalling drive(s).

If you want to, or have to, keep the drives, you can make the cluster somewhat tolerant to stalls:

- increase the timeout for stalls to e.g. 5 seconds

- have the cluster log disk stalls but otherwise ignore them (no group changes)

- run AutoBalance instead of MultiScan (it is the Collect part in MultiScan that is susceptible to group changes).
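
To get a first impression of how often group changes or stalls show up in a log, a small helper like the one below can count matching lines. This is only a sketch: the function name `count_matches` is made up for illustration, and both the log path and the search strings are assumptions that may not match what your OneFS release actually writes.

```shell
#!/bin/sh
# Hypothetical helper: count the lines in a log file that contain a pattern.
# $1 = path to the log file (assumption: you know which log to inspect)
# $2 = case-insensitive search string (assumption: the message wording)
count_matches() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # the count is 0, so "|| true" keeps the helper usable under "set -e".
    grep -ci -- "$2" "$1" || true
}
```

For example, `count_matches /var/log/messages "group change"` followed by `count_matches /var/log/messages "stall"` would show whether the two kinds of messages appear at all; the path and both strings are placeholders to adapt.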


Do you have access to the EMC/Isilon knowledge base (KB) articles? More detailed information on disk stalls and group changes is available there.


-- Peter

December 7th, 2015 04:00

Thanks Peter. I contacted EMC support and raised the idea of increasing the drive stall timeout. They said this is a known problem in the OneFS version we are running and recommended an upgrade. As a workaround, they suggested increasing the drive stall timeout:

isi_sysctl_cluster hw.disk_event.thresh.slowacc_usec=3500000
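
As a unit check on the value above: assuming the `_usec` suffix means microseconds, 3500000 corresponds to 3.5 seconds. The sketch below does the conversion for any timeout given in seconds; `secs_to_usec` is a hypothetical helper name, not an Isilon tool.

```shell
#!/bin/sh
# Hypothetical helper: convert a timeout in seconds to microseconds.
# Assumption: the sysctl value is expressed in microseconds ("_usec").
secs_to_usec() {
    # awk handles fractional input such as 3.5; %.0f rounds to an integer.
    awk -v s="$1" 'BEGIN { printf "%.0f\n", s * 1000000 }'
}

secs_to_usec 3.5   # prints 3500000, the value used in the command above
secs_to_usec 5     # prints 5000000, the 5-second example from earlier in the thread
```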


I will update if I need more help from your end.

Best, Surendra



December 8th, 2015 23:00

The method suggested by EMC support did not solve the problem, and they ultimately recommended an upgrade. Following your advice, we stopped the MultiScan job and started an AutoBalance job. I will keep you updated on the status.

76 Posts

December 12th, 2015 01:00

surendra_kamath wrote:

The method suggested by EMC support did not solve the problem, and they ultimately recommended an upgrade. Following your advice, we stopped the MultiScan job and started an AutoBalance job. I will keep you updated on the status.

Upgrading sounds like a very good plan, particularly with this release. Up until 7.0 was released, drive stalls would system cancel a MultiScan or Collect job. While you could use KB 89477 to try to both increase the drive stall timeout and disable group changes on drive stalls, there's still another type of drive event that will cause stalls that will ultimately cancel those jobs - ECC detection. The most recent releases of OneFS have improved this behavior considerably, as MultiScan and Collect are more resilient in the face of these events and can continue to run in many cases.
