Unsolved
3 Posts
0
2158
November 27th, 2015 00:00
Isilon OneFS v6.0.3.10: MultiScan does not complete and ends as "System Cancelled"
Hi,
We are running Isilon OneFS v6.0.3.10 (B_6_0_3_10) on our cluster and have noticed over the past couple of weeks that MultiScan jobs end as "System Cancelled" every 8 or 12 hours. Has anyone faced the same issue, or can anyone suggest a remediation?
Best, Surendra
Peter_Sero
6 Operator
•
1.2K Posts
0
November 27th, 2015 06:00
Check the system log for group changes, caused by drive stalls.
The safest method is to smartfail (and replace, if possible) the frequently stalling drive(s).
If you want to, or have to, keep the drives, you can make the cluster somewhat tolerant to stalls:
- increase the timeout for stalls to e.g. 5 seconds
- have the cluster log disk stalls but otherwise ignore them (no group changes)
- run AutoBalance instead of MultiScan (it is the Collect part in MultiScan that is susceptible to group changes).
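The first step above, checking the system log for drive stalls and group changes, can be roughly sketched as a keyword filter. This is only an illustration: the search terms and the sample log lines are assumptions, not the actual wording of OneFS log messages, and `find_events` is a hypothetical helper name.

```python
from typing import Iterable, List, Tuple

# Assumed search terms; the real OneFS log wording may differ.
KEYWORDS = ("stall", "group change")


def find_events(lines: Iterable[str],
                keywords: Tuple[str, ...] = KEYWORDS) -> List[Tuple[int, str]]:
    """Return (line_number, text) for every line containing a keyword."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()
        # Case-insensitive substring match against each keyword.
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = [
        "node 3: drive in bay 7 stalled",
        "group change started",
        "job MultiScan running",
    ]
    for number, text in find_events(sample):
        print(f"{number}: {text}")
```

To run it against a real log, pass an open file object instead of the sample list and adjust the keywords to whatever the log actually prints.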
Do you have access to EMC/Isilon knowledge base (KB) articles? More detailed information on disk stalls and group changes is available there.

-- Peter
surendra_kamath
3 Posts
0
December 7th, 2015 04:00
Thanks, Peter. I contacted EMC support. They said this is a known problem in the OneFS version we are currently running and recommended an upgrade. As a workaround, they suggested increasing the drive stall timeout:
isi_sysctl_cluster hw.disk_event.thresh.slowacc_usec=3500000
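The value is given in microseconds, so 3500000 corresponds to 3.5 seconds. A minimal sketch of the conversion, assuming only the command and sysctl name quoted above; `stall_timeout_command` is a hypothetical helper that just formats the string and does not run anything on the cluster:

```python
USEC_PER_SECOND = 1_000_000

# Sysctl name taken from the command quoted in this thread.
SYSCTL_NAME = "hw.disk_event.thresh.slowacc_usec"


def stall_timeout_command(seconds: float) -> str:
    """Format the isi_sysctl_cluster command for a timeout given in seconds."""
    if seconds <= 0:
        raise ValueError("timeout must be positive")
    # Convert seconds to whole microseconds.
    usec = round(seconds * USEC_PER_SECOND)
    return f"isi_sysctl_cluster {SYSCTL_NAME}={usec}"


if __name__ == "__main__":
    print(stall_timeout_command(3.5))  # the value suggested by support
    print(stall_timeout_command(5))    # the 5 seconds mentioned earlier
```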
I will post an update if I need more help from you.
Best, Surendra
surendra_kamath
3 Posts
0
December 8th, 2015 23:00
The method suggested by the EMC support team did not solve the problem, and they finally recommended an upgrade. Following your advice, we have stopped the MultiScan job and started an AutoBalance job. I will keep you updated on the status.
BernieC
76 Posts
1
December 12th, 2015 01:00
Upgrading sounds like a very good plan, particularly with this release. Up until 7.0 was released, drive stalls would system cancel a MultiScan or Collect job. While you could use KB 89477 to try to both increase the drive stall timeout and disable group changes on drive stalls, there's still another type of drive event that will cause stalls that will ultimately cancel those jobs - ECC detection. The most recent releases of OneFS have improved this behavior considerably, as MultiScan and Collect are more resilient in the face of these events and can continue to run in many cases.