We recently upgraded from OneFS 18.104.22.168 to 22.214.171.124 and encountered a few issues and some odd behavior.
1. SyncIQ jobs are intermittently failing on both clusters.
2. WebUI sessions time out very quickly
3. Non-stop log entries in /var/log/messages
4. InsightIQ alerts that Isilon data retrieval is delayed
#1 was resolved. We still have open SRs for #2 and #3.
In 126.96.36.199, whenever the MultiScan job runs, the nodes in the cluster split. The TSE suggested disabling the MultiScan job and scheduling the Collect and AutoBalance jobs with low impact. Are there any patches that fix this issue in 8.10.2, or do we need to upgrade to the latest code, 8.1.2? The latest code was only released in Aug 2018, though. Is it worth upgrading to it? Any suggestions?
If anyone is using the Changelist API in their backup solution, it does not work in this version. I heard the issue was fixed in 8.2.1; I just wish there were a standalone fix for the Changelist API, or a fix in 8.1.3. It's not documented in the release notes.
We recently upgraded a DR cluster to 8.1.2 plus the August patches. There were no big issues until we upgraded the prod cluster. We have one access zone that we fail over regularly, and it failed over fine. Then we upgraded the prod cluster, and since then the SyncIQ jobs have been taking 30-40 times as long to complete.
For example, prior to the upgrade, job 1234 copied 96,000 files and took only 7m on 7/26.
After the upgrade, job 5678 copied 91,000 files and took 2h43m on 9/22.
Jobs that used to take 10-15 minutes take 2-5 hours, so effectively we can't fail back without a huge downtime.
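As a rough sanity check on the two jobs quoted above, here is a minimal sketch that works out the slowdown from the file counts and run times. The variable names and the `to_minutes` helper are just for illustration; the only inputs are the four numbers from the job reports.

```python
# Rough slowdown arithmetic for the two SyncIQ jobs quoted above.
# Inputs come straight from the post; everything else is illustrative.

def to_minutes(hours: int, minutes: int) -> int:
    """Convert an h/m duration to whole minutes."""
    return hours * 60 + minutes

# Job 1234 (before the upgrade): 96,000 files in 7m
before_files, before_min = 96_000, to_minutes(0, 7)
# Job 5678 (after the upgrade): 91,000 files in 2h43m
after_files, after_min = 91_000, to_minutes(2, 43)

# Files copied per minute for each run
before_rate = before_files / before_min
after_rate = after_files / after_min

# How much longer the job ran, and how much slower it was per file
wallclock_ratio = after_min / before_min
throughput_ratio = before_rate / after_rate

print(f"before: {before_rate:,.0f} files/min")
print(f"after:  {after_rate:,.0f} files/min")
print(f"wall clock: {wallclock_ratio:.1f}x longer")
print(f"throughput: {throughput_ratio:.1f}x slower")
```

For this particular pair of jobs the run time works out to roughly 23 times longer, and a bit worse per file since the second job copied fewer files. Other jobs showed larger gaps, which is where the 30-40x figure comes from.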
We upgraded to 188.8.131.52 because of the LWIO issue in 184.108.40.206 described to us as:
“It has to do with Kerberos authentication and not releasing k5_mutex_init threads properly. The LWIO memory utilization will continue to grow until LWIO reaches the maximum memory threshold allocated, which causes node-based DU as LWIO will stop handling SMB connection requests.”
So the reason our SyncIQ sessions were taking 30-40x longer to complete was that we had subfolder quotas on the target directories. Once we removed the quotas, the jobs sped up and are taking only minutes to complete again.