johnsonka
January 11th, 2016 11:00
Hello Daniel,
I'm not sure if your upgrade has finished or not at this point, but in my experience, that update process counts in Microsoft minutes and usually completes much faster. How is your 3.2 experience going so far?
carlilek
December 8th, 2015 07:00
I've seen it before. The best answer I had was to move from a virtual IIQ to a physical one. Sometimes completely rebuilding it is the only way out of this. Make sure you read the docs closely for how to migrate the IIQ data.
johnsonka
December 8th, 2015 10:00
Hello,
InsightIQ 3.1.0 and InsightIQ 3.1.1 suffer from a couple different statistic collection issues that Support would be able to help you with.
In IIQ 3.1.0: there was an issue with a debug string being created in the downsampler that caused intermittent delays in other processes (e.g. the factfetcher); we also saw disk statistics and file heat data causing problems with API connections to your cluster. All of the issues mentioned here are addressed in code changes that can be applied to your machine.
In IIQ 3.1.1: we saw issues with disk statistics and file heat data being too large to process. This would cause other datasets to become temporarily delayed. These issues are also addressed in code changes that can be applied to your machine.
To engage Support, please create a service request (SR). You have a few options:
1. Log in to your online account on support.emc.com and go to this page: https://support.emc.com/servicecenter/createSR
2. Engage an Isilon Support engineer directly through Live Chat Support: https://support.emc.com/servicecenter/liveChat/
3. Call in to EMC Isilon Support at 1-800-782-4362 (For a complete local country dial list, please see this document: http://www.emc.com/collateral/contact-us/h4165-csc-phonelist-ho.pdf)
carlilek
December 9th, 2015 07:00
How big are your clusters? (node count, total used space, # of objects)
Dtek1
December 9th, 2015 07:00
Thank you @carlilek! I hope I don't have to do that. :-)
Katie, thank you! I have an SR open. We are trying different things, but still no luck.
Dtek1
December 9th, 2015 14:00
Two 14-node clusters, mainly NL nodes, ~1.8 PB each, ~40% used.
The SR has now been escalated to the IIQ team.
johnsonka
December 10th, 2015 05:00
Hello!
Do you mind letting me know your SR number? I would love to help in any way that I can!
Dtek1
December 10th, 2015 07:00
75808638. Thank you Katie! I appreciate it.
Dtek1
December 16th, 2015 10:00
Katie, did you get a chance to take a look at the SR?
johnsonka
December 16th, 2015 12:00
Hello Daniel,
Yes, I had looked at the SR, and I apologize for not replying sooner.
I see we reset the celog databases on December 8th. The Subject Matter Expert currently working on your SR with the assigned TSE is still seeing some errors; what I am seeing is a lack of logging, which is just as concerning. Celog can be quite finicky, and those of us who have used Isilon for a long while know that things often get blamed on it. I would give those services one more restart (leave the DB alone; it is fine) just to make sure they are still working properly.
The longest-running API calls that I see on your cluster are:
These entries are in /var/log/apache2/webui_httpd_access.log on your cluster. The times highlighted in red show how long each request took, in milliseconds; the default timeout for IIQ is 20 seconds. (I edited the log lines here so as not to reveal anything specific about your cluster.) These stats track Filesystem Events for performance reporting. According to the IIQ 3.2 User Guide, these reports do not affect the Data Retrieval Delay condition, so I would consider upgrading to IIQ 3.2 (if the release version listed in your SR is correct).
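If you want to pull the slow requests out yourself, here is a minimal Python 3 sketch to run against a copy of webui_httpd_access.log. It lists requests that took 20 seconds (the IIQ default timeout) or longer, slowest first. It assumes the request duration in milliseconds is the last whitespace-separated field on each line; that is an assumption about the log format, not something confirmed in this thread, so check a few lines of your own log first. The script name and function names are hypothetical.

```python
import sys

# Default IIQ timeout quoted in the thread: 20 seconds, expressed in ms.
TIMEOUT_MS = 20_000


def slow_requests(lines, threshold_ms=TIMEOUT_MS):
    """Yield (duration_ms, line) for each request at or above threshold_ms.

    ASSUMPTION: the request duration in milliseconds is the last
    whitespace-separated field of each log line. Verify this against
    your own log before trusting the results.
    """
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # blank line
        try:
            duration_ms = int(fields[-1])
        except ValueError:
            continue  # last field is not an integer, so skip the line
        if duration_ms >= threshold_ms:
            yield duration_ms, line.rstrip("\n")


def main(path, top=10):
    """Print the `top` slowest requests at or above the timeout, slowest first."""
    with open(path, errors="replace") as fh:
        hits = sorted(slow_requests(fh), reverse=True)
    for duration_ms, line in hits[:top]:
        print(f"{duration_ms / 1000:8.1f} s  {line}")


if __name__ == "__main__":
    # Hypothetical usage: python3 slow_requests.py ./webui_httpd_access.log
    if len(sys.argv) > 1:
        main(sys.argv[1])
    else:
        print("usage: slow_requests.py LOGFILE", file=sys.stderr)
```

Lowering threshold_ms (for example to 10_000) would also surface calls that are slow but not yet hitting the timeout.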
Let me know if there is anything else I can look into! As for your SR, I can vouch that it is in good hands with my Subject Matter Expert.
Dtek1
December 17th, 2015 08:00
Thank you Katie.
Is IIQ 3.2 compatible with OneFS 7.1.1.2?
-D
johnsonka
December 17th, 2015 08:00
Yes, IIQ 3.2 is compatible with all supported versions of OneFS. There is one small caveat when "adding" 7.1.1 clusters in 3.2.x (or resuming them after the upgrade), which may require a code change from Support. Information about this can be found in the following KB:
https://support.emc.com/kb/211058
[Requires a login to EMC Online Support]
Please let me know if you have any other questions!
Dtek1
December 22nd, 2015 07:00
Thank you Katie. Still no information from support. Very disappointed by the lack of response we are getting about this.
Dtek1
January 8th, 2016 06:00
Katie,
I am upgrading to 3.2 since we could not figure out why we get the delays.
I am running it now, and I see it needs to upgrade the datastore; it says "The current InsightIQ datastore is not compatible with InsightIQ 3.2". The datastore is on NFS on one of the clusters, and it says the upgrade will take 2 days: "Estimated time required for upgrade: 2 days, 7 hours, 19 minutes, 18 seconds". Is that normal? I checked the size and it is only 1 TB:
# pwd
/ifs/data/Isilon_Support/insightiq
#
# du -sh .
1.0T .
#
Isn't 2 days too long? And will we miss data collection for that whole time?
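To put the estimate in perspective, here is a back-of-the-envelope Python sketch that converts it to seconds and to the average rate it implies for the 1.0T that du reported. It assumes, without confirmation, that upgrade time grows linearly with datastore size, and it treats "1.0T" as exactly 1 TiB (du -h uses powers of 1024). The upgrade may well be bound by something other than raw data volume, so this is only a sanity check.

```python
from datetime import timedelta

# Figures quoted in the post above.
estimate = timedelta(days=2, hours=7, minutes=19, seconds=18)
# ASSUMPTION: du -h's "1.0T" is taken as exactly 1 TiB (2**40 bytes).
datastore_bytes = 2**40


def implied_rate_mb_s(size_bytes, duration):
    """Average throughput in MB/s (10**6 bytes per second) implied by the
    estimate, ASSUMING upgrade time scales linearly with datastore size."""
    return size_bytes / duration.total_seconds() / 1e6


print(estimate.total_seconds())  # 199158.0
print(round(implied_rate_mb_s(datastore_bytes, estimate), 2))  # 5.52
```

That comes to roughly 199,000 seconds, or about 5.5 MB/s, which can be compared against what the NFS mount normally sustains.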
Thanks
-Daniel
Dtek1
January 11th, 2016 11:00
Katie,
The upgrade went fine. We have not seen the delay errors since the upgrade. It's too early to celebrate, but it looks promising.
Thanks
-Daniel