Anonymous User
170 Posts
1
October 17th, 2014 10:00
There are multiple reasons for this. Sometimes you'll see the answer in /var/log/insightiq.log.
Steps to try:
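For example, scan that log for errors and suspension messages (a generic first pass; adjust the patterns to what you're actually seeing):
$ grep -iE 'error|traceback|suspend' /var/log/insightiq.log | tail -20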
dynamox
9 Legend
•
20.4K Posts
0
October 17th, 2014 11:00
Any data, or just file system statistics? If it's just file system statistics, make sure the FSAnalyze job is running and completing.
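You can check from the cluster CLI with something like this (a rough sketch; exact isi syntax varies by OneFS version):
$ isi job status | grep -i fsanalyze    # is FSAnalyze running, paused, or failed?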
david_knapp
1 Rookie
•
122 Posts
0
October 17th, 2014 13:00
Just statistics. FSAnalyze is running correctly at 22:00 every day. The basic cluster info shows up, but no connectivity data, no file info, nothing that would be anywhere near real time.
david_knapp
1 Rookie
•
122 Posts
0
October 17th, 2014 14:00
I don't have the root password to get into the VM that is running InsightIQ, so I cannot see what services are running.
I will try to have the whole thing rebooted.
chjatwork
2 Intern
•
356 Posts
0
November 18th, 2014 03:00
Hey, has there been a resolution for this? I have v3.1 and I still get this issue with some of my clusters.
Anonymous User
170 Posts
0
November 18th, 2014 06:00
It's still an issue with 3.1. I have an open SR on it but I'm not seeing us getting any closer to resolution yet.
IIQ 3.1 is significantly worse than 3.0 for me. The upgrade alone took almost 12 hours from beginning to end by the time the 4 datastores were upgraded.
david_knapp
1 Rookie
•
122 Posts
0
November 18th, 2014 08:00
I had to reboot the InsightIQ server.
David Knapp
chjatwork
2 Intern
•
356 Posts
0
November 18th, 2014 08:00
Dave,
are you using the VM appliance?
david_knapp
1 Rookie
•
122 Posts
0
November 19th, 2014 06:00
Yes.
David Knapp
mattashton1
93 Posts
0
November 19th, 2014 10:00
Other VMs competing for compute and network resources? I suspect a network bandwidth issue, i.e., several VMs competing for traffic over a single interface.
For multiple clusters, I have seen better performance from dedicated hardware running InsightIQ.
How many nodes per cluster?
IIQ 3.1 is out.
Cheers,
Matt
david_knapp
1 Rookie
•
122 Posts
0
November 19th, 2014 10:00
We have four clusters, all with fewer than 8 nodes. I doubt that VM resource contention is a problem, but I don't have rights to see that sort of thing.
David
mattashton1
93 Posts
0
November 19th, 2014 11:00
WAN latency issues?
I would log in to the IIQ VM and verify connectivity to each of the clusters as well as the NFS mount for the FSA data.
If you can log in to each of the clusters from the VM, that would be a good test as well; there's a rough sketch of these checks at the end of this post...
I assume you restarted the VM?
Under Settings, do all of the clusters show as green for Monitored Clusters? (Assuming you are using at least 3.0)
If so, suspend and resume them...
Have you opened an SR?
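Something along these lines from the IIQ VM should cover the connectivity checks (hypothetical cluster names and export; substitute your own):
$ for c in cluster1 cluster2 cluster3 cluster4; do ping -c 3 $c; done    # basic reachability to each cluster
$ showmount -e cluster1    # confirm the FSA NFS export is visible
$ mount | grep nfs         # confirm the FSA datastore is actually mounted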
Cheers,
Matt
osaddict
110 Posts
0
November 20th, 2014 08:00
Is it worse in terms of how often you see this? If so, how often is that?
The upgrade being slow had to do with a need to scan the entire DB for pre-3.0 data, which was stored with very different time sampling. If any is detected, the upgrade script gives the user the choice to upgrade that older collected data, drop the old data, or cancel the upgrade.
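Conceptually the flow looks like this (illustrative shell pseudocode only; has_pre30_samples and the migrate/drop steps are stand-ins, not the actual script):
#!/bin/sh
# Illustrative flow of the pre-3.0 data check during the 3.1 upgrade.
if has_pre30_samples; then    # stand-in for the script's full DB scan
    printf 'Pre-3.0 data found. [u]pgrade it, [d]rop it, or [c]ancel? '
    read choice
    case "$choice" in
        u) migrate_old_samples ;;    # slow path: rewrite old samples
        d) drop_old_samples ;;       # fast path: discard them
        *) exit 1 ;;                 # cancel the upgrade
    esac
fi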
Anonymous User
170 Posts
1
November 20th, 2014 08:00
I have rights to my VMware infrastructure (I have my VCP and RHCE certs). It's not a VM issue - we've got lots of CPU, memory, and network. CPU and memory are under 20% utilization. Similarly, the hosts and network segments have LOTS of headroom.
This was working FINE with IIQ 3.0 and immediately following the 3.1 upgrade, it's been misbehaving badly. I do have an SR, and the analyst so far is stumped.
I've restarted the app many times but I've had hundreds of instances of it suspending (Supended [sic] monitoring) in my logs. This is not WAN-based monitoring - I'm in the same datacenter and pings average 0.145ms.
$ grep -ic upend cluster.log*
cluster.log:68
cluster.log.1:309
cluster.log.2:310
cluster.log.3:308
This is for just one cluster. The peak on the other 3 clusters is 11 across all versions of the cluster.log files. That's a far cry from a thousand.
The main issues I had with the upgrade taking so long are:
1. The doc says it should take about an hour.
2. The upgrade failed and required manual intervention a half-dozen times. If it had been automatic from beginning to end, I wouldn't have had to sit there and monitor it. Not only that, but when it failed, it would restart from the beginning and could take a half-hour just to get back to the point where it failed. Some intelligent checkpointing should have been implemented; see the sketch below.
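For instance, simply recording each completed datastore would let a failed run resume instead of starting over. A minimal sketch of what I mean (illustrative only; upgrade_datastore is a stand-in for the real per-datastore step):
#!/bin/sh
# Sketch: resume a multi-datastore upgrade from the last completed step.
CHECKPOINT=/var/tmp/iiq_upgrade.done
touch "$CHECKPOINT"
for ds in datastore1 datastore2 datastore3 datastore4; do
    grep -qx "$ds" "$CHECKPOINT" && continue    # already upgraded in a prior run
    upgrade_datastore "$ds" || exit 1           # stand-in for the real upgrade step
    echo "$ds" >> "$CHECKPOINT"                 # checkpoint only after success
done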