
October 11th, 2015 17:00

InsightIQ 3.01 - Lost 2nd cluster - can't re-add

G'Day,

Just throwing this out there for any possible fix:

Our 2nd cluster had a NIC error and subsequent network issues, during which time InsightIQ *lost* communication with it.

When I attempt to "Add Cluster" again, it states that:

"The cluster containing the node xxxx is already being monitored"

I've had to restart both the 2nd cluster and the InsightIQ VM a few times, but it still won't re-add.

Any tips?

I know I could delete the datastore there and start it as a fresh monitor (the 2nd cluster is not really production yet) but I'm hoping that if I get a little guidance it might help if this occurs again.

Thanks!

__LEO__

130 Posts

October 12th, 2015 06:00

Hello wyszynski,

When a cluster *disappears*, I have generally seen this as an issue with the config.pickle file that lives in your cluster datastore. Unfortunately, due to the delicate nature of this file, we cannot fix it from Support. What I can offer is one fully tested solution that may help you through this:

  1. Ensure your cluster datastore still exists
    1. In your datastore location, your datastores will be named 4_ _ . You can find your cluster GUID on the "About this Cluster" page of the OneFS Web Administration Interface.
      1. If you do not know your datastore location, you can find it on the IIQ VM or box by running:
        1. $ cat /var/cache/insightiq/datastore.pickle
  2. If your datastore exists:
    1. Move the datastore for just this ONE cluster to a new location (for example, to an NFS export on an Isilon cluster for backup)
    2. Restart monitoring your cluster
    3. Once that has been done, wait 1-2 days and then merge in your old data.
      1. Stop the insightiq service: $ sudo service insightiq stop
      2. Copy all files whose names match *.summary.* from the old datastore (the one you backed up) into the new datastore (the one created when you restarted monitoring your cluster)
      3. Restart the insightiq service: $ sudo service insightiq start
    4. Verify that all of your historical data is present. There will most likely be a gap, since you have been working on this fix for longer than the cluster caches performance data.
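Since datastore.pickle is a Python pickle, `cat` may print mostly unreadable bytes if the file was written with a binary pickle protocol. A minimal sketch, assuming Python 3 is available on the IIQ host: it walks the pickle's opcodes with the standard `pickletools` module and prints every string literal, which never runs code from the file and does not need InsightIQ's own classes to be importable. The helper names below are hypothetical, and the GUID filter assumes the datastore directory name contains the cluster GUID, since the exact naming pattern is not spelled out above.

```python
import os
import pickletools

# Pickle opcodes whose argument is a text or bytes literal.
_STRING_OPCODES = {
    "STRING", "BINSTRING", "SHORT_BINSTRING",
    "UNICODE", "BINUNICODE", "SHORT_BINUNICODE", "BINUNICODE8",
    "BINBYTES", "SHORT_BINBYTES", "BINBYTES8",
}


def strings_in_pickle(path):
    """Yield every string literal stored in a pickle file, without unpickling it."""
    with open(path, "rb") as f:
        for opcode, arg, _pos in pickletools.genops(f):
            if opcode.name in _STRING_OPCODES:
                if isinstance(arg, bytes):
                    arg = arg.decode("utf-8", errors="replace")
                yield arg


def candidate_datastore_paths(pickle_path):
    """Heuristic (hypothetical): absolute paths are likely datastore locations."""
    return [s for s in strings_in_pickle(pickle_path) if s.startswith("/")]


def find_cluster_datastores(datastore_root, cluster_guid):
    """List subdirectories of datastore_root whose name contains the cluster GUID."""
    guid = cluster_guid.lower()
    return sorted(
        os.path.join(datastore_root, name)
        for name in os.listdir(datastore_root)
        if guid in name.lower()
        and os.path.isdir(os.path.join(datastore_root, name))
    )

# Example (run on the IIQ host):
#   for p in candidate_datastore_paths("/var/cache/insightiq/datastore.pickle"):
#       print(p)
#   print(find_cluster_datastores("<datastore location>", "<cluster GUID>"))
```

Review the printed paths by hand before moving anything; the "starts with /" filter is only a rough guess at which strings are locations.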

Since this is a complicated process, I would recommend creating an SR with Support to address the datastore merge. To create a service request, you have a couple of options:

1. Log in to your online account on support.emc.com and go to this page: https://support.emc.com/servicecenter/createSR

2. Call EMC Isilon Support at 1-800-782-4362 (for a complete local country dial list, please see this document: http://www.emc.com/collateral/contact-us/h4165-csc-phonelist-ho.pdf)

  3. If your datastore does not exist:
    1. The data is not present in the IIQ system. I would recommend creating an SR with Support to address this concern.

Please let us know if you run into any trouble! We're more than happy to help.

592 Posts

October 11th, 2015 22:00

Don't restart the IIQ VM. Instead, try logging in to the IIQ VM via the CLI as root or administrator and run "iiq_restart". Hopefully it will still see the cluster and you won't have to re-add it. Also try using IPs instead of the SmartConnect cluster name.
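The reason IPs can behave better: SmartConnect answers DNS queries for the zone name with one of several node IPs, so the address a monitoring host gets may change from one lookup to the next. A minimal sketch using only Python's standard `socket` module resolves a name several times and collects the distinct IPv4 addresses, which can help you pick one static node IP to enter instead. The function name is hypothetical, and a local resolver cache may hide the rotation, so a single result does not prove the name is static.

```python
import socket


def resolve_ipv4(name, attempts=5):
    """Collect the distinct IPv4 addresses that `name` resolves to over several lookups."""
    seen = set()
    for _ in range(attempts):
        for _family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            name, None, socket.AF_INET, socket.SOCK_STREAM
        ):
            seen.add(sockaddr[0])  # sockaddr is (ip, port) for AF_INET
    return sorted(seen)

# Example with a hypothetical SmartConnect zone name:
#   print(resolve_ipv4("cluster2.example.com", attempts=10))
```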

34 Posts

October 11th, 2015 23:00

G'Day Phil,

Thanks for checking and answering - unfortunately no....

Tried the iiq_restart already - no luck.

I hadn't tried re-adding the cluster by IP, but no... it still states that it's already being monitored.

The attempts continue...

Thanks though!!

__LEO__

205 Posts

October 12th, 2015 05:00

How big are your clusters (node count/total size)? We found that past a certain point, the IIQ VM simply couldn't handle the size of our clusters and would play little games like this frequently. When we moved to a physical box, it was able to handle everything I've thrown at it (currently 3 clusters with ~100 nodes total). I'd also recommend going to a newer version of IIQ, since newer versions generally behave better.

34 Posts

October 12th, 2015 16:00

G'Day Katie,

Ahhh - this sounds promising. I've booked some time this arvo to attempt this fix.

I'll post the results!

*FINGERS CROSSED*

34 Posts

October 12th, 2015 16:00

G'Day carlilek,

Ahhh - interesting thought - but we've only got a 15-node cluster & this *missing* site is a 4-node cluster... it would make sense if they were bigger.

Thx for the suggestion!

I'm thinking of upgrading to 3.1 to get up to spec too - but didn't want to yet while things were playing up.

34 Posts

October 15th, 2015 18:00

UPDATE:

Firstly - many thanks to all who reviewed and commented on my question!

In the end, it was the CentOS VM, not the Isilons, that was a little *upset* by the recent network issues, which is why both CLUSTERS then went missing.

Following Katie's suggestion, I moved the old datastores and re-connected InsightIQ to the clusters without any trouble.

I'll now let them run a spell and re-integrate the existing data files to the datastore.

All in all a win!!!!

Thanks all!

__LEO__

130 Posts

October 16th, 2015 06:00

wyszynski,

I'm glad that worked for you! If you need any help merging your datastores back together, please let me know! Again, the only files you will need are the *.summary.* files: the new datastore cannot downsample any raw data files, and their data will already be included in the summary files at this point.
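The merge step (stop the insightiq service, copy only the *.summary.* files from the backed-up datastore into the new one, start the service again) can be sketched in Python using only the standard library. This is a sketch under assumptions, not a Support-provided tool: it assumes both datastores use the same directory layout, it never overwrites a file that already exists in the new datastore, and it defaults to a dry run so you can review what would be copied. The function name and return shape are hypothetical.

```python
import fnmatch
import os
import shutil

SUMMARY_PATTERN = "*.summary.*"


def merge_summary_files(old_root, new_root, dry_run=True):
    """Copy *.summary.* files from old_root into new_root, keeping relative paths.

    Returns (copied, skipped_existing, ignored_non_summary) as sorted lists of
    relative paths. Nothing is written when dry_run is True.
    """
    copied, skipped, ignored = [], [], []
    for dirpath, _dirnames, filenames in os.walk(old_root):
        rel_dir = os.path.relpath(dirpath, old_root)
        for name in filenames:
            rel_path = os.path.normpath(os.path.join(rel_dir, name))
            if not fnmatch.fnmatch(name, SUMMARY_PATTERN):
                ignored.append(rel_path)  # raw data files are left behind
                continue
            dest = os.path.join(new_root, rel_path)
            if os.path.exists(dest):
                skipped.append(rel_path)  # keep what the new datastore already has
                continue
            if not dry_run:
                os.makedirs(os.path.dirname(dest), exist_ok=True)
                shutil.copy2(os.path.join(dirpath, name), dest)  # preserves mtimes
            copied.append(rel_path)
    return sorted(copied), sorted(skipped), sorted(ignored)
```

Run it first with `dry_run=True` while the insightiq service is stopped, check the three lists, then rerun with `dry_run=False` and start the service again.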

Please let me know if you have any additional questions or concerns!
