October 12th, 2015 13:00

Weird issue with 7.1.1.4 Isilon Simulator

Two nodes (2 and 5) dropped out of the cluster. However, when I log into either node 2 or node 5, it shows nodes 2 and 5 as up and nodes 1, 3, 4, and 6 as down. The larger group of four nodes has /ifs mounted and the group of two does not. I tried rebooting individual nodes and restarting the whole cluster to see if they would straighten themselves out, but no luck.

FROM NODE 1:

vmlxisilon-1# isi status
Cluster Name: vmlxisilon
Cluster Health:     [ ATTN]
Cluster Storage:  HDD                 SSD Storage
Size:             23G (45G Raw)       0 (0 Raw)
VHS Size:         23G
Used:             17G (77%)           0 (n/a)
Avail:            5.3G (23%)          0 (n/a)

                   Health  Throughput (bps)  HDD Storage      SSD Storage
ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  1|10.168.50.120  | OK  |    0|   24|   24| 4.3G/ 5.7G( 76%)|(No Storage SSDs)
  2|10.168.50.121  |D--- |  n/a|  n/a|  n/a|  n/a/  n/a( n/a)|  n/a/  n/a( n/a)
  3|10.168.50.122  | OK  | 153K|   24| 153K| 4.3G/ 5.7G( 77%)|(No Storage SSDs)
  4|10.168.50.123  | OK  |    0|   32|   32| 4.3G/ 5.7G( 77%)|(No Storage SSDs)
  5|10.168.50.124  |D--- |  n/a|  n/a|  n/a|  n/a/  n/a( n/a)|  n/a/  n/a( n/a)
  6|10.168.50.125  | OK  |    0|   24|   24| 4.3G/ 5.7G( 76%)|(No Storage SSDs)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals:          | 153K|  104| 153K|  17G/  23G( 77%)|(No Storage SSDs)

Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

FROM NODE 2:

vmlxisilon-2# isi status
Cluster Name: vmlxisilon
Cluster Health:     [ ATTN]
Cluster Storage:  HDD                 SSD Storage
Size:             n/a (n/a Raw)       n/a (n/a Raw)
VHS Size:         0
Used:             n/a (n/a)           n/a (n/a)
Avail:            n/a (n/a)           0 (n/a)

                   Health  Throughput (bps)  HDD Storage      SSD Storage
ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  1|10.168.50.120  |D--- |  n/a|  n/a|  n/a|  n/a/  n/a( n/a)|  n/a/  n/a( n/a)
  2|10.168.50.121  | OK  |    0|    0|    0| 4.2G/  n/a( n/a)|    0/  n/a( n/a)
  3|10.168.50.122  |D--- |  n/a|  n/a|  n/a|  n/a/  n/a( n/a)|  n/a/  n/a( n/a)
  4|10.168.50.123  |D--- |  n/a|  n/a|  n/a|  n/a/  n/a( n/a)|  n/a/  n/a( n/a)
  5|10.168.50.124  | OK  |    0|    0|    0| 4.2G/  n/a( n/a)|    0/  n/a( n/a)
  6|10.168.50.125  |D--- |  n/a|  n/a|  n/a|  n/a/  n/a( n/a)|  n/a/  n/a( n/a)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals:          |    0|    0|    0|  n/a/  n/a( n/a)|  n/a/  n/a( n/a)

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

Any ideas?

October 12th, 2015 20:00

Looks like the internal network has been split, e.g. nodes 2 and 5 were moved to another host or LAN segment.

Can you check this at the VM level?

At the Isilon level, what is the output of:

# isi_eth_mixer_d showlayout

-- Peter
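The split described above is visible directly in the two `isi status` outputs: each group reports its own members as OK and every node in the other group as Down. As a minimal sketch (not an Isilon tool), the Python below parses the node rows of each output, collects the IDs that view does not mark Down, and checks whether the two views are disjoint. The helper names `up_nodes` and `is_split` are hypothetical, and the parsing assumes the `isi status` table layout shown in this thread, with a leading `D` in the DASR column meaning Down.

```python
import re

# Node rows in `isi status` output look like:
#   "  2|10.168.50.121  |D--- |  n/a|  n/a|..."
# Capture the node ID (first column) and the DASR health field (third column).
ROW = re.compile(r"^\s*(\d+)\|[^|]*\|([^|]*)\|")


def up_nodes(status_text: str) -> set[int]:
    """Hypothetical helper: IDs of nodes this view does not mark Down."""
    up = set()
    for line in status_text.splitlines():
        m = ROW.match(line)
        if m is None:  # header, separator, totals, or blank line
            continue
        node_id, health = int(m.group(1)), m.group(2).strip()
        # Assumption: a leading 'D' in the DASR field means Down;
        # "OK" or any other flag combination counts as reachable.
        if not health.startswith("D"):
            up.add(node_id)
    return up


def is_split(view_a: set[int], view_b: set[int]) -> bool:
    """Hypothetical helper: two non-empty views sharing no node suggest a partition."""
    return bool(view_a) and bool(view_b) and view_a.isdisjoint(view_b)


# Node rows copied from the two outputs in this thread (other columns trimmed).
FROM_NODE1 = """\
  1|10.168.50.120  | OK  |
  2|10.168.50.121  |D--- |
  3|10.168.50.122  | OK  |
  4|10.168.50.123  | OK  |
  5|10.168.50.124  |D--- |
  6|10.168.50.125  | OK  |
"""
FROM_NODE2 = """\
  1|10.168.50.120  |D--- |
  2|10.168.50.121  | OK  |
  3|10.168.50.122  |D--- |
  4|10.168.50.123  |D--- |
  5|10.168.50.124  | OK  |
  6|10.168.50.125  |D--- |
"""

if __name__ == "__main__":
    a, b = up_nodes(FROM_NODE1), up_nodes(FROM_NODE2)
    print(sorted(a), sorted(b), is_split(a, b))  # prints [1, 3, 4, 6] [2, 5] True
```

Overlapping views (some node seen as up from both sides) would point to an individual node problem rather than a clean network partition, which is why the check requires the two sets to be disjoint.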


October 13th, 2015 06:00

Peter-

You were absolutely correct: nodes 2 and 5 had accidentally been moved to another VM host. I moved them back, manually purged the CELog, and I'm back in business. Thank you for the tip.

Cluster Name: vmlxisilon
Cluster Health:     [  OK ]
Cluster Storage:  HDD                 SSD Storage
Size:             39G (68G Raw)       0 (0 Raw)
VHS Size:         29G
Used:             27G (69%)           0 (n/a)
Avail:            12G (31%)           0 (n/a)

                   Health  Throughput (bps)  HDD Storage      SSD Storage
ID |IP Address     |DASR |  In   Out  Total| Used / Size     |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
  1|10.168.50.120  | OK  |  428| 164K| 165K| 1.5G/ 6.4G( 23%)|(No Storage SSDs)
  2|10.168.50.121  | OK  |    0|   33|   33| 4.4G/ 6.4G( 69%)|(No Storage SSDs)
  3|10.168.50.122  | OK  | 706K|  509| 706K| 4.4G/ 6.4G( 69%)|(No Storage SSDs)
  4|10.168.50.123  | OK  |  285|    0|  285| 4.4G/ 6.4G( 69%)|(No Storage SSDs)
  5|10.168.50.124  | OK  |  214|   33|  247| 7.4G/ 6.4G(> 99%)|(No Storage SSDs)
  6|10.168.50.125  | OK  |  214|  496|  710| 4.4G/ 6.4G( 69%)|(No Storage SSDs)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals:          | 707K| 165K| 872K|  27G/  39G( 69%)|(No Storage SSDs)

     Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only

Critical Events:


Cluster Job Status:

No running jobs.

No paused or waiting jobs.

No failed jobs.

Recent job results:
Time            Job                        Event
--------------- -------------------------- ------------------------------

0 events found

No Events found!
