October 12th, 2015 13:00
Weird issue with 7.1.1.4 Isilon Simulator
Two nodes (2 and 5) dropped out of the cluster, but when I log into either node 2 or node 5, each shows itself as up and nodes 1, 3-4, and 6 as down. The larger group of 4 nodes has /ifs mounted and the group of 2 nodes does not. I tried individual node reboots and a full cluster restart to see if they would straighten themselves out, but no luck.
FROM NODE 1:
vmlxisilon-1# isi status
Cluster Name: vmlxisilon
Cluster Health: [ ATTN]
Cluster Storage: HDD SSD Storage
Size: 23G (45G Raw) 0 (0 Raw)
VHS Size: 23G
Used: 17G (77%) 0 (n/a)
Avail: 5.3G (23%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
1|10.168.50.120 | OK | 0| 24| 24| 4.3G/ 5.7G( 76%)|(No Storage SSDs)
2|10.168.50.121 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
3|10.168.50.122 | OK | 153K| 24| 153K| 4.3G/ 5.7G( 77%)|(No Storage SSDs)
4|10.168.50.123 | OK | 0| 32| 32| 4.3G/ 5.7G( 77%)|(No Storage SSDs)
5|10.168.50.124 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
6|10.168.50.125 | OK | 0| 24| 24| 4.3G/ 5.7G( 76%)|(No Storage SSDs)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals: | 153K| 104| 153K| 17G/ 23G( 77%)|(No Storage SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
FROM NODE 2:
vmlxisilon-2# isi status
Cluster Name: vmlxisilon
Cluster Health: [ ATTN]
Cluster Storage: HDD SSD Storage
Size: n/a (n/a Raw) n/a (n/a Raw)
VHS Size: 0
Used: n/a (n/a) n/a (n/a)
Avail: n/a (n/a) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
1|10.168.50.120 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
2|10.168.50.121 | OK | 0| 0| 0| 4.2G/ n/a( n/a)| 0/ n/a( n/a)
3|10.168.50.122 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
4|10.168.50.123 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
5|10.168.50.124 | OK | 0| 0| 0| 4.2G/ n/a( n/a)| 0/ n/a( n/a)
6|10.168.50.125 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals: | 0| 0| 0| n/a/ n/a( n/a)| n/a/ n/a( n/a)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Any ideas?


Peter_Sero
October 12th, 2015 20:00
It looks like the internal network has been split, for example because nodes 2 and 5 were moved to another host or LAN segment.
Can you check this at the VM level?
At the Isilon level, what is the output of:
# isi_eth_mixer_d showlayout
-- Peter
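A split like this can also be spotted by comparing the two `isi status` outputs posted above: each side reports itself up and the other side down. The following is a minimal Python sketch of that comparison, assuming node rows have the `ID|IP|DASR|...` shape shown in this thread. The names `up_nodes` and `looks_partitioned` are hypothetical helpers for illustration, not part of OneFS.

```python
import re

# Matches node rows such as "2|10.168.50.121 |D--- | n/a| ..." as pasted in this thread.
# Assumption: the health column is either "OK" or a DASR flag string like "D---".
NODE_ROW = re.compile(r"^\s*(\d+)\s*\|\s*(\d+(?:\.\d+){3})\s*\|\s*([A-Za-z-]+)\s*\|")


def up_nodes(status_text):
    """Return the set of node IDs that this view does not flag as Down."""
    up = set()
    for line in status_text.splitlines():
        m = NODE_ROW.match(line)
        if m is None:
            continue  # header, separator, or totals row
        node_id, health = int(m.group(1)), m.group(3)
        if not health.startswith("D"):
            up.add(node_id)
    return up


def looks_partitioned(view_a, view_b):
    """True if both views see some node up but they share no up node."""
    a, b = up_nodes(view_a), up_nodes(view_b)
    return bool(a) and bool(b) and a.isdisjoint(b)


# Node rows copied from the two outputs in the original post.
VIEW_FROM_NODE1 = """\
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
1|10.168.50.120 | OK | 0| 24| 24| 4.3G/ 5.7G( 76%)|(No Storage SSDs)
2|10.168.50.121 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
3|10.168.50.122 | OK | 153K| 24| 153K| 4.3G/ 5.7G( 77%)|(No Storage SSDs)
4|10.168.50.123 | OK | 0| 32| 32| 4.3G/ 5.7G( 77%)|(No Storage SSDs)
5|10.168.50.124 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
6|10.168.50.125 | OK | 0| 24| 24| 4.3G/ 5.7G( 76%)|(No Storage SSDs)
Cluster Totals: | 153K| 104| 153K| 17G/ 23G( 77%)|(No Storage SSDs)
"""

VIEW_FROM_NODE2 = """\
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
1|10.168.50.120 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
2|10.168.50.121 | OK | 0| 0| 0| 4.2G/ n/a( n/a)| 0/ n/a( n/a)
3|10.168.50.122 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
4|10.168.50.123 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
5|10.168.50.124 | OK | 0| 0| 0| 4.2G/ n/a( n/a)| 0/ n/a( n/a)
6|10.168.50.125 |D--- | n/a| n/a| n/a| n/a/ n/a( n/a)| n/a/ n/a( n/a)
Cluster Totals: | 0| 0| 0| n/a/ n/a( n/a)| n/a/ n/a( n/a)
"""

if __name__ == "__main__":
    print("up from node 1:", sorted(up_nodes(VIEW_FROM_NODE1)))
    print("up from node 2:", sorted(up_nodes(VIEW_FROM_NODE2)))
    print("partitioned:", looks_partitioned(VIEW_FROM_NODE1, VIEW_FROM_NODE2))
```

Two disjoint "up" sets only suggest that the groups cannot reach each other. The actual cause (here, VM placement) still has to be checked at the hypervisor level.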
Kona2000
October 13th, 2015 06:00
Peter-
You were absolutely correct: nodes 2 and 5 had been moved by accident to another VM host. I moved them back, manually purged the CELog, and I'm back in business. Thank you for the tip.
Cluster Name: vmlxisilon
Cluster Health: [ OK ]
Cluster Storage: HDD SSD Storage
Size: 39G (68G Raw) 0 (0 Raw)
VHS Size: 29G
Used: 27G (69%) 0 (n/a)
Avail: 12G (31%) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
1|10.168.50.120 | OK | 428| 164K| 165K| 1.5G/ 6.4G( 23%)|(No Storage SSDs)
2|10.168.50.121 | OK | 0| 33| 33| 4.4G/ 6.4G( 69%)|(No Storage SSDs)
3|10.168.50.122 | OK | 706K| 509| 706K| 4.4G/ 6.4G( 69%)|(No Storage SSDs)
4|10.168.50.123 | OK | 285| 0| 285| 4.4G/ 6.4G( 69%)|(No Storage SSDs)
5|10.168.50.124 | OK | 214| 33| 247| 7.4G/ 6.4G(> 99%)|(No Storage SSDs)
6|10.168.50.125 | OK | 214| 496| 710| 4.4G/ 6.4G( 69%)|(No Storage SSDs)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals: | 707K| 165K| 872K| 27G/ 39G( 69%)|(No Storage SSDs)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Critical Events:
Cluster Job Status:
No running jobs.
No paused or waiting jobs.
No failed jobs.
Recent job results:
Time Job Event
--------------- -------------------------- ------------------------------