Cluster stand-up issues
We received a three-node NL400 cluster. We powered on the first node, let it come up, and were able to fully configure it and create a cluster. We then powered on the second node, let it come up, and configured it to join the newly created cluster. During this process we discovered a bad InfiniBand (IB) cable on the second node, so we powered it off with the power button on the back of the node. When the second node rebooted, it did not get an IP address, and its devices (disk drives) show a status of [NEW]. The second node appears to be stuck in this state, and I can't add the third node to the cluster because there is no quorum. I will add screenshots/output. I am looking for a solution that does not require re-imaging the nodes, as this is a remote site.
The_DI
June 22nd, 2016 13:00
Here is the recent output:
login as: root
Using keyboard-interactive authentication.
Password:
*** Warning: Auth Service is Unavailable ***
Last login: Mon May 2 06:41:33 2016 from xx.xx.xx.xxx
Copyright (c) 2001-2014 EMC Corporation. All Rights Reserved.
Copyright (c) 1992-2011 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
The Regents of the University of California. All rights reserved.
Isilon OneFS v7.2.1.2
ats-isilon2-1# isi stat
Warning: Cluster does not have quorum
Cluster Name: ats-isilon2
Cluster Health: [ ATTN]
Cluster Storage: HDD SSD Storage
Size: n/a (n/a Raw) n/a (n/a Raw)
VHS Size: 0
Used: n/a (n/a) n/a (n/a)
Avail: n/a (n/a) 0 (n/a)
Health Throughput (bps) HDD Storage SSD Storage
ID |IP Address |DASR | In Out Total| Used / Size |Used / Size
-------------------+-----+-----+-----+-----+-----------------+-----------------
1|xx.xx.xx.xxx |-A-- | 0| 0| 0| 65M/ n/a( n/a)| 0/ n/a( n/a)
2|None |D--- | 0| 0| 0| n/a/ n/a( n/a)| n/a/ n/a( n/a)
-------------------+-----+-----+-----+-----+-----------------+-----------------
Cluster Totals: | 0| 0| 0| n/a/ n/a( n/a)| n/a/ n/a( n/a)
Health Fields: D = Down, A = Attention, S = Smartfailed, R = Read-Only
Critical Events:
Event DB temporarily unavailable. Retry in 30 seconds.
Cluster Job Status:
Job status temporarily unavailable.
ats-isilon2-1#
ats-isilon2-1# isi_for_array isi device
ats-isilon2-2: Node 2, [DOWN]
ats-isilon2-2: Bay 1 Lnum N/A [NEW] SN:PN2331PAK0NNJT /dev/da1
ats-isilon2-2: Bay 2 Lnum N/A [NEW] SN:PN2331PAK02MYT /dev/da2
ats-isilon2-2: Bay 3 Lnum N/A [NEW] SN:PN2331PAJZ9AYT /dev/da19
. . .
ats-isilon2-2: Bay 34 Lnum N/A [NEW] SN:PN2331PAK0JHMT /dev/da16
ats-isilon2-2: Bay 35 Lnum N/A [NEW] SN:PN2331PAK0JH5T /dev/da17
ats-isilon2-2: Bay 36 Lnum N/A [NEW] SN:PN2331PAJZZLZT /dev/da18
ats-isilon2-1: Node 1, [ATTN]
ats-isilon2-1: Bay 1 Lnum 35 [HEALTHY] SN:PN2334PBH1NKMR /dev/da1
ats-isilon2-1: Bay 2 Lnum 34 [HEALTHY] SN:PN2334PBH7ZYVR /dev/da2
ats-isilon2-1: Bay 3 Lnum 17 [HEALTHY] SN:PN2334PBGDY5ET /dev/da19
. . .
ats-isilon2-1: Bay 34 Lnum 20 [HEALTHY] SN:PN2334PBH8PNWT /dev/da16
ats-isilon2-1: Bay 35 Lnum 19 [HEALTHY] SN:PN2334PBH7YBER /dev/da17
ats-isilon2-1: Bay 36 Lnum 18 [HEALTHY] SN:PN2334PBH8S2ET /dev/da18
ats-isilon2-1#
I am looking for a procedure to get Node 2 and its drives healthy.
Anonymous User
June 22nd, 2016 21:00
The fastest fix is probably to reformat node 2, especially since there's no data on it.
If you can connect to the serial port, log in and run:
# isi_reformat_node
Hopefully the node then smartens up and comes up healthy. I've had nodes come up with a single drive in the NEW state, and you can just add those with isi devices -d 2:1 -a add (node 2, bay 1 in this example).
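If node 2 does come up with its drives still in the [NEW] state, the isi_for_array isi device listing above can be turned into one add command per drive. The following is a minimal Python 3 sketch, assuming the output format shown in this thread (a "<host>: Node <n>, [<STATE>]" line followed by "<host>: Bay <b> Lnum ... [<STATE>] ..." lines) and the isi devices -d <node>:<bay> -a add syntax from the reply above. The function names new_drives and add_commands are hypothetical helpers, not part of OneFS. The script only prints the commands for review; it does not run them, and adding drives will not help while the node itself is [DOWN] with no IP address, as it was here.

```python
import re

# Per-node header line, e.g. "ats-isilon2-2: Node 2, [DOWN]"
NODE_RE = re.compile(r"^(?P<host>\S+): Node (?P<lnn>\d+), \[(?P<state>[A-Z]+)\]")
# Per-drive line, e.g. "ats-isilon2-2: Bay 1 Lnum N/A [NEW] SN:... /dev/da1"
BAY_RE = re.compile(r"^(?P<host>\S+): Bay (?P<bay>\d+) Lnum \S+ \[(?P<state>[A-Z]+)\]")


def new_drives(output):
    """Return (node, bay) pairs for every drive reported in the [NEW] state.

    Hypothetical helper; assumes the 'isi_for_array isi device' text format
    shown in this thread, where each host's Node line precedes its Bay lines.
    """
    host_to_node = {}
    found = []
    for line in output.splitlines():
        line = line.strip()
        m = NODE_RE.match(line)
        if m:
            # Remember which node number this hostname prefix belongs to.
            host_to_node[m.group("host")] = int(m.group("lnn"))
            continue
        m = BAY_RE.match(line)
        if m and m.group("state") == "NEW":
            node = host_to_node.get(m.group("host"))
            if node is not None:
                found.append((node, int(m.group("bay"))))
    return found


def add_commands(output):
    """Build one 'isi devices -d <node>:<bay> -a add' command per NEW drive.

    The command syntax is taken from the forum reply; verify it against the
    documentation for your OneFS version before running anything.
    """
    return ["isi devices -d %d:%d -a add" % (node, bay)
            for node, bay in new_drives(output)]


if __name__ == "__main__":
    SAMPLE = """ats-isilon2-2: Node 2, [DOWN]
ats-isilon2-2: Bay 1 Lnum N/A [NEW] SN:PN2331PAK0NNJT /dev/da1
ats-isilon2-2: Bay 2 Lnum N/A [NEW] SN:PN2331PAK02MYT /dev/da2
ats-isilon2-1: Node 1, [ATTN]
ats-isilon2-1: Bay 1 Lnum 35 [HEALTHY] SN:PN2334PBH1NKMR /dev/da1
"""
    for cmd in add_commands(SAMPLE):
        print(cmd)
    # prints:
    # isi devices -d 2:1 -a add
    # isi devices -d 2:2 -a add
```

Printing rather than executing keeps a person in the loop before any drives are added to a cluster that currently lacks quorum.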
The_DI
June 23rd, 2016 06:00
Thanks, Ed. I hope it smartens up, too. I will check with management about whether we can try adding the drives manually. But I think the drives are just a symptom of a bigger issue: as you can see, node 2 has no IP address, so I cannot log in to that node directly. Perplexing.
The_DI
September 15th, 2016 08:00
We ended up re-imaging the nodes to OneFS v8.0. All is now well.