Cluster admin best practices.

Question

Greetings,

I'm curious what people are doing for admin server access to their clusters. We do a fair amount of work from the command line and will probably use cron and ssh to run things. The two main issues that have come up, both of which we could work around, are ssh keys and shell history.

We've found that if we ssh to the cluster name, we end up on whatever node it returns. This means the host key returned for the cluster name differs on each attempt until it loops back to the node that matches the key on file. We can get around this by going directly to one node by name, but everything breaks if that node is down. It would be nice to have a single name plumbed to the cluster that would float to a surviving node regardless of which node went down. I assume this is possible.

The shell history seems to be unique to each node also. We could get around this by trying to map the history files to an ifs location that should be accessible by all nodes. I assume there would not be any issues if multiple people were working on multiple nodes.

I'd like to know how others handle these issues.

Thanks,

Jeff

AndrewChung · Answer

If you want an address/node that is guaranteed to be up, use the SmartConnect service IP (SIP). You will often have a DNS name for the SIP, so you can use that. By default it lives on node 1, but it will migrate to another node if node 1 goes down.


dynamox · Answer

When it migrates to another node, won't it ask to accept the ssh key again?

jeffc2 · Answer

It won't ask to accept the key if it already has one for that host. If we want to ssh to "isilon", the ssh key that is being returned by the given node will vary.

As Andrew points out, we can put keys in the files for each node so they are all there. That would get around the key issue.
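A rough sketch of one way to do that, assuming OpenSSH on the admin host: scan each node's host key with ssh-keyscan and file every key under the one name you ssh to, since known_hosts can hold several keys for the same host name. The function name alias_known_hosts and the node names in the usage comment are made up for illustration. Check the scanned keys against the nodes out of band before trusting them.

```shell
# Rewrite the host field of ssh-keyscan output so every node's key is
# filed under one shared name (the name you actually ssh to).
# Reads "host keytype key" lines on stdin; skips comments and blank lines.
alias_known_hosts() {
    alias_name=$1
    awk -v name="$alias_name" '
        /^#/ || NF < 3 { next }      # comment lines and blanks
        { $1 = name; print }         # replace the host field with the alias
    '
}

# Usage (hypothetical node names; run once per cluster):
#   ssh-keyscan isilon-1 isilon-2 isilon-3 2>/dev/null \
#       | alias_known_hosts isilon >> ~/.ssh/known_hosts
```

With all node keys stored under "isilon", ssh should accept whichever node answers without prompting.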

It still doesn't get us around shell history files on various nodes and not using a common one. It's time for me to start looking into changing the root environment to log history to the /ifs file system :-)

Thanks,

Jeff

AndrewChung · Answer

Yes, you would get prompted for another SSH key. However, you can usually store several name/IP pairs for a given SSH host. The actual rule for movement of the SIP is that it migrates to the next lowest node ID. That is the node ID, not the logical node number (LNN), which is the friendly number you see on the screen. The node ID is a monotonically increasing number, starting from 1, that is assigned to each node in the order it was added to the cluster. So if you have a 5-node cluster like so:

LNN Node 1 -> Node ID 1

LNN Node 2 -> Node ID 2

LNN Node 3 -> Node ID 3

LNN Node 4 -> Node ID 4

LNN Node 5 -> Node ID 5

Now say you SmartFail Node 1 out of the cluster and then add a new replacement node. You would have:

LNN Node 1 -> Node ID 6

LNN Node 2 -> Node ID 2

LNN Node 3 -> Node ID 3

LNN Node 4 -> Node ID 4

LNN Node 5 -> Node ID 5

The SIP lives on the lowest Node ID that is functional in the cluster. In this case, it would live on Node 2 all the time unless node 2 goes down.

Command to find out the translation:

isi_nodes "LNN: %{lnn} => Node ID: %{id}"

If you just run isi_nodes with no arguments, you will get help. Use the appropriate replacement strings for the output you want.
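As a small illustration of the "lowest functional Node ID" rule, here is a sketch that reads lines in the format produced by the isi_nodes command above and reports which LNN has the lowest Node ID. The function name lowest_node_id is hypothetical, and it assumes the input lists only nodes that are currently up; it does not check node health itself.

```shell
# Pick the LNN holding the lowest Node ID from lines shaped like
#   LNN: 1 => Node ID: 6
# Assumption: stdin lists only nodes that are currently up.
lowest_node_id() {
    awk '
        NF < 6 { next }                      # skip blank or short lines
        {
            lnn = $2; id = $NF               # 2nd field is LNN, last is ID
            if (best == "" || id + 0 < best + 0) { best = id; best_lnn = lnn }
        }
        END { if (best != "") printf "LNN %s (Node ID %s)\n", best_lnn, best }
    '
}

# Usage on a cluster node:
#   isi_nodes "LNN: %{lnn} => Node ID: %{id}" | lowest_node_id
```

Fed the five lines from the replaced-node example, this reports LNN 2, which matches where the SIP would live.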

AndrewChung · Answer

Are you looking at the shell history for some sort of auditing?  Why do you need the shell history to be identical on all the nodes?

jeffc2 · Answer

We would like it partly for auditing, but mostly for the "what options did I use for that command?" look back. We're new to Isilon and are working our way through it. It's nice to see what we did before, or to not have to retype a long command. As we end up on different nodes, the one we land on may not have the history file from where we ran the command we are trying to repeat.
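One possible arrangement, sketched under assumptions rather than tested on a cluster: point HISTFILE at a per-user, per-node file under /ifs, so each node's shell writes its own file but every file is readable from any node. Keeping one file per node sidesteps several shells on different nodes rewriting the same file. The helper name history_file_for and the /ifs/admin/history directory are made up for illustration, and if the root shell is zsh it only saves history when SAVEHIST is also set.

```shell
# Build a per-user, per-node history path under a shared directory.
history_file_for() {
    base=$1 user=$2 node=$3
    printf '%s/%s/history.%s\n' "$base" "$user" "$node"
}

# In the shell startup file (paths are hypothetical):
#   HISTFILE=$(history_file_for /ifs/admin/history "$USER" "$(hostname -s)")
#   mkdir -p "$(dirname "$HISTFILE")"
#   export HISTFILE
# Then, from any node, search all of your node files at once:
#   grep -h 'isi ' /ifs/admin/history/"$USER"/history.*
```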
