August 30th, 2012 17:00

Storage node offline

I have a storage node showing offline - how do I put this back online from the command line?

admin@dcy-mis-ava01:~/>: status.dpn

Thu Aug 30 17:53:01 MDT 2012  [DCY-MIS-AVA01.TH.EPCHD.ORG] Thu Aug 30 23:53:01 2012 UTC (Initialized Wed Aug 29 14:40:04 2012 UTC)

Node   IP Address     Version   State   Runlevel  Srvr+Root+User Dis Suspend Load UsedMB Errlen  %Full   Percent Full and Stripe Status by Disk

0.4                            OFFLINE                                          0    0            0.0%

0.0   192.168.255.2  6.1.0-402  ONLINE fullaccess mhpu+0hpu+0hpu   2 false   0.10 35860   197765   2.8%   2%(onl:249)  2%(onl:229)  2%(onl:223)

0.1   192.168.255.3  6.1.0-402  ONLINE fullaccess mhpu+0hpu+0hpu   1 false   0.41 35884   202367   3.1%   3%(onl:249)  3%(onl:250)  3%(onl:250)

0.2   192.168.255.4  6.1.0-402  ONLINE fullaccess mhpu+0hpu+0hpu   1 false   0.12 36089   225591   2.9%   2%(onl:242)  2%(onl:237)  2%(onl:248)

0.3   192.168.255.5  6.1.0-402  ONLINE fullaccess mhpu+0hpu+0hpu   1 false   0.10 35255   312768   3.1%   3%(onl:251)  3%(onl:261)  3%(onl:248)

0.5   192.168.255.7  6.1.0-402  ONLINE fullaccess mhpu+0hpu+0hpu   1 false   0.07 35572   214596   3.0%   3%(onl:246)  3%(onl:250)  3%(onl:251)

0.6   192.168.255.8  6.1.0-402  ONLINE fullaccess mhpu+0hpu+0hpu   1 false   2.08 35162   193127   3.1%   3%(onl:245)  3%(onl:266)  3%(onl:239)

Srvr+Root+User Modes = migrate + hfswriteable + persistwriteable + useraccntwriteable

August 31st, 2012 00:00

It wouldn't be wise to restart an offline node or 'force' it to come back online without first identifying what caused it to go offline.

Possible causes include, but are not limited to:

1) hardware failure

2) OS issues

3) environmental issues (e.g. power problems)

4) network connectivity

5) time synchronisation

6) issues with the Avamar GSAN or software

If the node is still reachable over SSH, you can check for hardware, network, time sync and other OS-related problems by reviewing the /var/log/messages file on that node.
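
As a rough illustration of that log review, the sketch below filters a syslog-style file for lines that often accompany hardware, network, or time problems. The function name scan_messages and the keyword list are my own assumptions, not an Avamar tool or an official list, so adjust the patterns to whatever your system actually logs.

```shell
# Hypothetical helper: print lines from a syslog-style file that match
# a few generic trouble keywords. The keyword list is an assumption.
scan_messages() {
    log_file="${1:-/var/log/messages}"
    grep -iE 'error|fail|link.*down|ntpd|i/o|scsi|power' "$log_file"
}
```

Run on the affected node, something like scan_messages /var/log/messages | tail -n 50 would show the most recent matches. An empty result does not prove the node is healthy; it only means none of these keywords appeared.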

To determine whether Avamar nodes are time synchronised, check the current time and date of each node on the Avamar system.

As the 'admin' user, load the SSH keys and run:

mapall --all --parallel '/bin/date'

The '--parallel' flag executes the command on each node simultaneously. On a system where time is synchronised you will see output similar to the following:

admin@avmtest1:~/>: mapall --all --parallel 'date'

Using /usr/local/avamar/var/probe.xml

(0.s) ssh  -x  admin@xx.xx.xx.xxx 'date'

(0.0) ssh  -x  admin@xx.xx.xx.xxx 'date'

(0.1) ssh  -x  admin@xx.xx.xx.xxx 'date'

(0.2) ssh  -x  admin@xx.xx.xx.xxx 'date'

Mon Jun 20 12:01:12 BST 2011

Mon Jun 20 11:01:12 UTC 2011

Mon Jun 20 11:01:12 UTC 2011

Mon Jun 20 11:01:12 UTC 2011

The date and time are fully synchronised between all the nodes on this system: 12:01:12 BST and 11:01:12 UTC are the same instant, since BST is one hour ahead of UTC.  Note that the utility node is set to the local time zone (in this example 'BST'), whereas the data nodes are set to 'UTC'.  This is normal and expected behavior.
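
Comparing timestamps by eye gets awkward when nodes report different time zones, as in the output above. One way around it is to ask each node for seconds since the epoch and compute the spread. The sketch below is only an illustration: max_skew is a hypothetical helper, and it assumes each timestamp arrives on its own line and that any other lines (such as the "Using ..." and "(0.x) ssh ..." headers) can be ignored.

```shell
# Hypothetical helper: read epoch-second values from stdin (one per line),
# skip any non-numeric lines, and print the largest difference in seconds.
max_skew() {
    awk '/^[0-9]+$/ {
             v = $1 + 0
             if (!seen || v < min) min = v
             if (!seen || v > max) max = v
             seen = 1
         }
         END {
             if (!seen) exit 1   # no timestamps found
             print max - min
         }'
}
```

Assuming mapall passes the quoted command through unchanged, mapall --all --parallel 'date -u +%s' | max_skew should print a value of 0 or 1 on a synchronised system. A larger number suggests at least one node's clock has drifted.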

Hopefully this will help you narrow down the cause of the problem.  At that point I would recommend engaging the support team and sharing your findings, so that they can help confirm the root cause of the failure and bring the node back online in the most appropriate way.
