Unable to boot Linux servers when failed over
Symptoms
The "emcpbfs_linux config" script was run to configure PowerPath for booting from R1, and booting from the new grub entry it created worked as expected. After a failover, however, the boot fails. PowerPath does not detect the devices when running at the R2 site, so the LVM filter in the ramdisk image, which is set to accept only emcpower devices, finds none of the PVs and the server cannot boot.
With a 20-second sleep added to the script, the server booted without problems after failover (and with the same pseudo-device names).
Without the sleep, the script showed the same symptoms: it looped trying to find the volume group and the server did not boot.
Cause
When the script runs without the sleep, the FC link is not yet up at the first check and the native SCSI devices are not yet configured on the host.
However, one internal disk (/dev/sda) is already configured by this time, so the script assumes the native boot devices are configured and goes on to create the emcpower device. Because the R2 devices are still not up, this fails.
In the script, the if condition below succeeds because sdfound is set to 1 even though the boot device is not present.
if [ ${count1} -eq ${count2} ] && [ "${sdfound}" -eq 1 ]; then /etc/opt/emcpower/emcpmgr map -p
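The behaviour of this condition can be illustrated with a minimal sketch. The functions scan_sd_devices and ready_to_map below are hypothetical stand-ins, not taken from the PowerPath script; they only assume that the script counts the visible sd devices twice and sets sdfound to 1 when at least one exists.

```shell
#!/bin/sh
# Hypothetical stand-in for the script's device scan: counts the sd devices
# passed as arguments and records whether any sd device was seen at all.
scan_sd_devices() {
    count=$#
    if [ "$count" -gt 0 ]; then
        sdfound=1
    else
        sdfound=0
    fi
}

# Mirrors the quoted condition: two consecutive scans agree on the count
# and at least one sd device was found.
ready_to_map() {
    scan_sd_devices "$@"; count1=$count
    scan_sd_devices "$@"; count2=$count
    [ "${count1}" -eq "${count2}" ] && [ "${sdfound}" -eq 1 ]
}

# Failover without the sleep: only the internal disk is visible so far.
if ready_to_map /dev/sda; then
    echo "condition passes with only /dev/sda (count=${count1})"
fi
```

Under these assumptions the check is satisfied by the internal disk alone, which matches the count of 1 seen in the log without the sleep.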
<snip of log without sleep>
COUNT: getsddevs return 1
COUNT1: getsddevs return 1
COUNT: getsddevs return 1
COUNT2: getsddevs return 1
/dev/sda
/dev/emcpower
<snip>
<snip of log with sleep>
COUNT: getsddevs return 25
COUNT1: getsddevs return 25
COUNT: getsddevs return 25
COUNT2: getsddevs return 25
/dev/sda /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq /dev/sdr /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy
sdc: unknown partition table
sde: unknown partition table
sdf: unknown partition table
sdg: unknown partition table
sdo: unknown partition table
sdq: unknown partition table
sdr: unknown partition table
emcpowera:
emcpowera1 emcpowera2
/dev/emcpower
/dev/emcpowera
/dev/emcpowera1
/dev/emcpowera2
<snip>
Resolution
Workaround:
Disable the internal disk in the BIOS, then retry the failover (without the sleep).
The boot should then succeed without issues, because the script checks that the device count is greater than 0 before it tries to create the emcpower device.
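Why the workaround helps can be sketched as follows. The devices_present function is a hypothetical model of the "count greater than 0" check, not the script's actual code, and the device names are illustrative only.

```shell
#!/bin/sh
# Hypothetical gate modelled on the "count greater than 0" check.
# Arguments are the sd devices visible during one scan.
devices_present() {
    [ "$#" -gt 0 ]
}

# Scan 1: internal disk disabled and FC link not yet up, so nothing is visible.
if devices_present; then
    early="proceed"
else
    early="wait"
fi

# Scan 2: the FC link is up and the SAN devices have appeared.
if devices_present /dev/sda /dev/sdb; then
    late="proceed"
else
    late="wait"
fi

echo "before FC link up: ${early}; after: ${late}"
```

With no internal disk to satisfy the check early, the emcpower device is only created once the SAN devices are actually visible.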
Permanent fix:
There is no permanent fix, because R1/R2 boot with an internal disk is not a recommended setup. The difficulty is that the script cannot tell whether the boot disk is already configured. The script is also not specific to R1/R2 boot scenarios; it must work on normal boot-from-SAN setups as well, so adding a sleep where it is not required is undesirable.
Engineering will document in the PowerPath Installation and Administration Guide that this configuration is not supported.