Unsolved

6 Posts

October 9th, 2018 17:00

M1000e blade M910 PXE boot fail

I have an M1000e chassis (v1.1 midplane) with a few M910 test blades; I will add newer ones later. Each blade has 4 CPUs and 64 GB RAM. I have a Mellanox QDR InfiniBand switch and a Dell 6220 Ethernet switch. I am setting it up for some MPI work and have installed the Rocks cluster OS based on CentOS 7 (RHEL 7.4 based). One of the blades in the same chassis is the server (front end), while the others will be compute nodes. While each blade can load the OS as a front end and run without issues, I have not been able to PXE boot the compute nodes off the one server blade.

My public network is DHCP (IP 192.168.0.25, subnet mask 255.255.255.0); the private network is set to static on the front-end blade with the same subnet mask (IP 192.168.1.27, subnet mask 255.255.255.0). With CentOS 7.4 on a hot-swap hard drive plugged into a compute blade, I can ping the front-end server blade and the switches, and vice versa. However, the compute blade consistently fails to PXE boot with the same failure (attached OSCAR snapshot). I have tried both NIC options, with the same result. The iDRAC IPs on the compute blades are DHCP; should I set them to static on the private network instead? If this is a network issue, what is the best way to check? I have tried BIOS and UEFI, same deal. Ultimately, if feasible, I would like to have the public network on Ethernet and QDR 40 Gb/s for compute-node-to-front-end communication.

The firmware is at the same level on all blades, though not the latest: I lost the remote Java console at iDRAC 3.85 and rolled back to 3.65 on all of them.

I appreciate any help or pointers! This is my first post here, so please let me know what else would help with troubleshooting.

Thank you. Joel

[Image: compute boot log]

6 Posts

October 10th, 2018 05:00

In case the attached picture is not visible, the PXE boot failure reads:

PXE-E53: No boot filename received

PXE-M0F: Exiting Broadcom PXE ROM

No boot device available.


Moderator

9.5K Posts

October 10th, 2018 10:00

Hi,

Is the PXE server on a blade or somewhere else on the network, and is it in the same subnet? It looks like the blades are not reaching the PXE server to grab the image.
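One way to test whether the PXE server is reachable for the file transfer step is to send it a TFTP read request from another machine on the private network. The following is only a minimal sketch, assuming the boot files are served over TFTP on the standard UDP port 69; the packet layout follows RFC 1350, and the helper names (`build_rrq`, `describe_reply`, `probe_tftp`) are made up for illustration.

```python
import socket
import struct


def build_rrq(filename, mode="octet"):
    """Build a TFTP read request: opcode 1, filename, NUL, mode, NUL (RFC 1350)."""
    return (struct.pack("!H", 1) + filename.encode("ascii") + b"\0"
            + mode.encode("ascii") + b"\0")


def describe_reply(packet):
    """Summarise the first packet the TFTP server sends back."""
    opcode = struct.unpack("!H", packet[:2])[0]
    if opcode == 3:
        # DATA: the server found the file and started sending it.
        return "DATA: server is sending the file"
    if opcode == 5:
        # ERROR: a 2-byte error code followed by a NUL-terminated message.
        code = struct.unpack("!H", packet[2:4])[0]
        message = packet[4:].split(b"\0", 1)[0].decode("ascii", "replace")
        return f"ERROR {code}: {message}"
    return f"unexpected opcode {opcode}"


def probe_tftp(server, filename, timeout=3.0):
    """Send one read request and report what comes back (hypothetical helper)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_rrq(filename), (server, 69))
        try:
            packet, _ = sock.recvfrom(516)  # 4-byte header + 512 data bytes
        except socket.timeout:
            return "no reply (timeout)"
    return describe_reply(packet)
```

Calling `probe_tftp("192.168.1.27", "pxelinux.0")` from a machine on the private network would then show whether the front end answers at all. Note that PXE-E53 is raised earlier, at the DHCP stage, so a working TFTP server does not rule out a DHCP problem.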

6 Posts

October 10th, 2018 11:00

Thank you, Josh. I have watched a few of your videos on Dell support; I appreciate the support!

Yes, the server is a blade installed as the front end in the same chassis. Agreed, it seems like a network issue. Are there any outputs I could paste that would show where it is breaking, on the DHCP (public) network or the static private network? I am using the same subnet mask, 255.255.255.0, on both networks.
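Sharing a subnet mask is not the same as sharing a subnet. As a quick sanity check, the two addresses from this thread can be compared with Python's standard `ipaddress` module. This is only a sketch; the compute-node address 192.168.1.50 is a made-up example, not one from the actual setup.

```python
import ipaddress


def same_subnet(ip_a, ip_b, netmask):
    """Return True if both addresses fall in the same network for this mask."""
    net_a = ipaddress.ip_network(f"{ip_a}/{netmask}", strict=False)
    net_b = ipaddress.ip_network(f"{ip_b}/{netmask}", strict=False)
    return net_a == net_b


# Public (DHCP) address vs. private (static) front-end address from the post.
print(same_subnet("192.168.0.25", "192.168.1.27", "255.255.255.0"))

# Front end vs. a hypothetical compute node on the private network.
print(same_subnet("192.168.1.27", "192.168.1.50", "255.255.255.0"))
```

With a 255.255.255.0 mask, 192.168.0.x and 192.168.1.x are separate networks, so a compute NIC that sits on the public side would not see DHCP broadcasts answered by a server listening only on the private side.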

 

Moderator

9.5K Posts

October 10th, 2018 11:00

Is the file name specified in /etc/dhcpd.conf on the PXE server? Is the receiving server set to UEFI or BIOS mode?

6 Posts

October 11th, 2018 12:00

Josh, sorry for the delay; I could not get to the server until now. My dhcpd.conf is below; it serves the private network. That may be one issue, as the public network is 192.168.0.x while the private one is 192.168.1.x. Should I then set the iDRAC to the private network so that the compute nodes boot that way? They have no hard drive or OS, so they need to PXE boot. Also, could you confirm that the boot filename looks OK in dhcpd.conf? The compute blades and the server blade are all currently set to boot in BIOS mode. I have tried UEFI as well.

Thank you.

option space PXE;
option arch code 93 = unsigned integer 16;
ddns-update-style none;
subnet 192.168.1.0 netmask 255.255.255.0 {
    default-lease-time 1200;
    max-lease-time 1200;
    option routers 192.168.1.27;
    option subnet-mask 255.255.255.0;
    option domain-name "local";
    option domain-name-servers 192.168.1.27;
    option broadcast-address None;
    option interface-mtu 1500;
    group "local" {
        host headnode-em2 {
            hardware ethernet E0:DB:55:39:14:30;
            option host-name "headnode";
            fixed-address 192.168.1.27;

            if option arch = 00:07 {
                filename "uefi.cfg/shim.efi";
            } else {
                filename "pxelinux.0";    
            }

            next-server 192.168.1.27;
        }
    }
}
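One thing that may be worth checking in the configuration above: the `filename` and `next-server` statements sit inside the `host headnode-em2` block, and in ISC dhcpd, statements inside a host declaration apply only to that host. No compute-node MAC address appears in the file. The sketch below is a rough illustration, not a real dhcpd.conf parser. It lists which host blocks set a boot filename and ignores any `filename` at subnet or global scope. The compute MAC used in the usage note is made up.

```python
import re

# Trimmed copy of the host block from the dhcpd.conf in this thread.
SAMPLE_CONF = '''
subnet 192.168.1.0 netmask 255.255.255.0 {
    group "local" {
        host headnode-em2 {
            hardware ethernet E0:DB:55:39:14:30;
            option host-name "headnode";
            fixed-address 192.168.1.27;
            if option arch = 00:07 {
                filename "uefi.cfg/shim.efi";
            } else {
                filename "pxelinux.0";
            }
            next-server 192.168.1.27;
        }
    }
}
'''


def hosts_with_boot_filename(conf_text):
    """Map each host block's MAC address to whether that block sets a filename."""
    result = {}
    for match in re.finditer(r"\bhost\s+(\S+)\s*\{", conf_text):
        # Walk forward from the opening brace to its matching closing brace.
        depth, i = 1, match.end()
        while i < len(conf_text) and depth > 0:
            if conf_text[i] == "{":
                depth += 1
            elif conf_text[i] == "}":
                depth -= 1
            i += 1
        body = conf_text[match.end():i]
        mac = re.search(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+)\s*;", body)
        if mac:
            has_filename = re.search(r'\bfilename\s+"', body) is not None
            result[mac.group(1).upper()] = has_filename
    return result


def has_host_scoped_filename(conf_text, mac):
    """True only if this MAC has its own host block that sets a filename."""
    return hosts_with_boot_filename(conf_text).get(mac.upper(), False)
```

Here `has_host_scoped_filename(SAMPLE_CONF, "E0:DB:55:39:14:30")` is true for the head node, while a hypothetical compute MAC such as `"AA:BB:CC:DD:EE:01"` comes back false. If that reading is right, the compute blades would need their own host entries, or the boot options would need to move to a scope that covers them, before they are handed a boot filename.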

Moderator

9.5K Posts

October 11th, 2018 14:00

Check this guide (https://www.unixmen.com/install-pxe-server-centos-7/) and see if anything is missing; you might need to add options to allow booting.

6 Posts

October 11th, 2018 14:00

OK, will go through it and report. Thank you.
