1 Rookie

 • 

64 Posts

January 16th, 2020 08:00

Dell C5000 with C5220 blades

We are running several C5000 with C5220 blades in our data centers.  We ran into an issue.  I am sure there is a solution, but I just have not figured it out.  Any help would be much appreciated!  Thank you.

 

1) The IPMI web page goes down, but the IPMI IP still responds to ping, which doesn't make sense to us. This happens after a few days to a week. Someone then has to go into the data center, pull the blade out, and put it back in. We keep re-seating the blades, only for the issue to reappear. We thought updating the BIOS and IPMI firmware would help, but it didn't.
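
To make the symptom above easier to catch before someone notices by hand, here is a minimal Python sketch that separates "answers ping" from "web port accepts connections". It assumes the BMC web GUI listens on TCP 443 (not stated in this thread), and the function names and status labels are made up for illustration. A TCP connect is only a rough check: a hung web service may still accept connections without serving pages.

```python
import socket


def tcp_port_open(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Port 443 is an assumption; adjust it to wherever the BMC web GUI listens.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def classify_bmc(ping_ok: bool, web_ok: bool) -> str:
    """Map the two observations to a hypothetical status label."""
    if ping_ok and web_ok:
        return "healthy"
    if ping_ok:
        # The case described in this thread: IP pings, GUI is gone.
        return "web-ui-down"
    if web_ok:
        return "ping-blocked"
    return "unreachable"
```

The ping result would come from whatever monitoring is already in place; feeding both results into `classify_bmc` on a schedule would at least timestamp when each BMC's GUI stops answering.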

 

Has anyone seen this before?  Would love to solve this issue.

 

Thank you.

Moderator

 • 

8.4K Posts

January 16th, 2020 12:00

TheServerNinja,

 

Would you confirm a couple of things for me, please?
What OS is installed on the systems?
How are you connected: are you using the individual sled NICs or the chassis connection? I ask because using both to connect to the servers or their BMCs can cause packet collisions (causing disruption in NIC connectivity).
When updating, did you also include the BMC update?

Lastly, do you see anything in the logs near the time it fails?

Let me know what you see.



1 Rookie

 • 

64 Posts

January 16th, 2020 15:00

Thank you for your help, Chris.  Very much appreciated.

 

What OS is installed on the systems?

 

A: It's a mix of CentOS, Ubuntu, Windows, etc. Even blades that haven't been used yet, with no OS or drive, still lose connection to the GUI page after a few days to a week at most.

 

 

How are you connected: are you using the individual sled NICs or the chassis connection? I ask because using both to connect to the servers or their BMCs can cause packet collisions (causing disruption in NIC connectivity).



A: Individual sled NICs. All blades have 2 LAN connections. Our chassis are the model with one single BMC port; they do not have dedicated BMC/IPMI ports, so our only option is to use the single BMC port located by the power cords. But yes, this makes the most sense as the reason they would be going down.  Is there a solution?




When updating, did you also include the BMC update?

 

A: BMC and BIOS have already been updated to the latest version on ALL of them.

 

Lastly, do you see anything in the logs near the time it fails?

 

A: No log

1 Rookie

 • 

64 Posts

January 20th, 2020 06:00

Hi Chris,

 

Hope you had a nice weekend.

 

Please let us know if you have any solutions to this issue.

 

Thank you.  Have a good one.

 

Ninjas

Moderator

 • 

8.4K Posts

January 20th, 2020 13:00

So is this what you're seeing?

 

8_sled_front.jpg

 

 

1 Rookie

 • 

64 Posts

January 21st, 2020 12:00

Hi Chris,

 

Thanks again for the support.

 

1,2,4 and 5 are all correct.  3 is not since we have 12 nodes in our chassis.

 

Thank you

 

Ninjas

Moderator

 • 

8.4K Posts

January 23rd, 2020 06:00


The BMC is configured in each individual server sled's BIOS. The default NIC setting is "Dedicated", which means all sleds are accessed through the single BMC port (located, as you stated, near the power cords). Once the BIOS BMC IP address information has been configured on each of the installed sleds, BMC management must be done EITHER through the single dedicated BMC port for all sleds, OR through the Shared LOM 1 port specified on each additional sled front.

So you would need to have them either all accessible through the BMC port, or individually accessible through their individual NIC ports. If you choose the latter, you would have to remove the cable from the BMC port.
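
The either/or rule above can be written down as a small sanity check. This is only a sketch in Python: the function name, the mode strings, and the idea of keeping a per-chassis list of sled settings are assumptions for illustration, not something the BIOS or BMC exposes in this form.

```python
from typing import List


def check_bmc_layout(sled_modes: List[str], bmc_port_cabled: bool) -> List[str]:
    """Return a list of problems with a chassis BMC setup (empty if consistent).

    sled_modes: the BIOS BMC NIC setting of each sled, "Dedicated" or "Shared".
    bmc_port_cabled: whether the single chassis BMC port has a cable attached.
    """
    problems = []
    modes = {m.strip().lower() for m in sled_modes}

    unknown = modes - {"dedicated", "shared"}
    if unknown:
        problems.append("unknown NIC mode(s): " + ", ".join(sorted(unknown)))

    # The rule is either/or for the whole chassis, so a mix is flagged.
    if {"dedicated", "shared"} <= modes:
        problems.append("sleds mix Dedicated and Shared modes")

    # Shared LOM 1 access requires the chassis BMC port to be uncabled.
    if "shared" in modes and bmc_port_cabled:
        problems.append("Shared mode in use but BMC port is still cabled")

    # Dedicated access goes through the chassis BMC port, so it needs a cable.
    if "dedicated" in modes and not bmc_port_cabled:
        problems.append("Dedicated mode in use but BMC port has no cable")

    return problems
```

For a 12-sled chassis, `check_bmc_layout(["Dedicated"] * 12, True)` and `check_bmc_layout(["Shared"] * 12, False)` both return an empty list, which are the two setups described above.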

 

March 30th, 2020 12:00

Hello,

 

We are getting the same issue the OP is having. We have about 10 of these 12 blade systems and they are all doing the same thing. After about 2 days, the BMC GUI will stop responding, but the BMC IP will still ping.

 

Tried using the one central BMC port setup originally. Resulted in the issue above.

 

Tried your second method of moving the BMCs onto NIC 1: setting the blades in the BIOS to "Shared" and disconnecting the BMC port. It did work, but then the GUI would die A LOT quicker.
