December 1st, 2023 13:52

monitoring health of server hardware


So, is there some good out-of-band way to monitor the health of the server (fans, temperatures, mem, cpu, and... drives/raid of course)?

Servers are all 14th/15th gen, and have Enterprise iDRACs. Like R640, M640 and such.

I am mostly interested in the rack servers though.

I have tried IPMI, but that doesn't report on drives/raid.

SNMP... searched and searched, but nope.

I just want some simple way to check the health via e.g. a Nagios script.

OMSA is too heavy to install for this (and always fails to work after a few OS upgrades).

Group Manager, yeah just a web UI, right?

How?? :)



December 1st, 2023 18:31



I believe the best option for you would be OpenManage Enterprise, which is the modern tool compared to OMSA. You can read about it here


Let me know if this helps. 


December 3rd, 2023 22:32

Ah yes, I did try that. But that was, well...

1. I basically need to have a whole dedicated server/VM running wherever the servers are

2. it was big and slow

3. I want all surveillance in the tool of my choice (Nagios in my case)

4. and I am pretty sure there were other issues too that made this non-viable

It just seems like such a straightforward thing to be able to do? To ask the servers if they are OK...

Surely there must be some way for an enterprise iDRAC to accommodate this? :)

I mean, a single point where I can get an answer like

- All good


- Disk in slot X is bad

or such
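
To make the kind of answer I mean concrete, here is a rough Python sketch of the Nagios side only. It assumes the per-component statuses have already been collected somehow (which is exactly the part I am missing), and the component names and status strings are made up for illustration, not confirmed iDRAC values. Only the exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) are the standard Nagios plugin convention.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed mapping from status strings to Nagios states; the strings are
# illustrative and not confirmed iDRAC values.
SEVERITY = {"ok": OK, "warning": WARNING, "critical": CRITICAL, "failed": CRITICAL}

# Ranking used to pick the worst state: CRITICAL > WARNING > UNKNOWN > OK.
RANK = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def summarize(components):
    """Collapse per-component statuses into one (exit_code, message) pair."""
    worst = OK
    problems = []
    for name, status in sorted(components.items()):
        # Unrecognized status strings are treated as UNKNOWN.
        level = SEVERITY.get(status.strip().lower(), UNKNOWN)
        if level != OK:
            problems.append(f"{name} is {status}")
            if RANK[level] > RANK[worst]:
                worst = level
    if not problems:
        return OK, "All good"
    return worst, "; ".join(problems)


# Hypothetical input, as if collected from the server by some other means.
code, message = summarize({"Fan 1": "Ok", "Disk in slot 3": "Failed"})
print(code, message)  # prints: 2 Disk in slot 3 is Failed
```

A real plugin would print the message and exit with the returned code. Ranking UNKNOWN below WARNING is just one possible choice.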

The "RollupStatus" thing from "racadm getsysinfo" (or "racadm raid get enclosures -o") is, well, halfway there.

It will state if all is good, but won't specify what the problem is if it ain't.

Also, it's not there for gen 13 servers it seems.
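
For what it is worth, the halfway version could be sketched roughly like this in Python. It only looks at text that has already been captured from racadm and assumes the output uses "Key = Value" lines. The sample text below is made up, the real layout may differ between firmware versions, and treating every non-Ok value as CRITICAL is my own assumption. A missing RollupStatus (as on the gen 13 servers) comes back as UNKNOWN.

```python
import re

# Assumes racadm prints "RollupStatus = <value>" on a line of its own.
ROLLUP_RE = re.compile(r"^\s*RollupStatus\s*=\s*(\S+)\s*$",
                       re.MULTILINE | re.IGNORECASE)


def rollup_to_nagios(racadm_output):
    """Map racadm-style text to a Nagios (exit_code, message) pair."""
    values = ROLLUP_RE.findall(racadm_output)
    if not values:
        # No RollupStatus line at all, e.g. on servers that do not report it.
        return 3, "UNKNOWN - no RollupStatus in output"
    bad = [v for v in values if v.lower() != "ok"]
    if bad:
        # Assumption: anything other than "Ok" is reported as CRITICAL.
        return 2, "CRITICAL - RollupStatus is " + ", ".join(bad)
    return 0, "OK - RollupStatus is Ok"


# Made-up sample text, not captured from a real server.
SAMPLE = """\
   State = Ready
   RollupStatus = Ok
"""

print(rollup_to_nagios(SAMPLE))  # prints: (0, 'OK - RollupStatus is Ok')
```

This still cannot say which disk is bad. It only turns the rollup value into something Nagios understands.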




04-12-2023 08:27 AM

