monitoring health of server hardware
Hi!
So, is there a good out-of-band way to monitor the health of the server (fans, temperatures, memory, CPU and, of course, drives/RAID)?
The servers are all 14th/15th gen with iDRAC Enterprise, e.g. R640, M640 and similar.
I am mostly interested in the rack servers, though.
I have tried IPMI, but it doesn't report on drives/RAID.
SNMP... I searched and searched, but no luck.
I just want a simple way to check health from, e.g., a Nagios script.
OMSA is too heavy to install for this (and it always stops working after a few OS upgrades).
Group Manager is just a web UI, right?
How?? :)
DELL-Chris H
Moderator
December 1st, 2023 18:31
Alexander-36725,
I believe the best option for you would be OpenManage Enterprise, which is a more modern tool than OMSA; you can read about it here.
Let me know if this helps.
Alexander-36725
December 3rd, 2023 22:32
Ah yes, I did try that. But that was, well...
1. I basically need a whole dedicated server/VM running wherever the servers are
2. it was big and slow
3. I want all monitoring in the tool of my choice (Nagios in my case)
4. and I am pretty sure there were other issues too that made it non-viable
It just seems like such a straightforward thing to be able to do: ask the servers whether they are OK...
Surely there must be some way for an iDRAC Enterprise to accommodate this? :)
I mean, a single point where I can get an answer like
- All good
or
- Disk in slot X is bad
or similar.
The "RollupStatus" value from "racadm getsysinfo" (or "racadm raid get enclosures -o") is, well, halfway there.
It states whether all is good, but doesn't specify what the problem is when it isn't.
Also, it doesn't seem to be available on 13th-gen servers.
Redfish??
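One possible answer to the Redfish question, sketched in Python as a Nagios-style plugin. It is a minimal sketch under several assumptions: that the iDRAC exposes the standard Redfish ComputerSystem resource at `/redfish/v1/Systems/System.Embedded.1` (the ID Dell usually uses; check `/redfish/v1/Systems` on your own iDRAC), that `Status.HealthRollup` on that resource summarises the subordinate components (fans, temperatures, memory, CPU, storage), and that each storage controller links its physical disks under `Drives`. The script name `check_idrac_redfish.py` and the helpers `evaluate`, `worst` and `collect_drives` are hypothetical. Certificate verification is disabled because iDRACs commonly use self-signed certificates; you may want to change that.

```python
import base64
import json
import ssl
import sys
import urllib.request

# Assumed Dell system resource ID; verify it via /redfish/v1/Systems on your iDRAC.
SYSTEM_PATH = "/redfish/v1/Systems/System.Embedded.1"

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}
# Redfish Status.Health values mapped to Nagios states; anything else is UNKNOWN.
HEALTH_TO_NAGIOS = {"OK": OK, "Warning": WARNING, "Critical": CRITICAL}
# Severity ranking so that CRITICAL always wins when results are combined.
RANK = {OK: 0, UNKNOWN: 1, WARNING: 2, CRITICAL: 3}


def worst(a, b):
    """Return the more severe of two Nagios states."""
    return a if RANK[a] >= RANK[b] else b


def evaluate(system, drives):
    """Combine the system health rollup and per-drive health into (exit_code, message)."""
    status = system.get("Status", {})
    rollup = status.get("HealthRollup") or status.get("Health")
    code = HEALTH_TO_NAGIOS.get(rollup, UNKNOWN)
    problems = []
    for drive in drives:
        health = drive.get("Status", {}).get("Health")
        drive_code = HEALTH_TO_NAGIOS.get(health, UNKNOWN)
        if drive_code != OK:
            # Name the failing disk so the alert says *which* slot is bad.
            name = drive.get("Name") or drive.get("Id", "unknown drive")
            problems.append(f"{name}: {health}")
            code = worst(code, drive_code)
    detail = "; ".join(problems) if problems else "no drive problems reported"
    return code, f"{LABELS[code]} - system rollup {rollup}, {detail}"


def get_json(host, path, user, password):
    """GET a Redfish resource with HTTP Basic auth and return the parsed JSON."""
    request = urllib.request.Request(f"https://{host}{path}")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")
    # Self-signed iDRAC certificates are accepted here (no verification).
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(request, context=context, timeout=30) as response:
        return json.load(response)


def collect_drives(host, system, user, password):
    """Follow System -> Storage collection -> controllers -> Drives links."""
    drives = []
    storage = get_json(host, system["Storage"]["@odata.id"], user, password)
    for member in storage.get("Members", []):
        controller = get_json(host, member["@odata.id"], user, password)
        for link in controller.get("Drives", []):
            drives.append(get_json(host, link["@odata.id"], user, password))
    return drives


def main(argv):
    if len(argv) != 4:
        print("UNKNOWN - usage: check_idrac_redfish.py HOST USER PASSWORD")
        return UNKNOWN
    host, user, password = argv[1:]
    try:
        system = get_json(host, SYSTEM_PATH, user, password)
        drives = collect_drives(host, system, user, password)
    except Exception as exc:  # network, auth or unexpected JSON layout
        print(f"UNKNOWN - Redfish query failed: {exc}")
        return UNKNOWN
    code, message = evaluate(system, drives)
    print(message)
    return code


# Only run when invoked with arguments, so the module can be imported for testing.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Invoked as `check_idrac_redfish.py idrac-host monitor-user secret`, it would print a single line of the form `CRITICAL - system rollup ..., <drive name>: Critical` and exit with the matching Nagios code, which covers both the "all good" and the "disk in slot X is bad" cases from one query point.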
quietman65
April 1st, 2024 04:20
I know this thread is quite old, but I thought I would post an update. I have been searching for monitoring features for a single Dell server at a small business and face similar challenges: introducing other software or appliances is overkill.
It appears that iDRAC9 Enterprise now supports integrated alerting for almost everything, including power supplies, storage controllers, physical disk failures, etc. There are many more alerts available than I can list here.
I simply set up an app password in my Gmail account and configured the email SMTP settings in iDRAC to use port 587 with STARTTLS. Then I enabled alerting under Configuration > System Settings. You can configure alerting by severity level (critical/warning/informational) and select email, SNMP traps, remote syslog, or even Redfish as targets. I simply used the quick alert setup for the critical and warning levels.
I know my post is probably not useful anymore because everyone might already know this, but it wasn't exactly obvious, and it did take me a little time digging through documentation.
My plan is to also investigate the options for SNMP and syslog, but as was mentioned, those require some other appliance or service to send the output to.
Email isn't the most reliable channel, but it's at least something, rather than simply waiting for a server to fail because a physical disk has been flashing for days or weeks without anyone noticing.
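For the remote syslog target, the receiving service can be quite small. The sketch below is a hypothetical Python listener (not a Dell tool), assuming the iDRAC sends plain UDP syslog messages that begin with the standard `<PRI>` header. It decodes the severity as `PRI % 8` and the facility as `PRI // 8`, as defined for syslog in RFC 5424, and prints only warning-or-worse messages; the names `parse_pri` and `listen` are illustrative. Binding to the standard port 514 usually requires root privileges.

```python
import re
import socket
import sys

# Syslog severity names in numeric order: PRI % 8 indexes this list.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
# Severities 0-4 (emerg .. warning) are treated as worth printing.
ALERT_SEVERITIES = set(SEVERITIES[:5])
PRI_RE = re.compile(r"<(\d{1,3})>")


def parse_pri(message):
    """Split a syslog message into (facility, severity_name, rest), or return None."""
    match = PRI_RE.match(message)
    if match is None:
        return None
    pri = int(match.group(1))
    if pri > 191:  # facilities 0-23, so PRI is at most 23 * 8 + 7
        return None
    return pri // 8, SEVERITIES[pri % 8], message[match.end():]


def listen(port, host="0.0.0.0"):
    """Receive UDP syslog datagrams forever and print the warning-or-worse ones."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((host, port))
        while True:
            data, (sender, _) = sock.recvfrom(8192)
            parsed = parse_pri(data.decode("utf-8", errors="replace"))
            if parsed is None:
                continue
            _facility, severity, text = parsed
            if severity in ALERT_SEVERITIES:
                print(f"{sender} [{severity}] {text.strip()}", flush=True)


# Run only when a port is given, e.g. `python syslog_listen.py 514`.
if __name__ == "__main__" and len(sys.argv) == 2:
    listen(int(sys.argv[1]))
```

In practice you would replace the `print` with whatever hands the event to your monitoring tool; the point is only that the "other service" for syslog does not have to be a full appliance.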