December 1st, 2023 13:52

monitoring health of server hardware

Hi!

So, is there a good out-of-band way to monitor the health of the servers (fans, temperatures, memory, CPU, and... drives/RAID, of course)?

The servers are all 14th/15th gen with Enterprise iDRACs: R640, M640 and such.

I am mostly interested in the rack servers though.

I have tried IPMI, but that doesn't report on drives/RAID.

SNMP... I searched and searched, but nope.

I just want a simple way to check the health via, e.g., a Nagios script.

OMSA is too heavy to install for this (and it always stops working after a few OS upgrades).

Group Manager? That's just a web UI, right?

How?? :)

Moderator

December 1st, 2023 18:31

Alexander-36725,

I believe the best option for you would be OpenManage Enterprise, the modern replacement for OMSA. You can read about it here

Let me know if this helps.

December 3rd, 2023 22:32

Ah yes, I did try that. But that was, well...

1. I basically need a whole dedicated server/VM running wherever the servers are

2. it was big and slow

3. I want all monitoring in the tool of my choice (Nagios in my case)

4. and I am pretty sure there were other issues too that made it non-viable

It just seems like such a straightforward thing to be able to do: ask the servers whether they are OK...

Surely there must be some way for an Enterprise iDRAC to accommodate this? :)

I mean, a single point where I can get an answer like

- All good

or

- Disk in slot X is bad

or such

The "RollupStatus" value from "racadm getsysinfo" (or "racadm raid get enclosures -o") is, well, halfway there.

It states whether all is good, but won't specify what the problem is if it isn't.

Also, it seems it's not available on 13th gen servers.
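Even the halfway-there rollup can be turned into a Nagios check. Below is a minimal Python sketch, assuming racadm prints "RollupStatus = <value>" lines (the exact spacing and the set of non-Ok values are assumptions, not documented here) and using the standard Nagios plugin exit codes (0 OK, 2 CRITICAL, 3 UNKNOWN). The function names are hypothetical. It shares the limitation described above: it can say that something is wrong, but not what.

```python
import re
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# ASSUMPTION: racadm emits "Key = Value" lines such as "   RollupStatus   = Ok".
_ROLLUP_RE = re.compile(r"^\s*RollupStatus\s*=\s*(\S+)\s*$", re.MULTILINE)


def check_rollup(racadm_output: str) -> tuple[int, str]:
    """Turn racadm text output into a Nagios (exit code, status line) pair."""
    statuses = _ROLLUP_RE.findall(racadm_output)
    if not statuses:
        return UNKNOWN, "UNKNOWN - no RollupStatus found in racadm output"
    bad = [s for s in statuses if s.lower() != "ok"]
    if bad:
        # The rollup only tells us *that* something is wrong, not *what*.
        return CRITICAL, f"CRITICAL - RollupStatus: {', '.join(bad)}"
    return OK, f"OK - {len(statuses)} RollupStatus value(s) all Ok"


def main() -> None:
    """Hypothetical entry point: read racadm output on stdin, print one status line."""
    code, message = check_rollup(sys.stdin.read())
    print(message)
    sys.exit(code)
```

Saved as a script that ends with a call to "main()", it could be fed the output of "racadm raid get enclosures -o" (local or remote racadm) through a pipe.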

Redfish??
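Redfish does seem to fit the "single point" idea: the system resource carries a health rollup, and the Storage collection links to each drive with its own health, so a check can name the failed disk. Below is a minimal Python sketch, assuming the standard DMTF Redfish Storage/Drive resources as exposed by iDRAC9 and the usual Dell system ID "System.Embedded.1". Host, credentials and helper names are hypothetical, and the sketch has not been verified against real hardware.

```python
import base64
import json
import ssl
import urllib.request
from typing import Callable

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2
# Redfish "Health" values mapped to Nagios states.
_SEVERITY = {"OK": OK, "Warning": WARNING, "Critical": CRITICAL}

Fetch = Callable[[str], dict]


def make_fetcher(host: str, user: str, password: str, verify_tls: bool = True) -> Fetch:
    """Return a function that GETs a Redfish path from the iDRAC and parses the JSON body."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    ctx = ssl.create_default_context()
    if not verify_tls:  # only for iDRACs still on a self-signed certificate
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE

    def fetch(path: str) -> dict:
        req = urllib.request.Request(
            f"https://{host}{path}", headers={"Authorization": f"Basic {token}"}
        )
        with urllib.request.urlopen(req, context=ctx, timeout=30) as resp:
            return json.load(resp)

    return fetch


def _health(resource: dict, key: str = "Health"):
    """Read Status.<key> from a Redfish resource, tolerating a missing Status."""
    return (resource.get("Status") or {}).get(key)


def check_health(fetch: Fetch, system_id: str = "System.Embedded.1") -> tuple[int, str]:
    """Report the system rollup plus the name of every drive that is not OK."""
    base = f"/redfish/v1/Systems/{system_id}"
    problems: list[tuple[int, str]] = []

    rollup = _health(fetch(base), "HealthRollup")
    if rollup not in (None, "OK"):  # rollup should also reflect fans, PSUs, memory
        problems.append((_SEVERITY.get(rollup, WARNING), f"system HealthRollup is {rollup}"))

    # Storage collection -> each controller -> each drive link it lists.
    for member in fetch(f"{base}/Storage").get("Members", []):
        controller = fetch(member["@odata.id"])
        for link in controller.get("Drives", []):
            drive = fetch(link["@odata.id"])
            health = _health(drive)
            if health not in (None, "OK"):
                name = drive.get("Name") or drive.get("Id", link["@odata.id"])
                problems.append((_SEVERITY.get(health, WARNING), f"{name} is {health}"))

    if not problems:
        return OK, "OK - system and all drives healthy"
    code = max(c for c, _ in problems)
    label = "CRITICAL" if code == CRITICAL else "WARNING"
    return code, f"{label} - " + "; ".join(msg for _, msg in problems)
```

A Nagios wrapper would call "check_health(make_fetcher(host, user, password))", print the message and exit with the code; a read-only iDRAC account is enough for these GET requests. Passing "fetch" in as a function keeps the walking logic testable without an iDRAC.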

Moderator

04-12-2023 08:27 AM

DELL-Erman O

Social Media and Communities Professional

Dell Technologies | Enterprise Support Services

#IWork4Dell


April 1st, 2024 04:20

I know this thread is quite old, but I thought I would post an update. I have been searching for monitoring features for a single Dell small-business server and face similar challenges: introducing other software or appliances is overkill.

It appears that iDRAC9 Enterprise now supports integrated alerting for almost everything, including power supply, storage controller and physical disk failures. There are many more alerts available than I can list here.

I simply set up an app password in my Gmail account and configured the email SMTP settings in iDRAC to use port 587 with STARTTLS. Then I enabled alerting under Configuration > System Settings. You can configure alerts by severity level (critical/warning/informational) and select email, SNMP traps, remote syslog, or even Redfish as targets. I just used the quick alert setup for the critical and warning levels.
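Before typing the app password into iDRAC, it can help to confirm that the same mail path (port 587, STARTTLS, then login) works from a workstation. Below is a minimal sketch with Python's standard "smtplib", assuming Gmail's "smtp.gmail.com"; the addresses and function names are placeholders. It only tests the mail account and transport, not iDRAC itself.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_test_message(sender: str, recipient: str) -> EmailMessage:
    """Compose a small test mail (addresses are placeholders)."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "iDRAC alert path test"
    msg.set_content("If you can read this, the SMTP settings planned for iDRAC work.")
    return msg


def send_via_starttls(msg: EmailMessage, user: str, app_password: str,
                      host: str = "smtp.gmail.com", port: int = 587) -> None:
    """Connect in plain text on 587, upgrade with STARTTLS, then authenticate and send."""
    with smtplib.SMTP(host, port, timeout=30) as smtp:
        smtp.starttls(context=ssl.create_default_context())  # encrypt before login
        smtp.login(user, app_password)  # the Gmail app password, not the account password
        smtp.send_message(msg)
```

If "send_via_starttls(build_test_message(me, ops), me, app_password)" delivers the mail, the same host, port and credentials should be what goes into the iDRAC SMTP settings.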

I know my post is probably not useful anymore because everyone might already know this, but it wasn't exactly obvious and took me a while digging through documentation.

My plan is to also investigate the SNMP and syslog options, but as mentioned, those require some other appliance or service to send the output to.
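For the remote syslog target, the "other service" can be very small. Below is a minimal sketch of a UDP syslog receiver in Python, assuming messages start with the standard "<PRI>" prefix (facility = PRI // 8, severity = PRI % 8) and arrive on the usual syslog port 514 (binding it normally needs root; the port is a parameter). The forwarding hook and the sample message in the test are hypothetical; the actual iDRAC message format is not shown in this thread.

```python
import re
import socketserver

# Syslog severity names, indexed by PRI % 8.
SEVERITIES = ["emerg", "alert", "crit", "err", "warning", "notice", "info", "debug"]
_PRI_RE = re.compile(r"^<(\d{1,3})>(.*)$", re.DOTALL)


def parse_syslog(datagram: bytes) -> tuple[int, str, str]:
    """Split a syslog datagram into (facility, severity name, rest of message)."""
    text = datagram.decode("utf-8", errors="replace").strip()
    m = _PRI_RE.match(text)
    if not m:
        return -1, "unknown", text  # no PRI prefix: pass the text through untouched
    facility, severity = divmod(int(m.group(1)), 8)
    return facility, SEVERITIES[severity], m.group(2)


class SyslogHandler(socketserver.BaseRequestHandler):
    def handle(self) -> None:
        data = self.request[0]  # for UDP servers, request is (data, socket)
        _facility, severity, message = parse_syslog(data)
        # Hypothetical hook: forward "warning" and worse to Nagios, e.g. as a passive check.
        print(f"{self.client_address[0]} [{severity}] {message}")


def serve(host: str = "0.0.0.0", port: int = 514) -> None:
    """Listen for syslog datagrams until interrupted."""
    with socketserver.UDPServer((host, port), SyslogHandler) as server:
        server.serve_forever()
```

Calling "serve()" on a small Linux box and pointing the iDRAC remote syslog setting at it would be enough to start seeing events; in practice an existing rsyslog or syslog-ng install does the same job.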

Email isn't the most reliable channel, but it's at least something, rather than simply waiting for a server to fail because a physical disk has been flashing for days or weeks without anyone noticing.
