Unsolved
10 Posts
0
35114
OME 1.2 False Positives
Hi,
We receive false positive system down events for servers. OME reports a server as down while it is actually up and running. To try to solve this, we configured ICMP with a timeout of 2500 ms and 5 retries, but the false positives still occur. The ping time for the servers is between 40 and 200 ms, depending on their location. Status polling is set to every 5 minutes.
Is there something I have missed during the configuration of OME and/or the servers? Does anyone have a solution for this issue?
DELL-Pupul M
1K Posts
0
August 23rd, 2013 05:00
Hi Shortek,
Which version of OME are you using? If it is OME 1.1, upgrading to OME v1.1.1 or OME v1.2 should solve your problem.
If you are already using OME v1.2, then this link should be of some help:
http://en.community.dell.com/techcenter/systems-management/f/4494/p/19505533/20417596.aspx#20417596
DELL-Pupul M
1K Posts
1
August 23rd, 2013 06:00
Hi Shortek,
Thanks for confirming. Couple of questions:
http://www.dell.com/support/Manuals/us/en/04/Product/dell-opnmang-essentials-v1.2
Shortek
10 Posts
0
August 23rd, 2013 06:00
I am already on OME 1.2. I will have a look at the given article.
Thanks so far. I will update this thread if I manage to solve the issue with the given article.
Shortek
10 Posts
0
August 23rd, 2013 06:00
The given article is not the solution I am looking for, as we already did the testing described there. Our ICMP settings are already above normal values: timeout = 2500 ms and retries = 5.
I can't believe a normal server takes longer than 2500 ms to reply for more than 5 pings in a row.
That said, our other monitoring system shows the server as up, and its timeout is set to 500 with 3 retries.
Shortek
10 Posts
0
August 23rd, 2013 07:00
Let me know if you need more information
DELL-Rob C
2 Intern
2.8K Posts
0
August 23rd, 2013 10:00
Thanks for all of this detail.
So let's try to put the status poll back to the default 1 hour and run with that for a while to see how it behaves.
One misconception with OME is that you have to drop the status poll to 5 minutes in order to get timely alerts. OME has on-demand health polling: when an alert comes into the console, we always go out and poll the device for its health status. So aggressive status polling is usually not necessary.
Let's see what that does and report back.
Thanks much,
Rob
Shortek
10 Posts
0
September 9th, 2013 04:00
Sorry for not coming back to this sooner.
It looks like it is going better now, but now and then we still get some false positives.
We are busy moving the OME server to a different location. This will be a fresh install, with a new database, so maybe the problems will be solved after the move.
If not, I will come back to this.
Thanks to everyone who tried to help.
Shortek
10 Posts
0
September 16th, 2013 01:00
We have moved the installation, but the problems are not solved. One question: are the ICMP configuration settings that are set in the Discovery Ranges also used for the Status Polling?
We still have devices reported as down that are up again an hour later, because the status polling saw the devices as down while they were actually up and running. This probably has to do with missed replies during the scan. Since we run off-site backups over the LAN line, this can probably cause high ping times or connection timeouts. Upgrading the line is not an option at this moment.
So if I change the number of retries and the timeout in the ICMP settings, will this also affect the Status Polling?
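To get a feel for whether missed replies alone could explain this, here is a rough sketch in Python. It is only a back-of-the-envelope model, not how OME works internally: the function name `all_lost_probability` is made up for illustration, and it assumes each ping is lost independently with the same loss rate. Loss during a backup window is more likely to come in bursts, so the real chance of losing every ping in a row can be much higher than this model suggests.

```python
def all_lost_probability(loss_rate: float, attempts: int) -> float:
    """Chance that every one of `attempts` pings is lost.

    Assumption (hypothetical model): each ping is lost independently
    with probability `loss_rate`. Bursty loss breaks this assumption.
    """
    if not 0.0 <= loss_rate <= 1.0:
        raise ValueError("loss_rate must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    return loss_rate ** attempts


if __name__ == "__main__":
    # Compare a few loss rates for 5 and 10 attempts per status poll.
    for loss_rate in (0.1, 0.5, 0.9):
        for attempts in (5, 10):
            p = all_lost_probability(loss_rate, attempts)
            print(f"loss={loss_rate:.0%} attempts={attempts} -> all lost: {p:.6f}")
```

If the numbers under this model look far too small to match how often the false positives show up, that points towards bursts of loss or something other than plain ICMP loss.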
DELL-Pupul M
1K Posts
0
September 16th, 2013 02:00
Hi,
The answer is yes, OME uses the same ICMP configuration settings for status polling as well. So if you change the number of retries and the timeout in the ICMP settings, it will affect Status Polling.
This post should help you: http://en.community.dell.com/techcenter/systems-management/f/4494/p/19505533/20417596.aspx#20417596
Shortek
10 Posts
0
September 16th, 2013 02:00
Perfect, thanks for your response.
I am going to find the best settings by trial and error.
Shortek
10 Posts
0
September 23rd, 2013 05:00
At this moment I have the following settings in place
ICMP Configuration:
Timeout: 2500ms
Retries: 10
Status Schedule:
1 hour
Speed is in the middle of the bar.
We still receive false positives, sometimes for 4, 5 or 6 servers at a time. When I receive a notification and ping the device myself from my machine or from the OME server, the device is UP.
Correct me if I am wrong, but with the above settings, the status polling should only report an error when 10 pings in a row are missed or take longer than 2500 ms. Am I correct?
What could cause this issue?
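To make my understanding of the settings concrete, here is a minimal sketch in Python of the rule I am assuming. This is not taken from OME: the names `is_reported_down` and `worst_case_wait_ms` are made up, and I am assuming that "retries" means the total number of pings per poll and that a device is only reported down when every one of them fails. OME might count the first ping separately from the retries.

```python
from typing import Optional, Sequence


def is_reported_down(
    rtts_ms: Sequence[Optional[float]], timeout_ms: float, attempts: int
) -> bool:
    """Return True only if none of the first `attempts` pings succeeded.

    rtts_ms holds one round-trip time per ping in milliseconds;
    None means no reply was received at all.
    Assumption (hypothetical): a reply slower than timeout_ms counts as missed.
    """
    if attempts < 1 or len(rtts_ms) < attempts:
        raise ValueError("need at least `attempts` ping samples")
    considered = rtts_ms[:attempts]
    return all(rtt is None or rtt > timeout_ms for rtt in considered)


def worst_case_wait_ms(timeout_ms: float, attempts: int) -> float:
    """Longest time one poll waits on a silent device before giving up."""
    return timeout_ms * attempts


if __name__ == "__main__":
    # Nine lost pings followed by one reply at 180 ms: still up under this rule.
    samples = [None] * 9 + [180.0]
    print(is_reported_down(samples, timeout_ms=2500, attempts=10))
    print(worst_case_wait_ms(2500, 10))
```

Under this rule a single reply within 2500 ms is enough to keep the device up, and one poll waits up to 25 seconds on a silent device, which is why I do not understand how a server that answers in 40 to 200 ms ends up being reported as down.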