October 2nd, 2007 05:00

Card down notification

I have a "Card Down" notification in InCharge AM-PM. When you go into Containment, the CARDS tab reads "NOTPRESENT".
However, the cards are present and fully functional. I have two questions about this:
1st, what is causing the alarm, and how can I stop it from recurring?
2nd, how can I delete the alarm without disabling future alarms, in case there is a real issue?

Thanks in advance...

12 Posts

October 2nd, 2007 05:00

One more thing.....the alarm is coming from Cisco routers...

89 Posts

October 2nd, 2007 23:00

Hi,

The following root-cause problem is diagnosed for Card:
Down: Indicates that a card has failed. A card failure causes all ports and interfaces in
the card as well as any system functions associated with the card to fail. For example,
if a Router Switch Module (RSM) is associated with the card, the routing functions
provided by the RSM will fail. The events used to diagnose Card Down include:
◆ OperationallyDown for the card
◆ Card Down for any subcards
◆ SwitchOver for any supervisor cards
◆ Network adapter Down for any ports or interfaces realized by the card
◆ System Down for any systems packaged by the card

Well, you had better check your switch and your community strings, and whether the device is certified. It is possible that the chassis is not reporting the status properly. Do you have more switches of the same type? Did you try deleting the device and rediscovering it?
You could also unmanage the card if you don't want to see the alarm. This will remove the alarm from SAM, but be careful with the objects related to the card, because they might become unmanaged too.

cheers

Fernando

12 Posts

October 3rd, 2007 03:00

Hi Fernando, and thanks for the reply.

While I understand your diagnosis, the cards themselves have not failed. The alarms I am seeing are false. The device and cards are functional.

I am looking for an answer to what might have caused the false alarm.

Thanks!

12 Posts

October 9th, 2007 05:00

Thanks for the help, but I was able to get this resolved. The resolution was simple; I just needed to know how to get it done.

What made this difficult to diagnose is that there was nothing wrong with the router or the cards. All were functioning as they were supposed to. We were just getting false alarms.
Unmanaging, rediscovering, remanaging, etc. never worked. I finally contacted EMC about a "syntax string" that would remove the false alarm yet still allow the device to send a legitimate alarm if needed....

Thanks again....

89 Posts

October 9th, 2007 05:00

Well

The 'not present' status also happens when the card disappears from the device. I had the same with interfaces like loopbacks. Did you try to remove and rediscover the device?
Did you check the latest certification list from EMC?
Is your device certified? What is the OID of your device?

cheers

F

89 Posts

October 9th, 2007 05:00

Glad to hear it!

Would you mind posting the problem and the resolution?
It would be quite helpful in case this happens again and someone needs a possible or definitive solution. Hints or solutions for similar cases sometimes help.

Many thanks in advance

Fernando

12 Posts

October 9th, 2007 06:00

The error was simply a " Card Down " notification. However, it was false. The events were from a core router, so we had to be careful how we removed the alarm. Simply filtering it out was not the answer. We also tried unmanaging, rediscovering and remanaging and that did not resolve it either.

We run a Unix platform. The following steps were taken to safely remove the alarm.

1. Log in as superuser: " sudo -s "
Alarms that are removed manually must be cleared from the Smarts bin directory, where the executables are located.
2. cd /opt/InCharge6/SAM/smarts/bin
The command must include the class and the event names. The syntax is:
3. ./sm_ems --server=INCHARGE-SA clear ( class ) ( event ) ( name ) ( event name ) ( source )
Example:
./sm_ems --server=INCHARGE-SA clear Interface Down IF-s00177.phoenixville.pa.chs.net/3 Down INCHARGE-AM-PM

Additionally, double-click the alarm to bring up its splash screen, and make sure your entry reads exactly like the alarm itself. If there are caps, use caps, etc....
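The clear command above can be wrapped in a small helper so that the arguments are assembled the same way every time and can be compared with the alarm text before anything is run. This is only a sketch: the function name `build_clear_cmd` is made up for illustration, the argument order simply mirrors the example in this post rather than any sm_ems documentation, and the helper prints the command instead of executing it.

```shell
#!/bin/sh
# Hypothetical helper: prints the sm_ems clear command instead of running it,
# so the arguments can be checked against the alarm text first.
# The argument order mirrors the example in the post above; it is an
# assumption, not taken from the sm_ems documentation.
build_clear_cmd() {
    server=$1; shift
    # Remaining arguments are passed through unchanged (they are case-sensitive).
    printf './sm_ems --server=%s clear' "$server"
    for arg in "$@"; do
        printf ' %s' "$arg"
    done
    printf '\n'
}

# Dry run with the values from the example above.
build_clear_cmd INCHARGE-SA Interface Down \
    IF-s00177.phoenixville.pa.chs.net/3 Down INCHARGE-AM-PM
```

Once the printed line matches the alarm exactly, it can be pasted into the shell from the Smarts bin directory.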

Anyway, I hope this helps....

3 Posts

June 4th, 2010 09:00

Hi All,

That's really helpful information.

I too am facing many such false alarms, reported by the users I support.

But there is still a question: what made this false alarm appear on the Notification Console even though the card was not down?

Even after doing a walk on the card oper status, I get the result Card Down, but the user repeatedly reports it as a false alarm.

Is this an issue with certification, with the wrong MIB being polled?

Thanks

Arpita

1 Message

January 7th, 2011 07:00

I have a similar issue with a Cisco 2811 chassis. Unfortunately my SMARTs install is on a Windows platform, so the above solution does not work, but it points me in another direction. I had opened a case with EMC support, and the issue was that the Card-Fault = OldCiscoChassis setting in \IP\smarts\local\conf\discovery\oid2type_Cisco.conf reports this card as down or unknown (I forget which now). In any event, OldCiscoChassis has been deprecated by Cisco, yet we were unable to find any Cisco documentation for a new OID type to use. I have not yet opened a TAC case with Cisco, but I will. I still fear that they will not have any solution available and that I will have to modify SMARTs to deal with this false alarm as above. Does anyone know how to apply the above Unix solution on a Windows platform?

52 Posts

February 2nd, 2011 12:00

The OldCiscoChassis MIB has historically been a sore spot for these kinds of problems. It would randomly re-index the entries in the table, and in many cases they would either no longer correspond to the right cards or the status was unreliable or just plain incorrect.

The trouble is that when they deprecated the MIB (back in IOS 11.x and earlier) they said the Cisco Entity FRU Control (CEFC) MIB was the designated replacement.  Unfortunately, it still isn't generally available on all hardware - even with the latest IOS releases.

In many cases, customers have simply removed the "Card-Fault" entry in the oid2type.conf file for the device in question.  This is equivalent to unmanaging the Card as there is no possible way for the Card to generate a root-cause under those circumstances.

In IP 8.1, we introduced an intermediate option called "Card_Fault_Default".  This infers the state of the Card from the local and connected network adapters.  It is still short of having correct instrumentation for the Card, but it is arguably better than the problems we experienced with the OldCiscoChassis MIB.

Also, in an upcoming release of IP, we plan to have the Card instrumentation (on Cisco devices) depend less on the SysObjectID and more on the MIBs present on the device - as the device moves to support the newer MIBs, we will adjust to take advantage of them.

In passing, I would be a little leery of using the "workaround" in question.  It will indicate that the event has ostensibly been cleared from the source IP domain thus making SAM clear the event, but if SAM is restarted the event will be re-notified since IP still believes the condition on the card to still be valid.

Regards,

Bill

1 Message

July 19th, 2013 11:00

Hi,

I'm having the same issue with 3 of my routers showing the following error whenever they are rediscovered by SMARTS. Has anyone found a resolution yet? This seems to be happening only for some routers and not all.

InCharge Server INCHARGE-SA:

  NL_NOTIFY Card CARD-NAMR2.CG.COM/2 [] [C2921/C2951 AC Power Supply] Down (100%):

  Indicates that a failed card is the root cause.

12 Posts

July 22nd, 2013 22:00

Just adding some more information around this: we have seen this problem a few times recently with customer upgrades from 8.x to 9.x, and it is generally caused by the device(s) not responding correctly to the CISCO-ENTITY-FRU-CONTROL-MIB (OID 1.3.6.1.4.1.9.9.117), which is now used by default for a lot of Cisco devices. Its support, however, seems to depend on the IOS version.

In some cases we have fixed it by reverting to the OLD-CISCO-CHASSIS-MIB (OID 1.3.6.1.4.1.9.3.6), and in others to the CISCO-STACK-MIB (OID .1.3.6.1.4.1.9.5.1.3.1), depending on the device type.
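One way to see which of these MIBs a given device actually answers is to walk each subtree and count what comes back. The sketch below assumes net-snmp's `snmpwalk` and SNMP v2c; the function names, host name, and community string are placeholders, and the OIDs are the ones quoted above. A non-zero count only shows that the subtree is populated, not that the status values in it are trustworthy.

```shell
#!/bin/sh
# OIDs as quoted in the post above.
CEFC_OID=.1.3.6.1.4.1.9.9.117        # CISCO-ENTITY-FRU-CONTROL-MIB
OLD_CHASSIS_OID=.1.3.6.1.4.1.9.3.6   # OLD-CISCO-CHASSIS-MIB
STACK_OID=.1.3.6.1.4.1.9.5.1.3.1     # CISCO-STACK-MIB

# Count the varbinds in snmpwalk output read from stdin, ignoring the
# "No Such Object" / "No Such Instance" / "No more variables" replies
# that net-snmp prints when a subtree is not supported.
count_varbinds() {
    grep -v -e 'No Such' -e 'No more variables' | grep -c ' = ' || true
}

# Hypothetical wrapper: walk one subtree on a device and print the count.
# Assumes net-snmp's snmpwalk and SNMP v2c; adjust for your environment.
probe_subtree() {
    host=$1; community=$2; oid=$3
    snmpwalk -v2c -c "$community" "$host" "$oid" 2>/dev/null | count_varbinds
}

# Example use (not run here; host and community are placeholders):
#   probe_subtree router1.example.com public "$CEFC_OID"
#   probe_subtree router1.example.com public "$OLD_CHASSIS_OID"
#   probe_subtree router1.example.com public "$STACK_OID"
```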

To do this, find the entry in the oid2type_Cisco.conf file, and change the Card-Fault entry under INSTRUMENTATION from CiscoEntityFRU to be the correct type for your device, either OldCiscoChassis or CiscoStack, e.g.

Card-Fault  = OldCiscoChassis

Being good Smarts citizens, you will be using sm_edit to do this so that the original file is left untouched in the conf directory and your modifications are in local/conf so that whoever needs to look at this later can see what you have done.
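After editing, it can help to list the Card-Fault lines in the local copy and confirm the change landed where expected. A minimal sketch follows; the helper name `show_card_fault` is made up, and the install path and the `sm_edit` invocation shown in the comments are assumptions that should be checked against your own installation.

```shell
#!/bin/sh
# Print every Card-Fault line, with its line number, from an oid2type file.
# The path is passed in, since the install location varies by site.
show_card_fault() {
    grep -n 'Card-Fault' "$1"
}

# Example use (not run here). BASEDIR is a hypothetical install path, and
# the sm_edit argument form is an assumption:
#   BASEDIR=/opt/InCharge/IP/smarts
#   "$BASEDIR"/bin/sm_edit conf/discovery/oid2type_Cisco.conf
#   show_card_fault "$BASEDIR"/local/conf/discovery/oid2type_Cisco.conf
```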

Benjamin Johns

Senior Technology Consultant

iQ Consult Pty Ltd
