February 4th, 2009 22:00

Smarts AM-PM discovery process takes a long time

Hello All.

I have a problem with the post-discovery process: it takes a long time.
The AM-PM server is installed on a Sun T1000 machine.
The DB is 34 MB.

Number of Systems [Instrumented for Connectivity/Performance]
Total Number of Systems: 83 [83/83]
Number of Router: 48 [48/48]
Number of Switch: 35 [35/35]

Total Number of IPs: 3405 [3200/0]
Total Number of Ports: 1413 [84/84]
Number on Switch: 1413 [84/84]

Total Number of Interfaces: 16395 [16210/224]
Number on Router: 16129 [15945/183]
Number on Switch: 266 [265/41]

Total Number of Links: 114
Number of NetworkConnections: 66
Number of Cables: 12
Number of TrunkCables: 36

Total Number of MACs: 2288
Total Number of STPNodes: 32309

Most of the time is spent in "Start creating STP trunk" and Reconfigure.
The whole process takes 30 - 40 minutes, even if I discover only one device, and it does not use much CPU or memory.

36 Posts

February 4th, 2009 23:00

Couple of things
- This is a fairly large domain with a lot of objects, particularly STPNodes.
- Post discovery process is the same regardless of whether you're discovering 1 device or all devices as it has to go through all the data to figure out connections, etc.
- Post discovery includes processes like Reconfigure that are single threaded and therefore do not benefit from multi-CPU, multi-core machines. If I remember correctly, the T1000 has lots of cores but low processing speed, which may be causing your problem. Discovery may run faster on your laptop than it does on the T1000.

I think your options are limited to moving to a server with a faster CPU, or dividing your network into multiple domains to take advantage of the multi-core architecture and get smaller reconfigure times.

Hope this helps,

Regards,
Berkay Mollamustafaoglu
http://www.ifountain.com/smarts
Voip: + 1 703 349 0538
mberkay on yahoo, google and skype

138 Posts

February 5th, 2009 00:00

Hello Berkay !

Sure, the Sun T1000 is not so great, but we have no problem with hardware performance (CPU, MEM, IO).

The problem is only with AM-PM DB operations. For now I have moved the STPNode processing out of post discovery, but the Reconfigure process still takes a long time, about 10 minutes.

Working with the DB through ASL is very slow. For example, requesting all objects of a class (getInstances) or removing all objects of a class takes a long time.
Average: 1 object per second.
/5: pollsys(0xFBEFA3B8, 1, 0xFBEFA658, 0x00000000) = 1
/5: pollsys(0xFBEFA2C0, 1, 0xFBEFA560, 0x00000000) = 1
/5: recv(70, "9191FA 2 < 0CF11 A iEA e".., 2048, 0) = 16
/5: write(1, " S T P N O D E - B y i s".., 33) = 33
/5: pollsys(0xFBEFA528, 1, 0xFBEFA7C8, 0x00000000) = 1
/5: send(70, "D016 ~ A18\0 < S03A01A n".., 80, 0) = 80
/4: lwp_cond_wait(0x00161BB8, 0x00161BF0, 0xFCDFB8B0, 1) (sleeping...)
/1: lwp_cond_wait(0xFE4C4D38, 0xFE4C4D10, 0x00000000, 1) (sleeping...)
/2: sigtimedwait(0xFEA7BEB0, 0xFEA7BE30, 0x00000000) (sleeping...)
/3: lwp_cond_wait(0x0007DD30, 0x0007DD58, 0xFDCFB7A0, 1) (sleeping...)
/5: pollsys(0xFBEFA3B8, 1, 0xFBEFA658, 0x00000000) (sleeping...)
/5: pollsys(0xFBEFA3B8, 1, 0xFBEFA658, 0x00000000) = 1
/5: pollsys(0xFBEFA2C0, 1, 0xFBEFA560, 0x00000000) = 1

This is the truss output for this process.

:-(

54 Posts

February 6th, 2009 00:00

Hi Hemul,

What version of IP do you have installed? Version 7.0.2 has a big performance problem. In our environment a full discovery takes more than 10 hours. It looks much better with IP 7.0.3.X.

Regards ada

138 Posts

February 6th, 2009 00:00

Hi ada.
We use the latest version of IP, 7.0.3.9 (SP3 with the latest patch, Patch 9):
IP_NETWORK_SUITE: V7.0.3.9(85223), 21-Nov-2008 11:25:41 Copyright 1995-2008, EMC Corporation - Build 48
I noticed that this problem is present on all our servers (17 with AM-PM), and hardware performance (the input/output subsystem, MEM, CPU) is all normal.
Currently I have no idea what to do about this problem.
All operations on the Smarts DB (getInstances, manage, unmanage, remove) run slowly.

:-(

P.S.
Ada, 10 hours is too long. How many devices do you manage with Smarts?

Message was edited by:
Hemul

54 Posts

February 8th, 2009 23:00

Hi Hemul,
We know that it's too long. We are managing over 20'000 devices. On 1 AM/PM domain we have about 3'000 - 4'000 devices.
I have now installed IP 7.0.3.10. In 1 week I should have more details about discovery time in our network.

138 Posts

February 9th, 2009 00:00

Hi ada!

Wow, 20 000 devices per AM is crazy. I know of 1000 devices per PM as the EMC recommendation!

36 Posts

February 9th, 2009 00:00

Hi Hemul,

It's not that the T1000 is not a good machine in general. It's just not good for Smarts AMs. Just to be sure, how do you determine that you don't have a CPU problem? As I mentioned, reconfigure is a single threaded process. Your overall CPU utilization can be 5 percent, yet you may still have a CPU problem. The Sun CoolThreads series of machines have multiple cores and multiple threads on each core. This is great for web servers etc. but not good for apps that require clock speed. It may be that Smarts can only use a single thread of a single core during reconfigure. EMC does not recommend these machines for AMs (they are fine for SAM/OIs afaik).
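To put a number on the "5 percent overall can still be a CPU problem" point, here is a rough Python sketch. It assumes a fully populated T1000 with 8 cores and 4 hardware threads per core (32 hardware threads in total); the thread count is an assumption about the machine, not something stated in this thread, and `overall_cpu_percent` is just an illustrative helper name.

```python
def overall_cpu_percent(busy_threads: float, hardware_threads: int) -> float:
    """Machine-wide CPU utilization when `busy_threads` hardware threads are 100% busy."""
    return 100.0 * busy_threads / hardware_threads


# Assumption: 8 cores x 4 threads per core = 32 hardware threads.
HARDWARE_THREADS = 8 * 4

# A single-threaded step such as reconfigure can keep at most one thread busy.
saturated = overall_cpu_percent(1, HARDWARE_THREADS)
print(f"one fully busy thread shows up as {saturated}% of the whole machine")
```

Under that assumption, a process that is completely CPU bound on one thread would still read as only a few percent in machine-wide tools, so a per-thread view of sm_server is needed before ruling out the CPU.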

Also, how many objects do you currently have in your repository? Sounds like you have made some adjustments but it may still be too high which would explain the slowness. When you do getInstances how many objects are there? If it is a high number, it is not surprising at all.

And finally, there may also be things you need to do so that Smarts can use all of the memory you have on the server. You may want to ask EMC support about this.

In short, you have a very large environment and your hw/sw configuration may need to be altered significantly to cope with it.


Hope this helps,
Berkay Mollamustafaoglu
http://www.ifountain.com
Voip: + 1 703 349 0538
mberkay on yahoo, google and skype

138 Posts

February 9th, 2009 01:00

Hello Berkay!

1) I use iostat plus top or sar to check for performance issues.
2) About CPU usage: when a query is running on the DB, sm_server (am-pm) uses about 5% CPU or less.

When discovery probing is running, CPU usage by am-pm is 25 - 30%.

3) Current state of my DB
Number of Systems [Instrumented for Connectivity/Performance]
Total Number of Systems: 83 [83/83]
Number of Router: 48 [48/48]
Number of Switch: 35 [35/35]

-- Sounds like you have made some adjustments
4) I have no advanced configuration; this is a clean AM-PM Service Pack 3 with the latest patch, Patch 9 (I'm now going to install Patch 10).

I will check this problem again after the upgrade to Patch 10, but I see no performance fixes in the changes file for Patch 10.

Message was edited by:
Hemul

54 Posts

February 9th, 2009 08:00

Hi Hemul,

We have in total 7 AM/PM domains for all 20'000 devices. On 1 AM/PM domain we have maximum 4'200 devices.

Regards,
Adrian

138 Posts

June 16th, 2009 10:00

Hi ada,

Please let me know how many active interfaces/ports you have on each APM server.

We found a solution for our problem with the slow discovery process.

The problem was the large number of interfaces on our devices. For example, each of our devices (DSLSwitch) has 4000+ virtual interfaces. First, APM runs a probe that walks the IfIndex table. After that, the probe runs ic-mib2-data.asl, which collects 7 - 8 parameters from MIB-II (Octets, IfSpeed, IfAlias, etc.) for each IfIndex object. That is the point: issuing 4000 separate requests with get_bulk or get_next takes a long time, about 17 minutes per device.
Solution: set the IfTypePattern parameter in the $base/local/conf/discovery/tpmgr-param.conf file so that it excludes all virtual interfaces. If the IfTypePattern parameter is defined for a sysObjectID, the interfaces it excludes are also skipped by the ic-mib2-data.asl script.
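As a back-of-the-envelope check of those numbers, here is a small Python sketch. It uses the figures from this post (about 4000 interfaces, 7 - 8 parameters each, about 17 minutes per device). The midpoint of 7.5 parameters and the count of 50 interfaces left after filtering are assumptions for illustration only, and the estimate assumes the cost scales linearly with the number of interfaces.

```python
# Figures from the post above.
INTERFACES = 4000            # virtual interfaces per DSL switch
DEVICE_SECONDS = 17 * 60     # about 17 minutes per device

# Assumption: midpoint of the "7 - 8" parameters collected per interface.
PARAMS_PER_INTERFACE = 7.5


def seconds_per_interface(total_seconds: float, interfaces: int) -> float:
    """Average time spent collecting MIB-II data for one interface."""
    return total_seconds / interfaces


def estimated_seconds(interfaces: int, per_interface: float) -> float:
    """Estimated collection time, assuming cost grows linearly with interface count."""
    return interfaces * per_interface


per_if = seconds_per_interface(DEVICE_SECONDS, INTERFACES)
per_request_ms = 1000 * per_if / PARAMS_PER_INTERFACE

# Hypothetical: only 50 interfaces remain once virtual ones are excluded.
REMAINING_INTERFACES = 50
after = estimated_seconds(REMAINING_INTERFACES, per_if)

print(f"per interface: {per_if:.3f} s, per request: about {per_request_ms:.0f} ms")
print(f"estimated time with {REMAINING_INTERFACES} interfaces: {after:.2f} s")
```

So each individual request is not slow by itself; it is the sheer number of interfaces that adds up, which is why excluding the virtual ones helps so much.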