SDK thread uses 100% CPU (Windows, C-SDK 3.2.661)

I have a problem at a customer...

During normal usage of a Centera (reading and far more writing) on an 8-core machine, a thread starts using 100% of a CPU. After 2 to 4 days I have 8 or more threads stuck somewhere in the Centera SDK. From that point on the machine is no longer usable, because all CPU resources are consumed by the process hosting and using the Centera SDK.

Reading from and writing to the Centera is done over a single connection shared by multiple threads, and our application can access the Centera normally until all CPU resources are exhausted.

The hanging call does not seem to be made directly by our application, because the application continues working.

For information about the cluster itself, I have included the logs from the first connection to the Centera in "FirstConnectToCentera.txt".

This is the last output in the logfile (the entire logfile was 18 GB).

I have included a much larger part as the attachment "12296.zip":

evil thread: 3628
last line:     316,502


1229641204727    2008-12-18 23:00:04.727        [debug]        8072.3628    [POOL]    Use existing FPSocket (mSocket=620) Connection open,locked marked(GOOD)
1229641204727    2008-12-18 23:00:04.727        [debug]        8072.3628    [TRANSACTION]    Import Request cl0502/542156/READ_CLIP
1229641204727    2008-12-18 23:00:04.727        [debug]        8072.3628    [PACKET]    send SmartPacket
  NET_SYSTEMID type=string value=cl0502
  NET_TRANSACTIONID type=string value=cl0502/542156/READ_CLIP
  NET_VERSION type=integer value=3  HPP_CLIENT_VERSION type=integer value=197120  NET_MESSAGEID type=integer value=42  HPP_VERSION type=integer value=1  HPP_CONTROL type=integer value=0  HPP_OPCODE type=integer value=0  fieldcode=187 type=integer value=1  HPP_BLOBSIZE type=long value=-1  HPP_CALCID_NAMING type=string value=MD5
  HPP_IS_CLIPFILE type=integer value=1  HPP_BLOBID type=string value=BOH1N9KL1GPLDe5K0FJJLQHE7IIG413P66QMT90MFT02PVT7MSGMR
  HPP_CLIPID type=string value=BOH1N9KL1GPLDe5K0FJJLQHE7IIG413P66QMT90MFT02PVT7MSGMR
  HPP_OFFSET type=long value=0  HPP_LENGTH type=long value=9223372036854775807  fieldcode=157 type=integer value=1
1229641204727    2008-12-18 23:00:04.727        [debug]        8072.3628    [TRANSACTION]    Import Data cl0502/542156/READ_CLIP
1229641204742    2008-12-18 23:00:04.742        [debug]        8072.3628    [POOL]    Unlock FPSocket (mSocket=620) Connection open,locked marked(GOOD)
1229641204742    2008-12-18 23:00:04.742        [debug]        8072.3628    [CORE]    ClusterCloud::getPrimaryCluster(0)
1229641204742    2008-12-18 23:00:04.742        [debug]        8072.3628    [CORE]    ClusterCloud::getPrimaryCluster(0) -> 95841e5e-1dd1-11b2-9a1c-e8a3b2f3372e

After this, there are no more entries from the thread 3628!
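
For anyone who has to hunt for such a thread in a log of this size, here is a minimal sketch of how the last entry of every thread could be found automatically. It assumes the record layout seen in the excerpt above (13-digit epoch milliseconds, date, time, `[level]`, `pid.tid`, `[COMPONENT]`, message); that layout is inferred from the excerpt only, and the function names `last_entry_per_thread` and `quiet_threads` are made up for illustration, not part of the Centera SDK.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Start of a log record, as inferred from the excerpt above (an assumption):
# epoch-ms, date, time, [level], pid.tid, [COMPONENT], message.
RECORD = re.compile(
    r"^(\d{13})\s+\S+\s+\S+\s+\[\w+\]\s+\d+\.(\d+)\s+\[\w+\]\s+(.*)$"
)

# thread id -> (line number, epoch milliseconds, message)
LastEntries = Dict[int, Tuple[int, int, str]]


def last_entry_per_thread(lines: Iterable[str]) -> LastEntries:
    """Remember the final log record written by each thread."""
    last: LastEntries = {}
    for lineno, line in enumerate(lines, start=1):
        match = RECORD.match(line)
        if match is None:
            # Continuation line, e.g. the fields of a SmartPacket dump.
            continue
        epoch_ms = int(match.group(1))
        thread_id = int(match.group(2))
        last[thread_id] = (lineno, epoch_ms, match.group(3).rstrip())
    return last


def quiet_threads(last: LastEntries, min_gap_ms: int) -> List[int]:
    """Threads whose last record is at least min_gap_ms older than the newest record."""
    if not last:
        return []
    newest = max(epoch_ms for _, epoch_ms, _ in last.values())
    return sorted(
        tid for tid, (_, epoch_ms, _) in last.items()
        if newest - epoch_ms >= min_gap_ms
    )
```

Passing an open file object (for example `open(path, errors="replace")`) as `lines` streams the log line by line, so even an 18 GB file does not have to fit in memory. A thread that shows up in `quiet_threads` is only a candidate: it may simply have finished its work, so the result still has to be matched against the thread that is actually burning CPU.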

I also made a dump of the process, but since I don't have the symbol files from EMC it is not 100% accurate.

(This dump is not from the same occurrence as the logfile, but the error was the same.)

>    ntdll.dll!_KiFastSystemCallRet@0()    
     ntdll.dll!_NtReadFile@36()  + 0xc bytes   
     kernel32.dll!_ReadFile@20()  + 0x67 bytes   
     fpos32.dll!fp_ReadFile()  + 0x25 bytes   
     [Frames below may be incorrect and/or missing, no symbols loaded for fpos32.dll]   
     FPStreams.dll!FPBasicGenericStream::prepareBuffer()  + 0xbd bytes   
     FPStreams.dll!FPBasicGenericStream::read()  + 0x64 bytes   
     fpparser.dll!fpparser_2_3::XMLReader::refreshRawBuffer()  + 0x5b bytes   
     fpparser.dll!fpparser_2_3::XMLReader::XMLReader()  + 0x10e bytes   
     fpparser.dll!fpparser_2_3::ReaderMgr::createReader()  + 0x12b bytes   
     fpparser.dll!fpparser_2_3::IGXMLScanner::scanReset()  + 0x268 bytes   
     fpparser.dll!fpparser_2_3::XMLDeclImpl::getNodeName()  + 0x4dce bytes   
     ffffffff()   
   

All components are at version "3.02.661":

  •     FPLibrary.dll    3.02.661.0
  •     FPCore.dll       3.02.661.0
  •     FPStreams.dll  3.02.661.0
  •     FPUtils.dll        3.02.661.0
  •     FPXML.dll       3.02.661.0

Thanks in advance for any help!

If you need any more information, I will gladly supply it!

kind regards,

Christoph Herzog

khanz1

Re: SDK thread uses 100% CPU (Windows, C-SDK 3.2.661)

Hi Christoph,

I suggest that you request the customer to open up a Service Request (SR) with EMC to have this looked at and get a resolution. Please make sure to share this and any other information you have around the application using the Centera SDK when opening up the SR.

EMC support should be contacted to resolve any Centera/SDK related issues that occur on the customer's environment.

Thanks,
Zeeshan

Re: SDK thread uses 100% CPU (Windows, C-SDK 3.2.661)

Hi Zeeshan,

You are of course right. I have also opened a support call (to be exact, the customer has opened it...).

Thanks anyway,

Christoph


Re: SDK thread uses 100% CPU (Windows, C-SDK 3.2.661)

Sadly the problem was not really solved, but after the heavy import was over it didn't occur any more.

Larry_Margolis

Re: SDK thread uses 100% CPU (Windows, C-SDK 3.2.661)

We just had a customer run into this issue.  They opened a ticket with EMC and were advised to update to 3.2 p5.  (They were running with 3.2 p1.)

The latest patch level seems to have resolved the issue. 

Once I had figured out (by examining dumps) that the problem was threads looping within the SDK, the first thing I did was read all the Centera SDK release notes for each newer patch.  I saw nothing describing this problem.

It would have saved a lot of people a lot of time if this had been mentioned in the release notes.

gstuartemc

Re: SDK thread uses 100% CPU (Windows, C-SDK 3.2.661)

We specifically add Release Notes for all known customer issues that are specifically resolved by the release / patch. As the standard response for issues where the customer is using an EOSL / unsupported SDK version is to request a reproducible test case using that latest version, being asked to test with such a version does not automatically mean that support know that the issue is resolved by the release in question.

In this case, the issue may have been addressed internally within Engineering. If this was not done in response to a known customer issue, then it would not have been added to the Release Notes. It may even have been "fixed" as a side effect of some other change.

It is impractical to add release notes for every known defect found within Engineering / QA testing of a product. This is standard practice within the industry.
