I have a problem at a customer...
During normal use of a Centera (some reading and far more writing) on an 8-core machine, a thread starts using 100% of a CPU. After 2 to 4 days I have more than 8 threads stuck somewhere in the Centera SDK. From then on the machine is no longer usable, since all CPU resources are consumed by the process that hosts and uses the Centera SDK.
Reading from and writing to the Centera is done over a single connection shared by multiple threads, and our application can access the Centera normally until all CPU resources are exhausted.
The hanging code does not seem to be called directly by our application, because the application continues working.
For information about the cluster itself, I have included the log from the first connection to the Centera, "FirstConnectToCentera.txt".
This is the last output in the log file (the entire log file was 18 GB).
I have included a much larger part as the attachment "12296.zip":
evil thread: 3628
last line: 316,502
1229641204727 2008-12-18 23:00:04.727 [debug] 8072.3628 [POOL] Use existing FPSocket (mSocket=620) Connection open,locked marked(GOOD)
1229641204727 2008-12-18 23:00:04.727 [debug] 8072.3628 [TRANSACTION] Import Request cl0502/542156/READ_CLIP
1229641204727 2008-12-18 23:00:04.727 [debug] 8072.3628 [PACKET] send SmartPacket
NET_SYSTEMID type=string value=cl0502
NET_TRANSACTIONID type=string value=cl0502/542156/READ_CLIP
NET_VERSION type=integer value=3
HPP_CLIENT_VERSION type=integer value=197120
NET_MESSAGEID type=integer value=42
HPP_VERSION type=integer value=1
HPP_CONTROL type=integer value=0
HPP_OPCODE type=integer value=0
fieldcode=187 type=integer value=1
HPP_BLOBSIZE type=long value=-1
HPP_CALCID_NAMING type=string value=MD5
HPP_IS_CLIPFILE type=integer value=1 HPP_BLOBID type=string value=BOH1N9KL1GPLDe5K0FJJLQHE7IIG413P66QMT90MFT02PVT7MSGMR
HPP_CLIPID type=string value=BOH1N9KL1GPLDe5K0FJJLQHE7IIG413P66QMT90MFT02PVT7MSGMR
HPP_OFFSET type=long value=0
HPP_LENGTH type=long value=9223372036854775807
fieldcode=157 type=integer value=1
1229641204727 2008-12-18 23:00:04.727 [debug] 8072.3628 [TRANSACTION] Import Data cl0502/542156/READ_CLIP
1229641204742 2008-12-18 23:00:04.742 [debug] 8072.3628 [POOL] Unlock FPSocket (mSocket=620) Connection open,locked marked(GOOD)
1229641204742 2008-12-18 23:00:04.742 [debug] 8072.3628 [CORE] ClusterCloud::getPrimaryCluster(0)
1229641204742 2008-12-18 23:00:04.742 [debug] 8072.3628 [CORE] ClusterCloud::getPrimaryCluster(0) -> 95841e5e-1dd1-11b2-9a1c-e8a3b2f3372e
After this, there are no more entries from the thread 3628!
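With an 18 GB log, finding the last entry of each thread by hand is tedious. The following is a minimal sketch, not an EMC tool, that streams a log and records the last entry per thread. It assumes every entry starts with `<epoch_ms> <date> <time> [<level>] <pid>.<tid>`, as in the excerpt above; the function name and the file name in the usage note are made up for illustration.

```python
import re
from typing import Dict, Iterable, Tuple

# Assumed entry shape, inferred only from the excerpt above:
# <epoch_ms> <date> <time> [<level>] <pid>.<tid> [<CATEGORY>] <message>
LINE_RE = re.compile(
    r"^(?P<epoch_ms>\d+) (?P<date>\S+) (?P<time>\S+) "
    r"\[(?P<level>\w+)\] (?P<pid>\d+)\.(?P<tid>\d+) (?P<rest>.*)$"
)


def last_entry_per_thread(lines: Iterable[str]) -> Dict[str, Tuple[int, int, str]]:
    """Map each thread id to (line number, epoch ms, text) of its last entry."""
    last: Dict[str, Tuple[int, int, str]] = {}
    for lineno, line in enumerate(lines, start=1):
        m = LINE_RE.match(line.rstrip("\r\n"))
        if m is None:
            # Continuation lines (e.g. packet fields) carry no thread id; skip them.
            continue
        last[m.group("tid")] = (lineno, int(m.group("epoch_ms")), m.group("rest"))
    return last


if __name__ == "__main__":
    sample = [
        "1229641204727 2008-12-18 23:00:04.727 [debug] 8072.3628 [PACKET] send SmartPacket",
        "NET_SYSTEMID type=string value=cl0502",
        "1229641204742 2008-12-18 23:00:04.742 [debug] 8072.3628 [CORE] ClusterCloud::getPrimaryCluster(0)",
    ]
    # Threads whose last entry is oldest are listed first; those are the suspects.
    for tid, (lineno, epoch_ms, rest) in sorted(
        last_entry_per_thread(sample).items(), key=lambda kv: kv[1][1]
    ):
        print(tid, lineno, epoch_ms, rest)
```

Because the function only iterates over lines, it can be fed an open file object directly (for example `open("sdk.log", errors="replace")`, where the file name is hypothetical) without loading the whole log into memory.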
I also made a dump of the process, but since I don't have the symbol files from EMC, the stack is not 100% accurate.
(This dump is not from the same occurrence as the log file, but the error was the same.)
ntdll.dll!_NtReadFile@36() + 0xc bytes
kernel32.dll!_ReadFile@20() + 0x67 bytes
fpos32.dll!fp_ReadFile() + 0x25 bytes
[Frames below may be incorrect and/or missing, no symbols loaded for fpos32.dll]
FPStreams.dll!FPBasicGenericStream::prepareBuffer() + 0xbd bytes
FPStreams.dll!FPBasicGenericStream::read() + 0x64 bytes
fpparser.dll!fpparser_2_3::XMLReader::refreshRawBuffer() + 0x5b bytes
fpparser.dll!fpparser_2_3::XMLReader::XMLReader() + 0x10e bytes
fpparser.dll!fpparser_2_3::ReaderMgr::createReader() + 0x12b bytes
fpparser.dll!fpparser_2_3::IGXMLScanner::scanReset() + 0x268 bytes
fpparser.dll!fpparser_2_3::XMLDeclImpl::getNodeName() + 0x4dce bytes
All components are using version "3.02.661".
Thanks in advance for any help!
If you need any more information, I will gladly supply it!
I suggest that you request the customer to open up a Service Request (SR) with EMC to have this looked at and get a resolution. Please make sure to share this and any other information you have around the application using the Centera SDK when opening up the SR.
EMC support should be contacted to resolve any Centera/SDK related issues that occur on the customer's environment.
You are of course right. I have also opened a support call (to be exact, the customer has opened it...)
We just had a customer run into this issue. They opened a ticket with EMC and were advised to update to 3.2 p5. (They were running with 3.2 p1.)
The latest patch level seems to have resolved the issue.
Once I had figured out (by examining dumps) that the problem was threads looping within the SDK, the first thing I did was read all the Centera SDK release notes for each newer patch. I saw nothing describing this problem.
It would have saved a lot of people a lot of time if this had been mentioned in the release notes.
We specifically add Release Notes for all known customer issues that are specifically resolved by the release / patch. As the standard response for issues where the customer is using an EOSL / unsupported SDK version is to request a reproducible test case using that latest version, being asked to test with such a version does not automatically mean that support know that the issue is resolved by the release in question.
In this case, the issue may have been addressed internally within Engineering. If this was not done in response to a known customer issue, then it would not have been added to the Release Notes. It may even have been "fixed" as a side effect of some other change.
It is impractical to add release notes for every known defect found within Engineering / QA testing of a product. This is standard practice within the industry.