JVM Crash Error while reading blobs (clips)
Hello everyone,
I have a Java server that executes scheduled jobs. The jobs load clips that I uploaded some time ago to a public EMC Centera.
After a while (one or two hours), the Java server crashes with a JVM error. The error appears to occur while a scheduled job is running.
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGSEGV (0xb) at pc=0x0000000000000000, pid=62558, tid=140042202744576
#
# JRE version: 6.0_43-b01
# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.14-b01 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# C 0x0000000000000000
#
# If you would like to submit a bug report, please visit:
# http://java.sun.com/webapps/bugreport/crash.jsp
# The crash happened outside the Java Virtual Machine in native code.
# See problematic frame for where to report the bug.
#
--------------- T H R E A D ---------------
Current thread (0x00007f5d238e3000): JavaThread "pool-1-thread-121" [_thread_in_native, id=3769, stack(0x00007f5e19ff8000,0x00007f5e1dbf9000)]
siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000000
Registers:
RAX=0x00007f5d113f3510, RBX=0x00007f5e80b79500, RCX=0x0000000000000000, RDX=0x00007f5d17202bb0
RSP=0x00007f5e1dbf69d8, RBP=0x00007f5d6ea387e0, RSI=0x00007f5e643e6660, RDI=0x00007f5d2788cae0
R8 =0x000000000000ffff, R9 =0x000000000000000f, R10=0xfffffffffffff81f, R11=0x00007f5f2570fdae
R12=0x00007f5e643e6660, R13=0x00007f5e643e6670, R14=0x00007f5e1dbf6ec0, R15=0x00007f5d2788cae0
RIP=0x0000000000000000, EFLAGS=0x0000000000010202, CSGSFS=0x666e000000000033, ERR=0x0000000000000014
TRAPNO=0x000000000000000e
Top of Stack: (sp=0x00007f5e1dbf69d8)
0x00007f5e1dbf69d8: 00007f5ee5ba3af3 00000000000001e7
0x00007f5e1dbf69e8: 0000000000000000 00007f5e1dbf6a30
0x00007f5e1dbf69f8: 00007f5e1dbf6ad0 00000000000001e7
0x00007f5e1dbf6a08: 0000000000000000 00007f5e1dbf7120
0x00007f5e1dbf6a18: 00007f5e1dbf71a0 00007f5d171c0290
0x00007f5e1dbf6a28: 00007f5e643e9b20 00007f5ee633c6d0
0x00007f5e1dbf6a38: 00007f5e1dbf6b00 00007f5d167b6f80
0x00007f5e1dbf6a48: 0000000e00000040 000000000000000e
0x00007f5e1dbf6a58: 00007f5e00000002 0000000016d69fa8
0x00007f5e1dbf6a68: 00007f5f25000000 00000000fbad8001
0x00007f5e1dbf6a78: 00007f5efc2e7d70 0000000a0000000a
0x00007f5e1dbf6a88: 0000000a00000000 00007f5d172387e0
0x00007f5e1dbf6a98: 00007f5e1dbf6d01 00007f5e1dbf6d9e
0x00007f5e1dbf6aa8: 00007f5e643e6660 00007f5e643e45f0
0x00007f5e1dbf6ab8: 00007f5ee5d16acc 00007f5e643e4600
0x00007f5e1dbf6ac8: 00007f5efc1d12f9 00007f5ee633f870
0x00007f5e1dbf6ad8: 00007f5ee5baec00 00007f5d1658cf10
0x00007f5e1dbf6ae8: 0000002400000040 0000000000000024
0x00007f5e1dbf6af8: 0000000000000000 00007f5e80b79500
0x00007f5e1dbf6b08: 00007f5e400154c0 0000000000000000
0x00007f5e1dbf6b18: 00007f5e643e6660 00007f5e1dbf6ec0
0x00007f5e1dbf6b28: 0000000000000000 00007f5d2788cae0
0x00007f5e1dbf6b38: 00007f5e643e6660 0000000000000006
0x00007f5e1dbf6b48: 00007f5ee5baf02e 00000000000001e7
0x00007f5e1dbf6b58: 0000000000000000 00007f5d2788cae0
0x00007f5e1dbf6b68: 00007f5e1dbf6ec0 00007f5e1dbf6d20
0x00007f5e1dbf6b78: 00007f5e1dbf6d28 00007f5e00000021
0x00007f5e1dbf6b88: 0000000000000008 0000000000000001
0x00007f5e1dbf6b98: 00007f5e1dbf6c40 0000000000000000
0x00007f5e1dbf6ba8: 0000000000000000 00007f5e1dbf7120
0x00007f5e1dbf6bb8: 00007f5e1dbf71a0 00007f5d171c0290
0x00007f5e1dbf6bc8: 00007f5e643e45f0 00007f5d166c5700
Instructions: (pc=0x0000000000000000)
0xffffffffffffffe0:
Register to memory mapping:
RAX=0x00007f5d113f3510 is an unknown value
RBX=0x00007f5e80b79500 is an unknown value
RCX=0x0000000000000000 is an unknown value
RDX=0x00007f5d17202bb0 is an unknown value
RSP=0x00007f5e1dbf69d8 is pointing into the stack for thread: 0x00007f5d238e3000
RBP=0x00007f5d6ea387e0 is an unknown value
RSI=0x00007f5e643e6660 is an unknown value
RDI=0x00007f5d2788cae0 is an unknown value
R8 =0x000000000000ffff is an unknown value
R9 =0x000000000000000f is an unknown value
R10=0xfffffffffffff81f is an unknown value
R11=0x00007f5f2570fdae: in /lib64/libc.so.6 at 0x00007f5f2568c000
R12=0x00007f5e643e6660 is an unknown value
R13=0x00007f5e643e6670 is an unknown value
R14=0x00007f5e1dbf6ec0 is pointing into the stack for thread: 0x00007f5d238e3000
R15=0x00007f5d2788cae0 is an unknown value
Stack: [0x00007f5e19ff8000,0x00007f5e1dbf9000], sp=0x00007f5e1dbf69d8, free space=61434k
Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
J com.filepool.natives.FPLibraryNative.FPTag_BlobReadPartial(JLjava/io/OutputStream;JJJ)V
J com.filepool.fplibrary.FPTag.BlobRead(Ljava/io/OutputStream;)V
J java.util.concurrent.ThreadPoolExecutor$Worker.run()V
j java.lang.Thread.run()V+11
v ~StubRoutines::call_stub
I cannot explain this behavior. Does anyone know what the problem is and can help me?
christopher_ime
2K Posts
0
November 26th, 2013 19:00
Firstly, welcome to the forums, and above all, thank you for being an EMC customer.
Please consider moving this question as-is (no need to recreate) to the proper forum for maximum visibility. Questions written to the users' own "Discussions" space don't get the same amount of attention and can go unanswered for a long time.
You can do so by selecting "Move" under ACTIONS along the upper-right. Then search for and select: "Centera Support Forum"
mfh2
208 Posts
0
November 27th, 2013 09:00
Hello ChronHermann -
It appears to be crashing inside the native DLL. Which SDK version and which platform are you running on?
Regards,
Mike Horgan
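The header of the crash report in the first post supports this reading: the problematic frame is tagged C (native code) and the faulting pc is zero, which is what a jump through a null or corrupted function pointer would look like. Below is a minimal Java sketch that pulls those fields out of an hs_err header. The class name HsErrHeader and its methods are made up for illustration, and the patterns are only assumed to fit headers shaped like the one posted above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts signal, faulting pc and frame type from an hs_err header. */
final class HsErrHeader {
    private static final Pattern SIGNAL =
            Pattern.compile("^#\\s+(SIG\\w+) \\(0x[0-9a-f]+\\) at pc=(0x[0-9a-f]+)");
    // Applied only to the line directly after "# Problematic frame:".
    private static final Pattern FRAME = Pattern.compile("^#\\s+([A-Za-z])\\s+(.*)$");

    String signal;
    long pc;
    char frameType; // 'C' = native code, 'J'/'j' = Java code, 'V'/'v' = VM code

    static HsErrHeader parse(List<String> lines) {
        HsErrHeader h = new HsErrHeader();
        boolean nextIsFrame = false;
        for (String line : lines) {
            Matcher m = SIGNAL.matcher(line);
            if (m.find()) {
                h.signal = m.group(1);
                h.pc = Long.parseUnsignedLong(m.group(2).substring(2), 16);
            } else if (line.startsWith("# Problematic frame:")) {
                nextIsFrame = true;
            } else if (nextIsFrame) {
                Matcher f = FRAME.matcher(line);
                if (f.find()) {
                    h.frameType = f.group(1).charAt(0);
                }
                nextIsFrame = false;
            }
        }
        return h;
    }

    /** True when a native frame ended up executing at address zero. */
    boolean isNullCallInNativeCode() {
        return frameType == 'C' && pc == 0L;
    }

    public static void main(String[] args) {
        HsErrHeader h = parse(Arrays.asList(
                "# SIGSEGV (0xb) at pc=0x0000000000000000, pid=62558, tid=140042202744576",
                "# Problematic frame:",
                "# C 0x0000000000000000"));
        System.out.println(h.signal + " nativeNullCall=" + h.isNullCallInNativeCode());
    }
}
```

This only classifies the crash. It does not say which native library made the bad jump, which is why the SDK version and platform matter.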
ChronHermann
7 Posts
0
November 27th, 2013 23:00
Hello Mike,
my SDK Version:
libFPCore64.so.3.3.719
libFPLibrary64.so.3.3.719
libFPParser64.so.3.3.50
libFPStreams64.so.3.3.719
libFPUtils64.so.3.3.719
libFPXML64.so.3.3.719
libPAI_module64.so.3.3.100
Platform:
Red Hat Enterprise Linux Server release 6.4 (Santiago)
Linux localhost 2.6.32-358.14.1.el6.x86_64 #1 SMP Tue Jul 16 12:08:01 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux
Regards,
Korbinian Hermann
mfh2
208 Posts
0
November 29th, 2013 11:00
Hello Korbinian -
That all looks good to me. Your best bet for finding a cause would be to enable SDK logging and wait for the crash to occur again. Since you say it takes hours for the crash to recur, you may find the log gets very large (many GB), so make sure there is plenty of disk space available. You can enable SDK logging by adding this to your program's environment:
FP_LOGPATH=/path/to/where/he/has/enuf/space/filename.log
If you manage to catch the crash in the log, I'd say the first couple thousand lines and the last couple thousand would be the most important to the solution; you could post those here for input if you wish.
Good Luck,
Mike Horgan
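Mike's suggestion of posting only the first and last couple thousand lines can be followed without loading a multi-GB log into memory. Here is a rough Java sketch of one way to do it. LogHeadTail and headAndTail are hypothetical names, not part of the Centera SDK, and the demo reads from a string instead of a real log file.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: keeps only the first and last n lines of a large log. */
final class LogHeadTail {
    /** Streams the input once; at most 2 * n lines are held in memory at any time. */
    static List<String> headAndTail(BufferedReader in, int n) throws IOException {
        List<String> head = new ArrayList<String>();
        Deque<String> tail = new ArrayDeque<String>();
        long skipped = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (head.size() < n) {
                head.add(line);
                continue;
            }
            tail.addLast(line);
            if (tail.size() > n) {
                tail.removeFirst(); // drop the oldest line of the sliding tail window
                skipped++;
            }
        }
        List<String> out = new ArrayList<String>(head);
        if (skipped > 0) {
            out.add("... (" + skipped + " lines omitted) ...");
        }
        out.addAll(tail);
        return out;
    }

    public static void main(String[] args) throws IOException {
        // For a real run, open the file named in FP_LOGPATH with a BufferedReader instead.
        BufferedReader in = new BufferedReader(new StringReader("a\nb\nc\nd\ne\nf\ng"));
        for (String s : headAndTail(in, 2)) {
            System.out.println(s);
        }
    }
}
```

With n set to 2000 this matches the "couple thousand" lines mentioned above; the omitted-lines marker makes it obvious to readers that the middle was cut.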
ChronHermann
7 Posts
0
December 2nd, 2013 00:00
Hello Mike,
thanks for your answer. I enabled the SDK logging and found some errors:
...
1385971200873 2013-12-02 08:00:00.873 [debug] 27481.277350144 [TRANSACTION] Probe Reply localhost/6/PROBE
1385971201862 2013-12-02 08:00:01.862 [debug] 27481.276297472 [EXCEPTION] In 'Connection.cpp' at line 994: Exception
error=-10101
syserror=0
message=receive: waitForReadingData(1000) returned zero
trace=No trace available
1385971201862 2013-12-02 08:00:01.862 [debug] 27481.276297472 [POOL] Close FPSocket (mSocket=106) Connection open,unlocked marked(GOOD)
1385971201862 2013-12-02 08:00:01.862 [debug] 27481.276297472 [RETRY] AN(168.159.214.21) Unavailable (for 15 secs)
1385971201862 2013-12-02 08:00:01.862 [debug] 27481.276297472 [POOL] Close connections for addr=168.159.214.21 (num=0,max=0)
1385971201862 2013-12-02 08:00:01.862 [debug] 27481.276297472 [POOL] End close connections (num=0,max=0) ----
1385971201862 2013-12-02 08:00:01.862 [debug] 27481.276297472 [RETRY] retry (0) Probe because Exception
error=-10101
syserror=0
message=receive: waitForReadingData(1000) returned zero
trace=FPDatagramSocket.receive (timeout=1000)
transid=localhost/4/PROBE
1385971201862 2013-12-02 08:00:01.862 [debug] 27481.276297472 [RETRY] Cluster::selectAccessNodeWithoutProbe(true) -> node #0=168.159.214.21, load=99-99
1385971201862 2013-12-02 08:00:01.862 [debug] 27481.276297472 [POOL] Open FPSocket (mSocket=106) Connection open,unlocked marked(GOOD) Type(2) for 168.159.214.21
1385971201862 2013-12-02 08:00:01.862 [debug] 27481.276297472 [TRANSACTION] Probe Request localhost/7/PROBE
1385971201863 2013-12-02 08:00:01.863 [debug] 27481.276297472 [PACKET] send SmartPacket
NET_SYSTEMID type=string value=localhost
NET_TRANSACTIONID type=string value=localhost/7/PROBE
NET_VERSION type=integer value=3 HPP_CLIENT_VERSION type=integer value=197376 POOLSERVER_VERSION type=integer value=1 NET_MESSAGEID type=integer value=36 POOLSERVER_OPCODE type=integer value=1
1385971201863 2013-12-02 08:00:01.863 [debug] 27481.276297472 [TRANSACTION] Probe Reply localhost/7/PROBE
1385971201874 2013-12-02 08:00:01.874 [debug] 27481.283666176 [EXCEPTION] In 'Connection.cpp' at line 994: Exception
error=-10101
syserror=0
message=receive: waitForReadingData(1000) returned zero
trace=No trace available
1385971201874 2013-12-02 08:00:01.874 [debug] 27481.283666176 [POOL] Close FPSocket (mSocket=107) Connection open,unlocked marked(GOOD)
1385971201874 2013-12-02 08:00:01.874 [debug] 27481.283666176 [RETRY] AN(168.159.214.21) Unavailable (for 15 secs)
1385971201874 2013-12-02 08:00:01.874 [debug] 27481.283666176 [POOL] Close connections for addr=168.159.214.21 (num=0,max=0)
1385971201874 2013-12-02 08:00:01.874 [debug] 27481.283666176 [POOL] End close connections (num=0,max=0) ----
1385971201874 2013-12-02 08:00:01.874 [debug] 27481.283666176 [RETRY] retry (0) Probe because Exception
error=-10101
syserror=0
message=receive: waitForReadingData(1000) returned zero
trace=FPDatagramSocket.receive (timeout=1000)
transid=localhost/5/PROBE
...
1385971208857 2013-12-02 08:00:08.857 [debug] 27481.276297472 [API] PoolOption 'prefetchsize' = '32768'
1385971208857 2013-12-02 08:00:08.857 [log] 27481.276297472 [API] End FPPool_Open8(168.159.214.20,168.159.214.21?/home/transfer/c1armtesting.pea)
1385971208857 2013-12-02 08:00:08.857 [debug] 27481.283666176 [EXCEPTION] In 'FPClip.cpp' at line 3793: Exception
error=-10018
syserror=0
message=
trace=FPClip.GetMetaAttribute(retention.class)
1385971208858 2013-12-02 08:00:08.858 [debug] 27481.283666176 [API] GlobalOption 'retrycount' = '6'
1385971208858 2013-12-02 08:00:08.858 [debug] 27481.283666176 [API] GlobalOption 'retrysleep' = '-1'
...
1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] Start FPPool_GetLastError()
1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] End FPPool_GetLastError() --> [0]
1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] Start FPPool_GetLastError()
1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] End FPPool_GetLastError() --> [0]
1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] Start FPTag_GetTagName8(-,-,256)
1385973098643 2013-12-02 08:31:38.643 [debug] 27481.274192128 [EXCEPTION] In 'FPTag.cpp' at line 1530: Exception
error=-10012
syserror=0
message=
trace=No trace available
1385973098643 2013-12-02 08:31:38.643 [error] 27481.274192128 [API] End FPTag_GetTagName8(-,-,0): Error -10012
1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] Start FPPool_GetLastError()
1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] End FPPool_GetLastError() --> [-10012]
1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] Start FPPool_GetLastError()
1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] End FPPool_GetLastError() --> [-10012]
...
mfh2
208 Posts
0
December 2nd, 2013 09:00
Hi Korbinian -
There seem to be a number of things going on here. The -10101 errors indicate a networking problem, possibly with that particular access node (is it down?). This isn't of great concern, although it may slow you down a bit.
The -10018 is an FP_ATTR_NOT_FOUND error that occurs when your code is checking to see if a clip has a retention class defined. It is not a problem as long as you handle this return code appropriately.
The -10012 looks like an actual problem. The SDK is having some sort of issue related to how you are trying to traverse a clip that was opened in FP_FLAT mode (which does not support parent/child/sibling type traversal). You should probably look into this and see if it is preventing you from releasing all of your SDK resources correctly. Leaked SDK handles will eventually cause problems, possibly along the lines of what you are experiencing after a few hours of runtime.
Good Luck,
Mike Horgan
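To see at a glance which of these error codes dominate a large SDK log, the "error=" lines can be tallied. This is a rough Java sketch, assuming the log keeps the layout of the excerpt above, where each exception is followed by its own "error=<code>" line. SdkErrorTally is a made-up helper, not an SDK class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: counts how often each "error=<code>" line appears in an SDK log. */
final class SdkErrorTally {
    // Must match the whole line, so "syserror=0" is not counted.
    private static final Pattern ERROR = Pattern.compile("\\s*error=(-?\\d+)\\s*");

    static Map<Integer, Integer> tally(List<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<Integer, Integer>();
        for (String line : lines) {
            Matcher m = ERROR.matcher(line);
            if (m.matches()) {
                int code = Integer.parseInt(m.group(1));
                Integer old = counts.get(code);
                counts.put(code, old == null ? 1 : old + 1);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Lines taken from the log excerpt posted earlier in this thread.
        List<String> lines = Arrays.asList(
                "error=-10101",
                "syserror=0",
                "message=receive: waitForReadingData(1000) returned zero",
                "error=-10018",
                "error=-10101",
                "error=-10012");
        System.out.println(tally(lines));
    }
}
```

In the excerpt the same failure is sometimes logged twice, once under [EXCEPTION] and once under [RETRY], so the counts are an upper bound on distinct events, not an exact figure.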
ChronHermann
7 Posts
0
December 5th, 2013 04:00
Hi Mike,
thank you for your answer! I tried to find clips that were opened in FP_FLAT mode, but all the clips are opened in FP_OPEN_ASTREE mode. I also checked all my SDK handles, and the handles and clips are closed correctly. I tried setting the handles and clips to null after closing them, but the problem remains.
Can I find out which clips are opened in FP_FLAT mode?
Regards,
Korbinian Hermann
mfh2
208 Posts
0
December 5th, 2013 06:00
Hi Korbinian -
The SDK log contains every FPClip_Open call with clipid and open_mode in the second and third parameters. From the docs:
That seems to be your best bet. Happy Spelunking!
Mike Horgan
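Searching a large log for these calls by hand is tedious, so here is a small Java sketch that extracts the arguments of every "Start <name>(...)" API line. ApiCallArgs and startArgs are hypothetical names. Since no FPClip_Open line appears in this thread, the demo uses the FPTag_GetTagName8 line from the excerpt above; that FPClip_Open calls are logged in the same shape is an assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: lists the arguments of logged "Start <apiName>(...)" calls. */
final class ApiCallArgs {
    static List<String[]> startArgs(List<String> lines, String apiName) {
        // \w* tolerates a suffix such as the "8" in FPTag_GetTagName8.
        Pattern p = Pattern.compile(
                "\\[API\\] Start " + Pattern.quote(apiName) + "\\w*\\(([^)]*)\\)");
        List<String[]> calls = new ArrayList<String[]>();
        for (String line : lines) {
            Matcher m = p.matcher(line);
            if (m.find()) {
                // Naive split: an argument that itself contains a comma would be cut in two.
                calls.add(m.group(1).split(",", -1));
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] Start FPPool_GetLastError()",
                "1385973098643 2013-12-02 08:31:38.643 [log] 27481.274192128 [API] Start FPTag_GetTagName8(-,-,256)");
        for (String[] call : startArgs(lines, "FPTag_GetTagName")) {
            System.out.println(Arrays.toString(call));
        }
    }
}
```

Calling startArgs(lines, "FPClip_Open") and reading index 2 of each result would then list the open modes, if the assumption about the line shape holds. How the mode is printed (name or number) is not shown in this thread, so the sketch keeps it as a raw string.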
ChronHermann
7 Posts
0
December 5th, 2013 08:00
Hello Mike,
I searched my SDK log file, but I found only FPClip_Open calls with FP_OPEN_ASTREE mode.
It seems that my problem has another cause.
Regards,
Korbinian Hermann
mfh2
208 Posts
0
December 5th, 2013 08:00
I agree, this path of investigation has reached a dead end. But hopefully you have learned quite a bit about Centera SDK logs and they may help you find the root cause.
Best Regards,
Mike Horgan
ChronHermann
7 Posts
0
December 16th, 2013 02:00
Hello everyone,
I am still looking for the root cause, so I added the parameter "-Xcheck:jni" to my Java launch command. Now, after every BlobRead call, I receive the following warning in my console:
WARNING in native method: JNI call made with exception pending
at com.filepool.natives.FPLibraryNative.FPTag_BlobReadPartial(Native Method)
at com.filepool.fplibrary.FPTag.BlobReadPartial(Unknown Source)
at com.filepool.fplibrary.FPTag.BlobRead(Unknown Source)
Does anyone have an idea why this warning is shown?
Regards,
Korbinian Hermann
mfh2
208 Posts
0
December 16th, 2013 03:00
Sorry Korbinian, I have no clue on this one.
Perhaps someone from EMC support will chime in.
Mike Horgan
ChronHermann
7 Posts
0
December 16th, 2013 22:00
Yesterday I added the parameter to the "EMC Console" as well. The warning also appeared there when I downloaded a clip. So I think this is a general issue in the Centera SDK, but not the cause of my problem.
Regards,
Korbinian Hermann