Unsolved

November 25th, 2013 07:00

JVM Crash Error while reading blobs (clips)

Hello everyone,

I have a Java server that executes scheduled jobs. The jobs load clips that I uploaded some time ago to a public EMC Centera.

After a while (one or two hours), the Java server crashes with a JVM error. It appears that the error occurs while a scheduled job is running.

#

# A fatal error has been detected by the Java Runtime Environment:

#

#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=62558, tid=140042202744576

#

# JRE version: 6.0_43-b01

# Java VM: Java HotSpot(TM) 64-Bit Server VM (20.14-b01 mixed mode linux-amd64 compressed oops)

# Problematic frame:

# C  0x0000000000000000

#

# If you would like to submit a bug report, please visit:

#   http://java.sun.com/webapps/bugreport/crash.jsp

# The crash happened outside the Java Virtual Machine in native code.

# See problematic frame for where to report the bug.

#

---------------  T H R E A D  ---------------

Current thread (0x00007f5d238e3000):  JavaThread "pool-1-thread-121" [_thread_in_native, id=3769, stack(0x00007f5e19ff8000,0x00007f5e1dbf9000)]

siginfo:si_signo=SIGSEGV: si_errno=0, si_code=1 (SEGV_MAPERR), si_addr=0x0000000000000000

Registers:

RAX=0x00007f5d113f3510, RBX=0x00007f5e80b79500, RCX=0x0000000000000000, RDX=0x00007f5d17202bb0

RSP=0x00007f5e1dbf69d8, RBP=0x00007f5d6ea387e0, RSI=0x00007f5e643e6660, RDI=0x00007f5d2788cae0

R8 =0x000000000000ffff, R9 =0x000000000000000f, R10=0xfffffffffffff81f, R11=0x00007f5f2570fdae

R12=0x00007f5e643e6660, R13=0x00007f5e643e6670, R14=0x00007f5e1dbf6ec0, R15=0x00007f5d2788cae0

RIP=0x0000000000000000, EFLAGS=0x0000000000010202, CSGSFS=0x666e000000000033, ERR=0x0000000000000014

  TRAPNO=0x000000000000000e

 

Top of Stack: (sp=0x00007f5e1dbf69d8)

0x00007f5e1dbf69d8:   00007f5ee5ba3af3 00000000000001e7

0x00007f5e1dbf69e8:   0000000000000000 00007f5e1dbf6a30

0x00007f5e1dbf69f8:   00007f5e1dbf6ad0 00000000000001e7

0x00007f5e1dbf6a08:   0000000000000000 00007f5e1dbf7120

0x00007f5e1dbf6a18:   00007f5e1dbf71a0 00007f5d171c0290

0x00007f5e1dbf6a28:   00007f5e643e9b20 00007f5ee633c6d0

0x00007f5e1dbf6a38:   00007f5e1dbf6b00 00007f5d167b6f80

0x00007f5e1dbf6a48:   0000000e00000040 000000000000000e

0x00007f5e1dbf6a58:   00007f5e00000002 0000000016d69fa8

0x00007f5e1dbf6a68:   00007f5f25000000 00000000fbad8001

0x00007f5e1dbf6a78:   00007f5efc2e7d70 0000000a0000000a

0x00007f5e1dbf6a88:   0000000a00000000 00007f5d172387e0

0x00007f5e1dbf6a98:   00007f5e1dbf6d01 00007f5e1dbf6d9e

0x00007f5e1dbf6aa8:   00007f5e643e6660 00007f5e643e45f0

0x00007f5e1dbf6ab8:   00007f5ee5d16acc 00007f5e643e4600

0x00007f5e1dbf6ac8:   00007f5efc1d12f9 00007f5ee633f870

0x00007f5e1dbf6ad8:   00007f5ee5baec00 00007f5d1658cf10

0x00007f5e1dbf6ae8:   0000002400000040 0000000000000024

0x00007f5e1dbf6af8:   0000000000000000 00007f5e80b79500

0x00007f5e1dbf6b08:   00007f5e400154c0 0000000000000000

0x00007f5e1dbf6b18:   00007f5e643e6660 00007f5e1dbf6ec0

0x00007f5e1dbf6b28:   0000000000000000 00007f5d2788cae0

0x00007f5e1dbf6b38:   00007f5e643e6660 0000000000000006

0x00007f5e1dbf6b48:   00007f5ee5baf02e 00000000000001e7

0x00007f5e1dbf6b58:   0000000000000000 00007f5d2788cae0

0x00007f5e1dbf6b68:   00007f5e1dbf6ec0 00007f5e1dbf6d20

0x00007f5e1dbf6b78:   00007f5e1dbf6d28 00007f5e00000021

0x00007f5e1dbf6b88:   0000000000000008 0000000000000001

0x00007f5e1dbf6b98:   00007f5e1dbf6c40 0000000000000000

0x00007f5e1dbf6ba8:   0000000000000000 00007f5e1dbf7120

0x00007f5e1dbf6bb8:   00007f5e1dbf71a0 00007f5d171c0290

0x00007f5e1dbf6bc8:   00007f5e643e45f0 00007f5d166c5700

Instructions: (pc=0x0000000000000000)

0xffffffffffffffe0:  

Register to memory mapping:

RAX=0x00007f5d113f3510 is an unknown value

RBX=0x00007f5e80b79500 is an unknown value

RCX=0x0000000000000000 is an unknown value

RDX=0x00007f5d17202bb0 is an unknown value

RSP=0x00007f5e1dbf69d8 is pointing into the stack for thread: 0x00007f5d238e3000

RBP=0x00007f5d6ea387e0 is an unknown value

RSI=0x00007f5e643e6660 is an unknown value

RDI=0x00007f5d2788cae0 is an unknown value

R8 =0x000000000000ffff is an unknown value

R9 =0x000000000000000f is an unknown value

R10=0xfffffffffffff81f is an unknown value

R11=0x00007f5f2570fdae: in /lib64/libc.so.6 at 0x00007f5f2568c000

R12=0x00007f5e643e6660 is an unknown value

R13=0x00007f5e643e6670 is an unknown value

R14=0x00007f5e1dbf6ec0 is pointing into the stack for thread: 0x00007f5d238e3000

R15=0x00007f5d2788cae0 is an unknown value

Stack: [0x00007f5e19ff8000,0x00007f5e1dbf9000],  sp=0x00007f5e1dbf69d8,  free space=61434k

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)

J  com.filepool.natives.FPLibraryNative.FPTag_BlobReadPartial(JLjava/io/OutputStream;JJJ)V

J  com.filepool.fplibrary.FPTag.BlobRead(Ljava/io/OutputStream;)V

J  java.util.concurrent.ThreadPoolExecutor$Worker.run()V

j  java.lang.Thread.run()V+11

v  ~StubRoutines::call_stub

I cannot explain this behavior. Does anyone know what the problem is and can help me?
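
For readers triaging a similar crash log, the header fields can be pulled out mechanically. What follows is a minimal sketch, assuming the hs_err header uses the same line shape as the `SIGSEGV ... at pc=...` line in the log above. The class `HsErrHeader` and its methods are hypothetical helpers, not part of the JDK or the Centera SDK. A program counter of zero, as in this log, typically suggests that native code jumped through a null function pointer, although the log alone does not prove it.

```java
import java.math.BigInteger;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HsErrHeader {
    // Matches e.g. "#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=62558, tid=..."
    private static final Pattern SIGNAL_LINE = Pattern.compile(
            "^#\\s+(SIG[A-Z]+) \\(0x[0-9a-f]+\\) at pc=0x([0-9a-f]+), pid=(\\d+)");

    /** Returns the signal name from the first matching header line, if any. */
    static Optional<String> signal(List<String> lines) {
        return firstMatch(lines).map(m -> m.group(1));
    }

    /** Returns true when the faulting program counter in the header is zero. */
    static boolean pcIsNull(List<String> lines) {
        return firstMatch(lines)
                .map(m -> new BigInteger(m.group(2), 16).signum() == 0)
                .orElse(false);
    }

    private static Optional<Matcher> firstMatch(List<String> lines) {
        for (String line : lines) {
            Matcher m = SIGNAL_LINE.matcher(line);
            if (m.find()) {
                return Optional.of(m);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // The two header lines below are copied from the crash log in this thread.
        List<String> sample = Arrays.asList(
                "# A fatal error has been detected by the Java Runtime Environment:",
                "#  SIGSEGV (0xb) at pc=0x0000000000000000, pid=62558, tid=140042202744576");
        System.out.println(signal(sample).orElse("unknown") + " pcIsNull=" + pcIsNull(sample));
    }
}
```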

November 26th, 2013 19:00

Firstly, welcome to the forums, and above all, thank you for being an EMC customer.

Please consider moving this question as-is (no need to recreate) to the proper forum for maximum visibility.  Questions written to the users' own "Discussions" space don't get the same amount of attention and can go unanswered for a long time.

You can do so by selecting "Move" under ACTIONS along the upper-right.  Then search for and select: "Centera Support Forum"

Centera Support Forum

208 Posts

November 27th, 2013 09:00

Hello ChronHermann -

It appears to be crashing inside the native library. Which SDK version and which platform are you running on?

Regards,

Mike Horgan

November 27th, 2013 23:00

Hello Mike,

My SDK version:

libFPCore64.so.3.3.719

libFPLibrary64.so.3.3.719

libFPParser64.so.3.3.50

libFPStreams64.so.3.3.719

libFPUtils64.so.3.3.719

libFPXML64.so.3.3.719

libPAI_module64.so.3.3.100

Platform:

Red Hat Enterprise Linux Server release 6.4 (Santiago)

Linux localhost 2.6.32-358.14.1.el6.x86_64 #1 SMP Tue Jul 16 12:08:01 PDT 2013 x86_64 x86_64 x86_64 GNU/Linux

Regards,

Korbinian Hermann

208 Posts

November 29th, 2013 11:00

Hello Korbinian -

That all looks good to me. Your best bet for finding a cause would be to enable SDK logging and wait for the crash to occur again. Since you say it takes hours for the crash to recur, you may find the log gets very large (many GB), so make sure there is plenty of disk space available. You can enable SDK logging by adding this to your program's environment:

FP_LOGPATH=/path/to/where/he/has/enuf/space/filename.log


If you manage to catch the crash in the log, I'd say the first couple thousand lines and the last couple thousand would be the most important to the solution; you could post those here for input if you wish.
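
One way to set this variable without editing a shell profile is to start the job server from a small launcher. This is a minimal sketch, assuming the server runs as a child process. Only the variable name `FP_LOGPATH` comes from the post above. The class `SdkLogLauncher`, the 10 GiB free-space threshold, and the command passed on the command line are illustrative assumptions.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

final class SdkLogLauncher {
    /** Builds (but does not start) a process whose environment carries FP_LOGPATH. */
    static ProcessBuilder withSdkLogging(List<String> command, File logFile) {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.environment().put("FP_LOGPATH", logFile.getAbsolutePath());
        pb.inheritIO(); // let the child write to this console
        return pb;
    }

    /** Rough guard against filling the disk with a multi-gigabyte log. */
    static boolean hasFreeSpace(File dir, long minBytes) {
        return dir.getUsableSpace() >= minBytes;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length < 2) {
            System.out.println("usage: SdkLogLauncher <logfile> <command...>");
            return;
        }
        File log = new File(args[0]);
        List<String> command = Arrays.asList(args).subList(1, args.length);
        File dir = log.getAbsoluteFile().getParentFile();
        // 10 GiB is an arbitrary illustrative threshold, not an SDK requirement.
        if (dir == null || !hasFreeSpace(dir, 10L * 1024 * 1024 * 1024)) {
            System.err.println("not enough free space for the SDK log");
            return;
        }
        int exit = withSdkLogging(command, log).start().waitFor();
        System.out.println("child exited with " + exit);
    }
}
```

The launcher takes the log file first and the command to run after it, so the server's own start command stays unchanged.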


Good Luck,

Mike Horgan

December 2nd, 2013 00:00

Hello Mike,

Thanks for your answer. I enabled SDK logging and found some errors:

...

1385971200873   2013-12-02 08:00:00.873         [debug]         27481.277350144 [TRANSACTION]   Probe Reply localhost/6/PROBE

1385971201862   2013-12-02 08:00:01.862         [debug]         27481.276297472 [EXCEPTION]     In 'Connection.cpp' at line 994: Exception

error=-10101

syserror=0

message=receive: waitForReadingData(1000) returned zero

trace=No trace available

1385971201862   2013-12-02 08:00:01.862         [debug]         27481.276297472 [POOL]  Close FPSocket (mSocket=106) Connection open,unlocked marked(GOOD)

1385971201862   2013-12-02 08:00:01.862         [debug]         27481.276297472 [RETRY] AN(168.159.214.21) Unavailable (for 15 secs)

1385971201862   2013-12-02 08:00:01.862         [debug]         27481.276297472 [POOL]  Close connections for addr=168.159.214.21 (num=0,max=0)

1385971201862   2013-12-02 08:00:01.862         [debug]         27481.276297472 [POOL]  End close connections (num=0,max=0) ----

1385971201862   2013-12-02 08:00:01.862         [debug]         27481.276297472 [RETRY] retry (0) Probe because Exception

error=-10101

syserror=0

message=receive: waitForReadingData(1000) returned zero

trace=FPDatagramSocket.receive (timeout=1000)

transid=localhost/4/PROBE

1385971201862   2013-12-02 08:00:01.862         [debug]         27481.276297472 [RETRY] Cluster::selectAccessNodeWithoutProbe(true) -> node #0=168.159.214.21, load=99-99

1385971201862   2013-12-02 08:00:01.862         [debug]         27481.276297472 [POOL]  Open FPSocket (mSocket=106) Connection open,unlocked marked(GOOD) Type(2) for 168.159.214.21

1385971201862   2013-12-02 08:00:01.862         [debug]         27481.276297472 [TRANSACTION]   Probe Request localhost/7/PROBE

1385971201863   2013-12-02 08:00:01.863         [debug]         27481.276297472 [PACKET]        send SmartPacket

  NET_SYSTEMID type=string value=localhost

  NET_TRANSACTIONID type=string value=localhost/7/PROBE

  NET_VERSION type=integer value=3  HPP_CLIENT_VERSION type=integer value=197376  POOLSERVER_VERSION type=integer value=1  NET_MESSAGEID type=integer value=36  POOLSERVER_OPCODE type=integer value=1

1385971201863   2013-12-02 08:00:01.863         [debug]         27481.276297472 [TRANSACTION]   Probe Reply localhost/7/PROBE

1385971201874   2013-12-02 08:00:01.874         [debug]         27481.283666176 [EXCEPTION]     In 'Connection.cpp' at line 994: Exception

error=-10101

syserror=0

message=receive: waitForReadingData(1000) returned zero

trace=No trace available

1385971201874   2013-12-02 08:00:01.874         [debug]         27481.283666176 [POOL]  Close FPSocket (mSocket=107) Connection open,unlocked marked(GOOD)

1385971201874   2013-12-02 08:00:01.874         [debug]         27481.283666176 [RETRY] AN(168.159.214.21) Unavailable (for 15 secs)

1385971201874   2013-12-02 08:00:01.874         [debug]         27481.283666176 [POOL]  Close connections for addr=168.159.214.21 (num=0,max=0)

1385971201874   2013-12-02 08:00:01.874         [debug]         27481.283666176 [POOL]  End close connections (num=0,max=0) ----

1385971201874   2013-12-02 08:00:01.874         [debug]         27481.283666176 [RETRY] retry (0) Probe because Exception

error=-10101

syserror=0

message=receive: waitForReadingData(1000) returned zero

trace=FPDatagramSocket.receive (timeout=1000)

transid=localhost/5/PROBE

...

1385971208857   2013-12-02 08:00:08.857         [debug]         27481.276297472 [API]   PoolOption 'prefetchsize' = '32768'

1385971208857   2013-12-02 08:00:08.857         [log]           27481.276297472 [API]   End FPPool_Open8(168.159.214.20,168.159.214.21?/home/transfer/c1armtesting.pea)

1385971208857   2013-12-02 08:00:08.857         [debug]         27481.283666176 [EXCEPTION]     In 'FPClip.cpp' at line 3793: Exception

error=-10018

syserror=0

message=

trace=FPClip.GetMetaAttribute(retention.class)

1385971208858   2013-12-02 08:00:08.858         [debug]         27481.283666176 [API]   GlobalOption 'retrycount' = '6'

1385971208858   2013-12-02 08:00:08.858         [debug]         27481.283666176 [API]   GlobalOption 'retrysleep' = '-1'

...

1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   Start FPPool_GetLastError()

1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   End FPPool_GetLastError() --> [0]

1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   Start FPPool_GetLastError()

1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   End FPPool_GetLastError() --> [0]

1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   Start FPTag_GetTagName8(-,-,256)

1385973098643   2013-12-02 08:31:38.643         [debug]         27481.274192128 [EXCEPTION]     In 'FPTag.cpp' at line 1530: Exception

error=-10012

syserror=0

message=

trace=No trace available

1385973098643   2013-12-02 08:31:38.643         [error]         27481.274192128 [API]   End FPTag_GetTagName8(-,-,0): Error -10012

1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   Start FPPool_GetLastError()

1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   End FPPool_GetLastError() --> [-10012]

1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   Start FPPool_GetLastError()

1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   End FPPool_GetLastError() --> [-10012]

...

208 Posts

December 2nd, 2013 09:00

Hi Korbinian -

There seem to be a number of things going on here. The -10101 errors indicate a networking problem, possibly with that particular access node (is it down?). This isn't of great concern, although it may slow you down a bit.

The -10018 is an FP_ATTR_NOT_FOUND error that occurs when your code checks whether a clip has a retention class defined. It is not a problem as long as you handle this return code appropriately.

The -10012 looks like an actual problem. The SDK is having some sort of issue related to how you are trying to traverse a clip that was opened in FP_FLAT mode (which does not support parent/child/sibling type traversal). You should probably look into this and see if it is preventing you from releasing all of your SDK resources correctly. Leaked SDK handles will eventually cause problems, possibly along the lines of what you are experiencing after a few hours of runtime.
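
To see which of these codes dominates a multi-gigabyte log, the `error=` lines can be tallied. This is a minimal sketch, assuming every exception record carries a line of the form `error=-10101` as in the excerpt above. The class `SdkErrorTally` is a hypothetical helper and does not call the Centera SDK.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SdkErrorTally {
    // Matches a whole line such as "error=-10101" (optional trailing whitespace).
    private static final Pattern ERROR_LINE = Pattern.compile("^error=(-?\\d+)\\s*$");

    /** Counts how often each SDK error code appears in the given log lines. */
    static Map<Integer, Integer> tally(Iterable<String> lines) {
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = ERROR_LINE.matcher(line);
            if (m.matches()) {
                counts.merge(Integer.parseInt(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Lines in the shape of the log excerpt posted in this thread.
        Iterable<String> sample = Arrays.asList(
                "error=-10101", "syserror=0",
                "error=-10101", "syserror=0",
                "error=-10018", "error=-10012");
        tally(sample).forEach((code, n) -> System.out.println(code + " x" + n));
    }
}
```

For a large file, the lines can be streamed rather than loaded at once, for example by passing `Files.lines(path)::iterator` as the `Iterable`.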

Good Luck,

Mike Horgan

December 5th, 2013 04:00

Hi Mike,

Thank you for your answer! I tried to find clips that were opened in FP_FLAT mode, but all the clips are opened in FP_OPEN_ASTREE mode. I also checked all my SDK handles, and the handles and clips are closed correctly. I tried setting the handles and clips to null after closing them, but the problem remains.

Can I find out which clips are opened in FP_FLAT mode?
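
One way to double-check that every handle really is released, including on error paths, is to route opens and closes through a small registry. This is a minimal sketch using plain `AutoCloseable` stand-ins. `HandleRegistry`, `track`, and `stillOpen` are hypothetical names, and nothing here calls the Centera SDK, so the real pool, clip, and tag objects would have to be adapted to `AutoCloseable` by the caller.

```java
import java.util.Collections;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

final class HandleRegistry {
    private final Set<String> open = ConcurrentHashMap.newKeySet();

    /**
     * Wraps a resource so that a successful close removes it from the open set.
     * If close() throws, the name stays registered and the failure stays visible.
     */
    AutoCloseable track(String name, AutoCloseable resource) {
        open.add(name);
        return () -> {
            resource.close();
            open.remove(name);
        };
    }

    /** Names of handles that were opened but not yet closed successfully. */
    Set<String> stillOpen() {
        return Collections.unmodifiableSet(open);
    }

    public static void main(String[] args) throws Exception {
        HandleRegistry registry = new HandleRegistry();
        // try-with-resources closes in reverse order: the tag first, then the clip.
        try (AutoCloseable clip = registry.track("clip", () -> { });
             AutoCloseable tag = registry.track("tag", () -> { })) {
            System.out.println("open during job: " + registry.stillOpen().size());
        }
        System.out.println("open after job: " + registry.stillOpen().size());
    }
}
```

Logging `stillOpen()` at the end of each scheduled job would show whether any handle survives a run, which is harder to spot by reading the code alone.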

Regards,

Korbinian Hermann

208 Posts

December 5th, 2013 06:00

Hi Korbinian -

The SDK log contains every FPClip_Open call with clipid and open_mode in the second and third parameters. From the docs:

#define FP_OPEN_ASTREE   1
C-Clip is opened read/write.
#define FP_OPEN_FLAT   2
C-Clip is opened read-only.

That seems to be your best bet. Happy Spelunking!
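
Searching a large log by hand is tedious, so the call records can be grouped by one of their parameters. This is a minimal sketch, assuming API calls are logged in the `Start Name(arg,arg,arg)` shape seen in the excerpt earlier in this thread, such as `Start FPTag_GetTagName8(-,-,256)`. The class `CallParamCounter` is hypothetical. That `FPClip_Open` records follow the same shape is an assumption based on the description above, not something the excerpt shows.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CallParamCounter {
    /**
     * Counts "Start <function>(...)" records per value of the parameter at
     * paramIndex (0-based). Records with too few parameters are skipped.
     */
    static Map<String, Integer> countByParam(Iterable<String> lines, String function, int paramIndex) {
        Pattern call = Pattern.compile("Start " + Pattern.quote(function) + "\\(([^)]*)\\)");
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = call.matcher(line);
            if (!m.find()) {
                continue;
            }
            String[] params = m.group(1).split(",", -1);
            if (paramIndex < params.length) {
                counts.merge(params[paramIndex].trim(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Both lines are copied from the SDK log excerpt posted in this thread.
        List<String> sample = Arrays.asList(
                "1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   Start FPPool_GetLastError()",
                "1385973098643   2013-12-02 08:31:38.643         [log]           27481.274192128 [API]   Start FPTag_GetTagName8(-,-,256)");
        System.out.println(countByParam(sample, "FPTag_GetTagName8", 2));
    }
}
```

Under the same assumption, `countByParam(lines, "FPClip_Open", 2)` would group the open calls by the third parameter, which is where the reply above says the open mode is logged.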

Mike Horgan

December 5th, 2013 08:00

Hello Mike,

I searched my SDK log file, but I found only FPClip_Open calls with FP_OPEN_ASTREE mode.

It seems that my problem has another cause.

Regards,

Korbinian Hermann

208 Posts

December 5th, 2013 08:00

I agree, this path of investigation has reached a dead end. But hopefully you have learned quite a bit about Centera SDK logs, and they may help you find the root cause.

Best Regards,

Mike Horgan

December 16th, 2013 02:00

Hello everyone,

I'm still looking for the root cause, so I added the parameter "-Xcheck:jni" to my Java command line. Now, after every BlobRead call, I get the following warning in my console:

WARNING in native method: JNI call made with exception pending

        at com.filepool.natives.FPLibraryNative.FPTag_BlobReadPartial(Native Method)

        at com.filepool.fplibrary.FPTag.BlobReadPartial(Unknown Source)

        at com.filepool.fplibrary.FPTag.BlobRead(Unknown Source)

Does anyone have an idea why this warning is shown?

Regards,

Korbinian Hermann

208 Posts

December 16th, 2013 03:00

Sorry Korbinian, I have no clue on this one.

Perhaps someone from EMC support will chime in.

Mike Horgan

December 16th, 2013 22:00

ChronHermann wrote:

Hello everyone,

I'm still looking for the root cause, so I added the parameter "-Xcheck:jni" to my Java command line. Now, after every BlobRead call, I get the following warning in my console:

WARNING in native method: JNI call made with exception pending

        at com.filepool.natives.FPLibraryNative.FPTag_BlobReadPartial(Native Method)

        at com.filepool.fplibrary.FPTag.BlobReadPartial(Unknown Source)

        at com.filepool.fplibrary.FPTag.BlobRead(Unknown Source)

Does anyone have an idea why this warning is shown?

Regards,

Korbinian Hermann

Yesterday I also added the parameter to the "EMC Console". The warning appeared there too when I downloaded a clip. So I think this is a general issue in the Centera SDK, but not the cause of my problem.

Regards,

Korbinian Hermann
