July 12th, 2007 06:00

RPC time outs during RMAN/Oracle backup

During RMAN/Oracle backups we receive RPC timeouts. 99% of the time, the backups started by the client (client-initiated backups) run fine. In the other 1% of cases, Oracle RMAN reports that the backup has failed due to an RPC timeout.

channel ORA_SBT_TAPE_2: starting piece 1 at 12-JUL-07
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03009: failure of backup command on ORA_SBT_TAPE_1 channel at 07/12/2007 01:35:03
ORA-27191: sbtinfo2 returned error
Additional information: 2
ORA-19511: Error received from media manager layer, error text:
A connection to NW server 'ICTDSTA34.melkweg.tld' could not be established beca
use 'RPC receive operation failed. A network connection could not be establish
ed with the host.'. (2:10:0)

I have tracked when these timeouts occur, and on which database and which server, but there doesn't seem to be a pattern to the occurrences.

date    server       time         channel         database
14-apr  oberon2.sfb   9:36:37 PM  ORA_SBT_TAPE_1  PRDUWV05
3-may   oberon1.sfb  10:24:32 PM  ORA_SBT_TAPE_1  PRDCDS01
3-jun   oberon2.sfb  11:18:29 PM  ORA_SBT_TAPE_2  PRDUWV05
10-jul  oberon1.sfb   9:19:06 PM  ORA_SBT_TAPE_1  PRDISS01
12-jul  oberon1.sfb   1:35:03 AM  ORA_SBT_TAPE_1  PRDCDS01
12-jul  oberon2.sfb   1:27:48 AM  ORA_SBT_TAPE_2  PRDUWV05
12-jul  oberon1.sfb   1:33:56 AM  ORA_SBT_TAPE_1  archivelogfiles
12-jul  leda1.sfb     1:34:56 AM  ORA_SBT_TAPE_4  TSTISS01
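To check for a pattern a little more systematically, the rows from the table above can be tallied by hour of day, date, and server. This is only a minimal sketch: the `FAILURES` list is hand-copied from the table, and the helper names `hour_of` and `tally` are made up for illustration.

```python
from collections import Counter
from datetime import datetime

# Rows hand-copied from the table above: (date, server, time, channel, database).
FAILURES = [
    ("14-apr", "oberon2.sfb", "9:36:37 PM", "ORA_SBT_TAPE_1", "PRDUWV05"),
    ("3-may", "oberon1.sfb", "10:24:32 PM", "ORA_SBT_TAPE_1", "PRDCDS01"),
    ("3-jun", "oberon2.sfb", "11:18:29 PM", "ORA_SBT_TAPE_2", "PRDUWV05"),
    ("10-jul", "oberon1.sfb", "9:19:06 PM", "ORA_SBT_TAPE_1", "PRDISS01"),
    ("12-jul", "oberon1.sfb", "1:35:03 AM", "ORA_SBT_TAPE_1", "PRDCDS01"),
    ("12-jul", "oberon2.sfb", "1:27:48 AM", "ORA_SBT_TAPE_2", "PRDUWV05"),
    ("12-jul", "oberon1.sfb", "1:33:56 AM", "ORA_SBT_TAPE_1", "archivelogfiles"),
    ("12-jul", "leda1.sfb", "1:34:56 AM", "ORA_SBT_TAPE_4", "TSTISS01"),
]


def hour_of(time_text):
    """Convert a 12-hour clock string such as '9:36:37 PM' to an hour 0-23."""
    return datetime.strptime(time_text, "%I:%M:%S %p").hour


def tally(rows):
    """Count failures per hour of day, per date, and per server."""
    by_hour = Counter(hour_of(row[2]) for row in rows)
    by_date = Counter(row[0] for row in rows)
    by_server = Counter(row[1] for row in rows)
    return by_hour, by_date, by_server


if __name__ == "__main__":
    by_hour, by_date, by_server = tally(FAILURES)
    print("per hour:  ", dict(sorted(by_hour.items())))
    print("per date:  ", dict(by_date))
    print("per server:", dict(by_server))
```

With only eight data points this cannot prove anything, but it does make visible that every failure falls between 9 PM and 2 AM and that half of them happened on 12 July.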

For these occurrences I looked in the daemon.log and found the following:

12-jul oberon2.sfb 1:27:48 AM ORA_SBT_TAPE_2 PRDUWV05


07/12/07 01:27:47 nsrindexd: select: Socket operation on non-socket
07/12/07 01:27:48 nsrd: oberon2:RMAN:/OBERON2_INCR_PRDUWV05_8fimjnb0 done saving to pool 'RWKCDSunix' (CDLR003W) 777 MB

12-jul oberon1.sfb 1:33:56 AM ORA_SBT_TAPE_1 archivelogfiles


07/12/07 01:33:45 nsrindexd: select: Socket operation on non-socket
07/12/07 01:33:45 nsrd: oberon1:RMAN:/OBERON1_ARCH_PRDCDS01_5limjtrh saving to pool 'RWKCDSunix' (CDLR004N)
07/12/07 01:33:56 nsrd: oberon1:RMAN:/OBERON1_ARCH_PRDUWV05_8iimjtig done saving to pool 'RWKCDSunix' (CDLR004N) 2815 MB

12-jul leda1.sfb 1:34:56 AM ORA_SBT_TAPE_4 TSTISS01

07/12/07 01:34:55 nsrindexd: select: Socket operation on non-socket
07/12/07 01:35:00 nsrd: leda1:RMAN:4nimjtte_1_1 done saving to pool 'RWKCDSunix' (CDLR1027) 118 MB

12-jul oberon1.sfb 1:35:03 AM ORA_SBT_TAPE_1 PRDCDS01

07/12/07 01:34:55 nsrindexd: select: Socket operation on non-socket
07/12/07 01:35:00 nsrd: leda1:RMAN:4nimjtte_1_1 done saving to pool 'RWKCDSunix' (CDLR1027) 118 MB

07/12/07 01:35:03 nsrd: leda1:RMAN:4oimjtte_1_1 done saving to pool 'RWKCDSunix' (CDLR1027) 153 MB
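For reference, the text "Socket operation on non-socket" in the nsrindexd lines is the standard operating system message for the errno value ENOTSOCK: a socket call was made on a descriptor that is not (or is no longer) a socket. The sketch below only illustrates that generic error on a POSIX system with Python 3.7 or later; it says nothing about what nsrindexd does internally, and the function names are made up for illustration.

```python
import errno
import os
import socket
import tempfile


def enotsock_message():
    """Return the OS message text for ENOTSOCK."""
    return os.strerror(errno.ENOTSOCK)


def socket_op_on_regular_file():
    """Treat a regular file's descriptor as a socket.

    Returns the errno of the resulting OSError, or None if the
    platform unexpectedly accepts the descriptor.
    """
    with tempfile.TemporaryFile() as f:
        try:
            s = socket.socket(fileno=f.fileno())
        except OSError as exc:
            return exc.errno
        # Unexpected success: give the descriptor back so it is not closed twice.
        s.detach()
        return None


if __name__ == "__main__":
    print("ENOTSOCK text:", enotsock_message())
    print("errno from socket op on a file:", socket_op_on_regular_file())
```

One common way to get this error from select() is a descriptor that was closed or reused while another part of the program still treats it as a socket, but whether that is what happens here is an assumption that only the vendor can confirm.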

I haven't been able to find out what "Socket operation on non-socket" means. The daemon.log files on the HP-UX clients don't seem to show any faults either:

leda1 12 july
07/12/07 00:32:55 nsrmmd #376: Start nsrmmd #376, with PID 22176, at HOST leda1
07/12/07 01:32:46 nsrmmd #385: Start nsrmmd #385, with PID 165, at HOST leda1

Oberon1 10 july
07/10/07 01:56:58 nsrmmd #194: Start nsrmmd #194, with PID 7917, at HOST oberon1
07/10/07 06:00:26 nsrmmd #211: Start nsrmmd #211, with PID 9928, at HOST oberon1
07/10/07 11:54:00 nsrmmd #184: Start nsrmmd #184, with PID 28424, at HOST oberon1
07/10/07 14:06:17 nsrmmd #184: Start nsrmmd #184, with PID 7624, at HOST oberon1
07/10/07 20:01:22 nsrmmd #214: Start nsrmmd #214, with PID 5229, at HOST oberon1
07/10/07 20:01:56 nsrmmd #215: Start nsrmmd #215, with PID 6005, at HOST oberon1

oberon1 3 may
05/03/07 01:18:21 nsrmmd #1225: Start nsrmmd #1225, with PID 5219, at HOST oberon1
05/03/07 06:00:28 nsrmmd #1255: Start nsrmmd #1255, with PID 21608, at HOST oberon1
05/03/07 20:00:56 nsrmmd #1268: Start nsrmmd #1268, with PID 21264, at HOST oberon1
05/03/07 20:01:24 nsrmmd #1274: Start nsrmmd #1274, with PID 21398, at HOST oberon1
05/03/07 20:05:42 nsrmmd #1276: Start nsrmmd #1276, with PID 21910, at HOST oberon1

oberon1 12 july
07/12/07 06:00:25 nsrmmd #410: Start nsrmmd #410, with PID 2425, at HOST oberon1

oberon2 14 april
04/17/07 06:00:22 nsrmmd #4115: Start nsrmmd #4115, with PID 13358, at HOST oberon2
04/17/07 20:00:51 nsrmmd #4148: Start nsrmmd #4148, with PID 23924, at HOST oberon2

oberon2 3 june
06/03/07 00:02:43 nsrmmd #1563: Start nsrmmd #1563, with PID 1235, at HOST oberon2
06/03/07 00:31:41 nsrmmd #1563: Start nsrmmd #1563, with PID 4699, at HOST oberon2
06/03/07 06:00:23 nsrmmd #1605: Start nsrmmd #1605, with PID 20577, at HOST oberon2
06/03/07 20:07:19 nsrmmd #1624: Start nsrmmd #1624, with PID 25703, at HOST oberon2
06/03/07 20:47:59 nsrmmd #1625: Start nsrmmd #1625, with PID 12389, at HOST oberon2

oberon2 12 july
07/12/07 01:35:59 nsrmmd #386: Start nsrmmd #386, with PID 21905, at HOST oberon2
07/12/07 06:00:26 nsrmmd #411: Start nsrmmd #411, with PID 27508, at HOST oberon2

I have registered a service request with EMC Support and gave them a truckload of information, but I don't seem to be getting anywhere. The only explanation I can think of is that, because all these timeouts occur during the night, the backup server is too busy and RMAN times out as a result. When the database admins restart the backup during the following day, there is never a problem. Also, when a timeout occurs the connection is reset, and after that the other RMAN backups in the queue finish without any problems.
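One rough way to test the "server too busy" idea is to look at how tightly the 12 July failures cluster across clients. The sketch below uses the four 12 July entries from the failure list earlier in this post; the `JULY_12` list and the `cluster_span` helper are illustrative names, not part of any tool.

```python
from datetime import datetime

# The four failures reported for 12 July: (server, time).
JULY_12 = [
    ("oberon1.sfb", "1:35:03 AM"),
    ("oberon2.sfb", "1:27:48 AM"),
    ("oberon1.sfb", "1:33:56 AM"),
    ("leda1.sfb", "1:34:56 AM"),
]


def cluster_span(events):
    """Return (seconds between first and last failure, number of distinct servers)."""
    times = sorted(datetime.strptime(t, "%I:%M:%S %p") for _, t in events)
    span_seconds = (times[-1] - times[0]).total_seconds()
    servers = {server for server, _ in events}
    return span_seconds, len(servers)


if __name__ == "__main__":
    span, server_count = cluster_span(JULY_12)
    print(f"{len(JULY_12)} failures on {server_count} servers within {span:.0f} seconds")
```

Four failures on three different clients within roughly seven minutes would be consistent with a cause on the backup server or the network rather than on any single client, although it does not prove it.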

21 Posts

July 12th, 2007 07:00

I've seen this happen before and it was a resource problem, hence the socket errors. It can also be a memory issue. I'll see if I can dig up anything and post it back here.

Thanks,
Dennis

26 Posts

July 18th, 2007 04:00

We found the environment variable NSR_RPC_TIMEOUT, which was used in an SAP environment. I was wondering if I could use it as well to solve the RPC timeout problem. However, I have no further information on this variable, and I can't find anything about it on the internet or in the Powerlink Knowledge Base.

1. Can we use this environment variable NSR_RPC_TIMEOUT?
2. To what value do we set this variable?

14.3K Posts

July 18th, 2007 05:00

1. Can we use this environment variable NSR_RPC_TIMEOUT?

I use that with SAP while doing PS backups for one db, but I had to get a patched backint to be able to use it. I do not think the problem that made us use it is related to what you have.

2. To what value do we set this variable?

The one you need (in minutes).