1 Rookie

 • 

15 Posts

4214

February 1st, 2007 06:00

Full Oracle HotBackups fail..

Hello all,

We have a Networker 7.2.2 server running on Windows 2003 SP1. It has been backing up Oracle 10g (running on Solaris 10) with NMO 4.2 quite well for the last 3 months. Then, 3 weeks ago, full backups started failing with the message below:

RMAN-03002: failure of backup plus archivelog command at 01/25/2007 16:46:57
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
The function nsr_end() failed with the error message: RPC receive operation failed. A network connection could not be established with the host. (0:5:0)

Incremental NMO backups run fine. The Networker Error Messages Guide describes the above error as: "The remote procedure call (RPC) services are unable to communicate with the server software. Communication between the server and the client was lost while the server was processing the request."
I've been suspicious about the patch levels of the Solaris client and/or the Windows server, mainly because the Solaris patch level is 7-8 months old and some bug reports about TCP communication have surfaced recently.

Any more ideas are greatly appreciated..
Best regards,
Atatur

6 Operator

 • 

14.4K Posts

 • 

56.2K Points

February 1st, 2007 11:00

It seems to indicate network trouble. You could sniff the network with snoop on the Solaris end, just to verify whether the network connection gets lost or is unexpectedly closed.

32 Posts

February 2nd, 2007 01:00

Hi,

Did you check the speed of the network cards? Check whether both ends are set to the same speed and whether they are running full duplex. Perhaps a switch reset will solve the problem.

1 Rookie

 • 

15 Posts

February 2nd, 2007 01:00

Thanks for your response. I'll be doing that kind of troubleshooting in the coming days.

1 Rookie

 • 

15 Posts

February 2nd, 2007 01:00

I already checked them all. The strange thing is that full backups went fine for 3 months, which I believe makes the switch an unlikely culprit as well; but I'll check the switch too.
Thanks a lot, my Turkish friend.

74 Posts

February 4th, 2007 11:00

Atatur,

Since you state that incremental backups succeed, can you check the differences between the two backup scripts? Perhaps someone changed one of them, causing the full backup to fail now.

Does the full backup send any data at all, or does this error occur immediately when the backup starts?

Xander

1 Rookie

 • 

15 Posts

February 6th, 2007 01:00

Xander,

Sorry for the late response. I checked the scripts but they're OK. Here is the current situation:
I changed the full backup RMAN script a bit (filesperset is now 3; previously it was 10). The old RPC network error has disappeared, and full backups run fine and appear to FINISH, but at the end, in the Networker GUI (Configure tab, Groups, Oracle), the Oracle group status watch goes into an endless loop, and after 1-2 hours of inactivity the status is set to failed. During that inactivity, nothing is logged...

Here are some excerpts from the /nsr/applogs directory on the client:

Here is "tail -f msglog":

released channel: c11
released channel: c12
released channel: c21
released channel: c22
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of release command at 02/05/2007 18:12:33
RMAN-06004: ORACLE error from recovery catalog database: ORA-03114: not connected to ORACLE

Recovery Manager complete.
ORACLE error from recovery catalog database: ORA-03114: not connected to ORACLE

I also set NSR_DEBUG_FILE and NSR_DEBUG_LEVEL to 1, and
here is tail -f debug.log output:

(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_ss_t_info: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_vol_t_info: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_vol_t_info: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_do_info2: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) name = YSK_LEVEL0_0ci9ahs2_1_1_YSKIN.2060_1
(pid = 22931) (date = 02/05/07 18:12:16) method = STREAM
(pid = 22931) (date = 02/05/07 18:12:16) cretime = 1170690146
(pid = 22931) (date = 02/05/07 18:12:16) exptime = 1178638318
(pid = 22931) (date = 02/05/07 18:12:16) label = YSK016L3
(pid = 22931) (date = 02/05/07 18:12:16) label = YSK017L3
(pid = 22931) (date = 02/05/07 18:12:16) share = SINGLE
(pid = 22931) (date = 02/05/07 18:12:16) order = SEQUENTIAL
(pid = 22931) (date = 02/05/07 18:12:16) Total # of info fields = 9
(pid = 22931) (date = 02/05/07 18:12:16) Leaving sbtinfo2 (0)
(pid = 22931) (date = 02/05/07 18:12:33) Entering sbtend()
(pid = 22931) (date = 02/05/07 18:12:33) nwora_sbt_free_sbtinfo2_mem: Entering.
(pid = 22931) (date = 02/05/07 18:12:33) nwora_sbt_free_sbtinfo2_mem: Exiting.
(pid = 22940) (date = 02/05/07 18:12:33) Entering sbtend()
(pid = 22940) (date = 02/05/07 18:12:33) nwora_sbt_free_sbtinfo2_mem: Entering.
(pid = 22940) (date = 02/05/07 18:12:33) nwora_sbt_free_sbtinfo2_mem: Exiting.

Exiting with return code: -2


From all this, I understand that NMO writes to the tape devices without errors, but something goes wrong at the end, and that makes the backup fail. What does Networker do at the end of this process? Does it try to synchronize the Networker index and the RMAN catalog? Is there a way to make Networker log more about what it is doing at the end, such as setting an environment variable? I searched the docs but couldn't find anything.
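One way to see where the time goes at the end of the backup is to look for pauses between consecutive entries in debug.log. The following is only a rough sketch: it assumes every entry has the "(pid = N) (date = mm/dd/yy hh:mm:ss) message" shape shown in the excerpt above, and the function names are made up for illustration (they are not part of Networker or NMO).

```python
import re
from datetime import datetime

# Assumed line shape, taken from the excerpt above:
# (pid = 22931) (date = 02/05/07 18:12:16) Leaving sbtinfo2 (0)
LINE_RE = re.compile(
    r"\(pid = (\d+)\) \(date = (\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\) (.*)"
)


def parse_debug_lines(lines):
    """Return (pid, timestamp, message) tuples for lines matching the assumed format."""
    entries = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            # Assumption: the date is month/day/two-digit year.
            stamp = datetime.strptime(match.group(2), "%m/%d/%y %H:%M:%S")
            entries.append((int(match.group(1)), stamp, match.group(3)))
    return entries


def largest_gap(entries):
    """Return (seconds, entry_before, entry_after) for the longest pause, or None."""
    best = None
    for prev, cur in zip(entries, entries[1:]):
        gap = (cur[1] - prev[1]).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, prev, cur)
    return best


sample = [
    "(pid = 22931) (date = 02/05/07 18:12:16) Total # of info fields = 9",
    "(pid = 22931) (date = 02/05/07 18:12:16) Leaving sbtinfo2 (0)",
    "(pid = 22931) (date = 02/05/07 18:12:33) Entering sbtend()",
]

gap = largest_gap(parse_debug_lines(sample))
print(gap[0], "seconds of silence before:", gap[2][2])
```

Run against the whole file rather than the three sample lines, this would show whether the long pause sits before or after sbtend(), which helps decide whether the media manager or the catalog connection is the one waiting.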

Thanks in advance,
Atatur

76 Posts

February 16th, 2007 10:00

Atatur,

Does your RMAN script shut down the database?

Is the recovery catalog in the same database?

If you change your script to send the backup to disk, does it work without error?

Anderson

1 Rookie

 • 

15 Posts

February 19th, 2007 00:00

Hi everyone and Anderson,

Sorry for the late response. In the end, the problem turned out to be a network connectivity problem. A firewall on the path (which I really didn't take into account, because of the successful backups in the past) blocked the session between the NMO client and the catalog server after 1 hour of inactivity. In the past, full backups took less than an hour, but a database extension made the backup take longer than before.

It really made me sick, because this firewall behaviour is totally unacceptable. I talked (actually argued) with the network guys, because the TCP keepalive parameter they mentioned has nothing to do with an "inactive session". Contrary to their misconception, a session in which one end sends no data for a long time is not an inactive session; that is simply how the NMO client (Oracle) and the catalog server communicate. Initially, the NMO client contacts the catalog server, but once the backup begins there is no data transfer between them until the backup finishes, and hence the firewall interprets this as an "inactive session". Sad, really sad, because I have dealt with this for 3 weeks.
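The behaviour described above can be modelled in a few lines. This is only an illustrative sketch: the function name, the 50-minute and 80-minute backup durations, and the assumption that the firewall drops a session once the silence strictly exceeds the timeout are all hypothetical; only the one-hour idle limit comes from this thread.

```python
def first_idle_drop(packet_times, idle_timeout):
    """Return the time (seconds) at which the firewall would drop the session, or None.

    packet_times: sorted times, in seconds since session start, at which any
                  packet crosses the firewall on the catalog connection.
    idle_timeout: seconds of silence after which the firewall forgets the session.
    """
    for prev, cur in zip(packet_times, packet_times[1:]):
        if cur - prev > idle_timeout:
            # The session is already gone when the next packet arrives.
            return prev + idle_timeout
    return None


IDLE_TIMEOUT = 60 * 60  # one hour of inactivity, as described in the post

# The catalog connection is used at the start of the backup and then
# again only when the backup finishes (durations below are made up).
short_backup = [0, 5, 50 * 60]  # finishes after 50 minutes
long_backup = [0, 5, 80 * 60]   # finishes after 80 minutes

print(first_idle_drop(short_backup, IDLE_TIMEOUT))
print(first_idle_drop(long_backup, IDLE_TIMEOUT))
```

With these numbers the short backup never exceeds the idle limit, while the long one loses its catalog session partway through, so the final catalog update fails even though the data reached tape.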

Best regards,
Atatur YILDIRIM

1 Message

June 20th, 2007 06:00

We have the same problem, and what we are looking at is recreating the link to libobk, which is what Oracle MetaLink says to do. I will let you know if this helps.
