1 Rookie
•
15 Posts
0
4214
February 1st, 2007 06:00
Full Oracle hot backups fail
Hello all,
We have a NetWorker 7.2.2 server running on Windows 2003 SP1. It has been backing up Oracle 10g (running on Solaris 10) with NMO 4.2 without problems for the last 3 months. Then, 3 weeks ago, full backups started failing with the message below:
RMAN-03002: failure of backup plus archivelog command at 01/25/2007 16:46:57
ORA-27192: skgfcls: sbtclose2 returned error - failed to close file
ORA-19511: Error received from media manager layer, error text:
The function nsr_end() failed with the error message: RPC receive operation failed. A network connection could not be established with the host. (0:5:0)
Incremental NMO backups run fine. The NetWorker Error Messages Guide describes the above problem as: "The remote procedure call (RPC) services are unable to communicate with the server software. Communication between the server and the client was lost while the server was processing the request."
I have been suspicious of the patch levels of the Solaris client and/or the Windows server, mainly because the Solaris patch level is 7-8 months old and some bug reports about TCP communication have come out recently.
Any further ideas are greatly appreciated.
Best regards,
Atatur
ble1
6 Operator
•
14.4K Posts
•
56.2K Points
1
February 1st, 2007 11:00
kenanerdey
32 Posts
0
February 2nd, 2007 01:00
Did you check the speed of the network cards? Check whether both ends run at the same speed and whether they are set to full duplex. Perhaps a switch reset will solve the problem.
atatury
1 Rookie
•
15 Posts
0
February 2nd, 2007 01:00
Thanks a lot my Turkish friend..
xvan_egmond
74 Posts
0
February 4th, 2007 11:00
Since you state that incremental backups succeed, can you check the differences between the two backup scripts? Perhaps someone changed one of them, causing the full backup to fail now.
Does the full backup send any data at all, or does this error occur immediately when the backup starts?
Xander
atatury
1 Rookie
•
15 Posts
0
February 6th, 2007 01:00
Sorry for the late response. I checked the scripts and they are OK. Here is the current situation:
I changed the full backup RMAN script slightly (filesperset is now 3; previously it was 10). The old RPC network error has disappeared, and the full backups run fine and appear to FINISH. At the end, however, in the NetWorker GUI (Configure tab, Groups, Oracle), the Oracle group status display loops endlessly, and after 1-2 hours of inactivity the status is set to failed. Nothing is logged during that inactivity.
Here are some excerpts from the /nsr/applogs directory on the client.
Output of "tail -f msglog":
released channel: c11
released channel: c12
released channel: c21
released channel: c22
RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of release command at 02/05/2007 18:12:33
RMAN-06004: ORACLE error from recovery catalog database: ORA-03114: not connected to ORACLE
Recovery Manager complete.
ORACLE error from recovery catalog database: ORA-03114: not connected to ORACLE
I also set NSR_DEBUG_FILE and set NSR_DEBUG_LEVEL to 1. Here is the output of "tail -f debug.log":
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_ss_t_info: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_vol_t_info: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Entering.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_add_sbtbfinfo_list_elem: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_vol_t_info: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) nwora_sbt_do_info2: Exiting.
(pid = 22931) (date = 02/05/07 18:12:16) name = YSK_LEVEL0_0ci9ahs2_1_1_YSKIN.2060_1
(pid = 22931) (date = 02/05/07 18:12:16) method = STREAM
(pid = 22931) (date = 02/05/07 18:12:16) cretime = 1170690146
(pid = 22931) (date = 02/05/07 18:12:16) exptime = 1178638318
(pid = 22931) (date = 02/05/07 18:12:16) label = YSK016L3
(pid = 22931) (date = 02/05/07 18:12:16) label = YSK017L3
(pid = 22931) (date = 02/05/07 18:12:16) share = SINGLE
(pid = 22931) (date = 02/05/07 18:12:16) order = SEQUENTIAL
(pid = 22931) (date = 02/05/07 18:12:16) Total # of info fields = 9
(pid = 22931) (date = 02/05/07 18:12:16) Leaving sbtinfo2 (0)
(pid = 22931) (date = 02/05/07 18:12:33) Entering sbtend()
(pid = 22931) (date = 02/05/07 18:12:33) nwora_sbt_free_sbtinfo2_mem: Entering.
(pid = 22931) (date = 02/05/07 18:12:33) nwora_sbt_free_sbtinfo2_mem: Exiting.
(pid = 22940) (date = 02/05/07 18:12:33) Entering sbtend()
(pid = 22940) (date = 02/05/07 18:12:33) nwora_sbt_free_sbtinfo2_mem: Entering.
(pid = 22940) (date = 02/05/07 18:12:33) nwora_sbt_free_sbtinfo2_mem: Exiting.
Exiting with return code: -2
From all this, I understand that NMO writes to the tape devices without errors, but something goes wrong at the end of the process and makes the backup fail. What does NetWorker do at the end of this process? Does it try to synchronize the NetWorker index and the RMAN catalog? Is there a way to make NetWorker log more about what it is doing at the end, such as setting an environment variable? I searched the docs but could not find anything.
Thanks in advance,
Atatur
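Editor's note: when a job stalls with "nothing logged", it can help to measure how long the pauses between consecutive debug.log entries actually are. The following is a minimal Python sketch, not an NMO tool. It assumes the line layout shown in the excerpt above, "(pid = N) (date = MM/DD/YY HH:MM:SS) message", and it assumes the date is month/day/year. The names parse_debug_log and largest_gap are made up for illustration.

```python
import re
from datetime import datetime

# Layout assumed from the debug.log excerpt in this thread.
LINE_RE = re.compile(
    r"\(pid = (\d+)\) \(date = (\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\) (.*)"
)


def parse_debug_log(lines):
    """Yield (pid, timestamp, message) for every line that matches the layout."""
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            # Assumption: dates are written as month/day/two-digit year.
            ts = datetime.strptime(m.group(2), "%m/%d/%y %H:%M:%S")
            yield int(m.group(1)), ts, m.group(3)


def largest_gap(entries):
    """Return (seconds, entry_before, entry_after) for the longest pause, or None."""
    best = None
    prev = None
    for entry in entries:
        if prev is not None:
            gap = (entry[1] - prev[1]).total_seconds()
            if best is None or gap > best[0]:
                best = (gap, prev, entry)
        prev = entry
    return best


sample = [
    "(pid = 22931) (date = 02/05/07 18:12:16) Leaving sbtinfo2 (0)",
    "(pid = 22931) (date = 02/05/07 18:12:33) Entering sbtend()",
    "(pid = 22940) (date = 02/05/07 18:12:33) Entering sbtend()",
]
gap, before, after = largest_gap(parse_debug_log(sample))
print(gap, before[2], "->", after[2])
# prints: 17.0 Leaving sbtinfo2 (0) -> Entering sbtend()
```

Run against a whole debug.log (for example with open(path) as the lines argument), a pause of an hour or more would stand out immediately and show which call preceded it.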
axavier1
76 Posts
0
February 16th, 2007 10:00
Does your RMAN script shut down the database?
Is the recovery catalog in the same database?
If you change your script to send the backup to disk, does it work without error?
Anderson
atatury
1 Rookie
•
15 Posts
0
February 19th, 2007 00:00
Sorry for the late response. In the end, the problem turned out to be a network connectivity problem. A firewall on the path (which I had not taken into account, because of the successful backups in the past) blocked the session between the NMO client and the catalog server after 1 hour of inactivity. In the past, full backups took less than an hour, but a database extension made the backup longer than before.
It really made me sick, because this firewall behaviour is totally unacceptable. I talked (actually argued) with the network guys, because the TCP keepalive parameter they mentioned has nothing to do with an "inactive session". Contrary to their misconception, an inactive session does not mean a session in which one end sends no data for a long time; that is simply the type of communication that takes place between the NMO client (Oracle) and the catalog server. The NMO client contacts the catalog server at the start, but once the backup begins no data is transferred between them until the backup finishes, and so the firewall interprets this as an "inactive session". Sad, really sad, because I dealt with this for 3 weeks.
Best regards,
Atatur YILDIRIM
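Editor's note: the failure described above (a session that works at first, sits silent, and is then dropped on the path) can be checked for directly. The sketch below is a generic Python probe, not part of NetWorker, NMO, or RMAN. It assumes you can run a small echo service on the far side of the firewall; the helper start_echo_server and the function survives_idle are hypothetical names. The probe connects, confirms a round trip, stays silent for a chosen time, and then checks whether a second round trip still succeeds.

```python
import socket
import threading
import time


def start_echo_server(host="127.0.0.1", port=0):
    """Start a one-connection echo server in a background thread.

    Returns the (host, port) it is listening on. Port 0 lets the OS pick one.
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    address = srv.getsockname()

    def serve():
        conn, _ = srv.accept()
        with conn:
            while True:
                data = conn.recv(1024)
                if not data:  # peer closed the connection
                    break
                conn.sendall(data)
        srv.close()

    threading.Thread(target=serve, daemon=True).start()
    return address


def survives_idle(host, port, idle_seconds, reply_timeout=5.0):
    """Return True if a connection still round-trips after idle_seconds of silence."""
    with socket.create_connection((host, port), timeout=reply_timeout) as sock:
        sock.sendall(b"a")
        if sock.recv(1) != b"a":
            return False
        time.sleep(idle_seconds)  # no traffic at all during this period
        try:
            sock.sendall(b"b")
            return sock.recv(1) == b"b"
        except OSError:  # covers timeouts and connection resets
            return False


if __name__ == "__main__":
    # Local demonstration only; a real check would run the echo server on the
    # remote host and use an idle time just above the suspected firewall limit.
    host, port = start_echo_server()
    print(survives_idle(host, port, idle_seconds=1))
```

With the echo server on the catalog side and idle_seconds set slightly above one hour, a False result would be consistent with the idle timeout described in this thread, while a True result would point elsewhere.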
dal1
1 Message
0
June 20th, 2007 06:00