BrianUofM
1 Nickel

Oracle RMAN backup unable to use Isilon SmartConnect name

Hello.

We have an Oracle RMAN cluster with 4 servers that we are trying to point to Isilon using NFS.

The backups work fine when the Oracle cluster mounts the Isilon NFS share using an IP address.

When we use the Isilon NFS zone SmartConnect name we get an error each time:


ORA-17516: dNFS asynchronous I/O failure

Cause: The asynchronous I/O request failed due to storage server reboot.

Action: Make sure the storage server does not reboot repeatedly during database operations.

The Isilon is not rebooting at the time of backup.


We have an SR open on this but are not getting much traction. We have gathered multiple pcaps using tcpdump while the error happens for EMC support to analyze. Nothing yet.


Isilon is v8.0.0.4

Oracle 12.1.0.2.0


Any thoughts or personal experiences with something like this would be great!


Thanks!

sjones51
2 Iron

Re: Oracle RMAN backup unable to use Isilon SmartConnect name

Hi brichtab,

I will start by saying I have zero experience with Oracle RMAN. I have, however, seen applications that don't like being handed a different IP address partway through a process. It may have nothing to do with that, but it would be easy enough to test if you could limit the pool IP range to a single IP address. If you still get errors, then that is not the issue. If the errors go away, it suggests that it is.

BrianUofM
1 Nickel

Re: Oracle RMAN backup unable to use Isilon SmartConnect name

sjones5, thanks for your feedback!

We could create a test SmartConnect name, assign 1 IP and see what happens.

We did a similar test, where we had each of the Oracle servers mount a different IP (each from a different Isilon node), and that was successful. Backup jobs ran without issue.
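
A quick way to see the behaviour sjones51 describes is to resolve the SmartConnect name several times from one of the clients and count the distinct answers. This is only a minimal sketch: the function names are made up for illustration, it assumes `getent` is available on the client, and client-side DNS caching may hide any rotation.

```shell
# Sketch only: helper names are hypothetical, not part of any Isilon or Oracle tool.

# Read one IP per line on stdin and print how many distinct values were seen.
count_distinct_ips() {
    sort -u | grep -c .
}

# Resolve NAME COUNT times, one second apart, printing the first IP of each lookup.
resolve_repeatedly() {
    name=$1
    count=${2:-10}
    i=0
    while [ "$i" -lt "$count" ]; do
        getent hosts "$name" | awk '{print $1; exit}'
        sleep 1
        i=$((i + 1))
    done
}
```

With a placeholder zone name it could be run as `resolve_repeatedly isilon-nfs.example.com 10 | count_distinct_ips`. A result above 1 would mean the name is handing out more than one IP over time.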

dynamox
6 Gallium

Re: Oracle RMAN backup unable to use Isilon SmartConnect name

Have you tried not using dNFS and just mounting NFS in the operating system?

BrianUofM
1 Nickel

Re: Oracle RMAN backup unable to use Isilon SmartConnect name

It seems we are not using dNFS right now, so this is all just kernel NFS.

tyfoid_kid
1 Nickel

Re: Oracle RMAN backup unable to use Isilon SmartConnect name

We're having this issue now too.  We've been 8.0.0.4 since March and the Oracle hosts have been on 12.1.0.2.0 for two years.  WTH

BrianUofM
1 Nickel

Re: Oracle RMAN backup unable to use Isilon SmartConnect name

Sorry to hear that GoldyGopher!   Any progress figuring out what happened?

Update on mine (nothing too exciting):

Seems I was confused and this is not RMAN but something called a DataPump.

The client servers are actually an Oracle Exadata.

EMC has come back with nothing from the previous packet captures I have sent them. They have asked for more specific data and sent along what they want. I will post that here in the hope that it helps some other poor soul who runs into this issue in the future. Seems that getting the *right* data to the vendor is half the battle! haha

BrianUofM
1 Nickel

Re: Oracle RMAN backup unable to use Isilon SmartConnect name

##########################################################################################################

# --> Setup/Data Gathering

##########################################################################################################

1. On the client machine: Get environmental details

mkdir -p /tmp/$(date +%m%d%Y)/pcaps /tmp/$(date +%m%d%Y)/logs

ifconfig > /tmp/$(date +%m%d%Y)/logs/ifconfig.out

mount -v > /tmp/$(date +%m%d%Y)/logs/mount_v.out

uname -a > /tmp/$(date +%m%d%Y)/logs/uname.out

if [ -f /etc/os-release ]; then

    # freedesktop.org and systemd

    . /etc/os-release

    OS=$NAME

    VER=$VERSION_ID

elif type lsb_release >/dev/null 2>&1; then

    # linuxbase.org

    OS=$(lsb_release -si)

    VER=$(lsb_release -sr)

elif [ -f /etc/lsb-release ]; then

    # For some versions of Debian/Ubuntu without lsb_release command

    . /etc/lsb-release

    OS=$DISTRIB_ID

    VER=$DISTRIB_RELEASE

elif [ -f /etc/debian_version ]; then

    # Older Debian/Ubuntu/etc.

    OS=Debian

    VER=$(cat /etc/debian_version)

elif [ -f /etc/release ]; then

    OS=$(uname -s)

    VER=$(cat /etc/release)

else

    # Fall back to uname, e.g. "Linux <version>", also works for BSD, etc.

    OS=$(uname -s)

    VER=$(uname -r)

fi

echo "$OS | $VER" >> /tmp/$(date +%m%d%Y)/logs/nixVersion.out

2. On the Isilon: Make the following directories

mkdir -p /ifs/data/Isilon_Support/$(date +%m%d%Y)/pcaps /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs

3. On the Isilon: Gather starting data

date >> /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/duration.txt ;

isi_for_array 'sysctl kern.proc.all_stacks > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/debugStartSysctlKernProcAllStacks.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'ps alxwH | egrep "COMMAND|lw|rpc" > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/debugStartPs_alxwH.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'isi auth status > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartIsiAuthStatus.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array -s isi_stats_dcinfo -d 15 > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartIsi_stats_dcinfo.$(date +%m%d%Y_%H%M.%S).`hostname`.txt ;

isi_for_array 'sysctl efs.idmap.stats > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartSysctlEfsIdmapStats.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -L > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartNetstat-l.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -rn > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartNetstat-ra.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -an > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartNetstat-na.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array '/usr/likewise/bin/lwsm list > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartLwsmList.$(date +%m%d%Y_%H%M.%S).`hostname`.txt'
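
One thing worth noting about the commands above: each one re-evaluates `$(date +%m%d%Y)`, so a collection that runs past midnight would start writing into a different directory. A minimal sketch of fixing the day stamp once and reusing it follows. The helper name `support_dir` is hypothetical, and inside the single-quoted `isi_for_array` commands the value would have to be passed in some other way, which I have not tested.

```shell
# Sketch only: support_dir is a made-up helper, not an Isilon command.

# Print ROOT/DAY, where DAY defaults to today's stamp in the same
# month-day-year format the commands above use.
support_dir() {
    root=$1
    day=${2:-$(date +%m%d%Y)}
    printf '%s/%s\n' "$root" "$day"
}
```

For example, `DAY=$(date +%m%d%Y)` at the start of the session, then `LOGS=$(support_dir /ifs/data/Isilon_Support "$DAY")/logs` wherever the path is needed.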

#################################################################################################################################################################

# --> Capturing Packets

#################################################################################################################################################################

4. On the Isilon: Start a packet capture on all node external interfaces looking for the client IP address and excluding SSH. This will put the captures into /ifs/data/Isilon_Support/$(date +%m%d%Y)/pcaps/ and information on packets captured/dropped into /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/ (Be sure to replace client="192.168.190.6" with the IP address of the client in question)

isi_for_array -X 'client="192.168.190.6"; for int in $(tcpdump -D | cut -d"." -f2 | egrep -v "ib0|ib1|lo0|vlan|lagg"); do tcpdump -W 3 -C 200 -w /ifs/data/Isilon_Support/$(date +%m%d%Y)/pcaps/$(uname -n).${int}.pcap -s0 -ni $int host $client and not port 22 &> /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/$(uname -n).${int}.pcap.log & done'

5. On the Client: Start a packet capture excluding SSH (This process will only work on Unix-type OSes; customers will need to generate their own command set for other OSes)

for int in $(ls /sys/class/net | grep -v "lo"); do tcpdump -W 3 -C 200 -w /tmp/$(date +%m%d%Y)/pcaps/$(uname -n).${int}.pcap -s0 -ni $int not port 22 &> /tmp/$(date +%m%d%Y)/logs/$(uname -n).${int}.pcap.log & done
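
A note on the `-W 3 -C 200` options used in both capture commands: `-C 200` rotates the output file at roughly 200 million bytes and `-W 3` keeps at most three files per interface, overwriting the oldest. The sketch below, with hypothetical helper names, estimates the worst-case disk usage and checks free space before starting. Because the ring overwrites old packets, the issue should be reproduced soon after the captures start.

```shell
# Sketch only: helper names are made up for illustration.

# Worst-case capture size in MB: files per interface * MB per file * interfaces.
capture_cap_mb() {
    echo $(( $1 * $2 * $3 ))
}

# Free space in MB on the filesystem holding DIR (POSIX df output, 1K blocks).
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}
```

For a client with four interfaces, `capture_cap_mb 3 200 4` gives 2400, which can be compared against `free_mb /tmp`.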

#################################################################################################################################################################

# --> Issue Reproduction

#################################################################################################################################################################

6. On the Client: reproduce the issue. If the issue is 100% reproducible on all clients, perform an unmount and remount first. PLEASE SAVE THIS OUTPUT. (The following is an example of what that might look like; the client should do whatever they were having an issue with)

client# date

Fri Sep 22 12:05:40 PDT 2017

client# umount /mnt/isi

client# mount isilonName:/ifs/ /mnt/isi

client# time ls -l /mnt/isi

total 9

-rw-rw-r-- 1 root wheel 6 Sep 19 12:12 testfile

real    0m0.031s

user    0m0.000s

sys     0m0.004s

#################################################################################################################################################################

# --> Cleanup

#################################################################################################################################################################

7. On the Client: Stop packet captures

for i in $(pgrep tcpdump); do kill -INT $i; done

8. On the Isilon: Stop packet captures

isi_for_array -X 'for i in $(pgrep tcpdump); do kill -INT $i; done'

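9. On the Isilon: Confirm that all tcpdumps have exited. (Each node should return "<node name> exited with status 1", for example "sa8010-1 exited with status 1", meaning grep found no remaining tcpdump process)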

isi_for_array -X 'ps auxw | grep -E "tcpdump" | grep -v grep'
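
The "exited with status 1" message in step 9 looks alarming but, as far as I can tell, it is simply grep's exit status when nothing matches. A minimal sketch of the same check on a single host, using a hypothetical helper name:

```shell
# Sketch only: tcpdump_state is a made-up helper name.

# Read a process listing on stdin. Print "running" if any line other than the
# grep itself mentions tcpdump, otherwise print "clean".
tcpdump_state() {
    if grep -E "tcpdump" | grep -v grep >/dev/null; then
        echo running
    else
        echo clean
    fi
}
```

On the client this could be run as `ps auxw | tcpdump_state` after step 7.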

################################################################################################################################################################

# --> Final Information Gather and Bundling

################################################################################################################################################################

10. On the Isilon: Gather the same data a second time (ending data)

date >> /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/duration.txt ;

isi_for_array 'sysctl kern.proc.all_stacks > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/debugEndSysctlKernProcAllStacks.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'ps alxwH | egrep "COMMAND|lw|rpc" > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/debugEndPs_alxwH.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'isi auth status > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndIsiAuthStatus.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array -s isi_stats_dcinfo -d 15 > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndIsi_stats_dcinfo.$(date +%m%d%Y_%H%M.%S).`hostname`.txt ;

isi_for_array 'sysctl efs.idmap.stats > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndSysctlEfsIdmapStats.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -L > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndNetstat-l.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -an > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndNetstat-na.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -rn > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndNetstat-nr.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array '/usr/likewise/bin/lwsm list > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndLwsmList.$(date +%m%d%Y_%H%M.%S).`hostname`.txt'

11. On the Client: Bundle the log files

cd /tmp

tar -zcvf $(date +%m%d%Y).client.tgz $(date +%m%d%Y)/ && rm -rf $(date +%m%d%Y)

12. On the Isilon: Bundle the log files

cd /ifs/data/Isilon_Support/

tar -zcvf $(date +%m%d%Y).cluster.tgz $(date +%m%d%Y)/ && rm -rf $(date +%m%d%Y)

13. On the Isilon: Upload the bundle and perform a full gather

isi_gather_info -f /ifs/data/Isilon_Support/$(date +%m%d%Y).cluster.tgz

14. Provide requested workflow information as well in the Service Request Summary Notes

    A) Client IP address used for testing:

    C) Username, exportname, and file/path used for testing:

    D) Note the process restart that resolved the issue, if any:

    E) isi auth mapping token <user> --zone=<zone> for user used in testing :

    F) What is this cluster used for:

       For example: research, home directory, file storage, hosting VMs, citrix, PACS, Video.

    G) Are there any specific apps connecting to the cluster:

    J) The number of clients that connect to the cluster:

    K) Do most clients connect via smartconnect name or IP:

    L) Does the customer have Active Directory Auth?

BrianUofM
1 Nickel

Re: Oracle RMAN backup unable to use Isilon SmartConnect name

So one thing I cannot get past, and maybe someone knows more about NFS on Isilon...

When you mount an Isilon NFS share on a Linux client server, whether using an IP or a DNS name (SmartConnect name), the client server makes a mount request to the Isilon. If a DNS name is used, the DNS server first resolves that name to an IP. The Isilon responds by looking for the client IP in the "Clients, Root Clients, Read Only Clients or Read Write Clients" fields of the NFS export configured on the Isilon.

If the Isilon finds the client IP in that list it will allow the mount to a mountpoint on the Linux client server.  In this case /orabackup.

That's it. From then on, the Linux client and its application, script, datapump, whatever will just see it as a "local" filesystem. There won't be any more calls to mount or check access.

So here is what I can't explain: mounting with either the IP or the DNS (SmartConnect) name works fine, but when the datapump runs against the SmartConnect mount it fails somewhere in the middle (not right away), while against the IP mount the datapump works every time with no errors.
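
To compare the two cases, it may help to check which server IP each client actually ended up mounted against. On Linux the resolved address is recorded as the `addr=` option in /proc/mounts. A minimal sketch follows; the helper name is hypothetical, and /orabackup is the mountpoint mentioned above.

```shell
# Sketch only: mount_server_addr is a made-up helper name.

# Read /proc/mounts-style lines on stdin (device mountpoint fstype options ...)
# and print the addr= value recorded for MOUNTPOINT.
mount_server_addr() {
    awk -v mp="$1" '$2 == mp {
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^addr=/) { sub(/^addr=/, "", opts[i]); print opts[i] }
    }'
}
```

Run as `mount_server_addr /orabackup < /proc/mounts` on each of the four servers. It does not explain the failure by itself, but it shows whether the SmartConnect mounts landed on different nodes than the IP mounts that worked.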

Am I missing something?

tyfoid_kid
1 Nickel

Re: Oracle RMAN backup unable to use Isilon SmartConnect name

So what our problem turned out to be was the 'export' script was running as SYS instead of root. Once they changed the credentials it worked fine.

We've been at the mentioned version of Oracle for 2 years and we've been on 8.0.0.4 since March so I guess that was the difference for our problem.

On Wed, Oct 4, 2017 at 8:35 AM brichtab <emc-community-network@emc.com>