September 19th, 2017 07:00

Oracle RMAN backup unable to use Isilon SmartConnect name

Hello.

We have an Oracle RMAN cluster with 4 servers that we are trying to point to Isilon using NFS.

The backups work fine when the Oracle cluster mounts the Isilon NFS share using an IP address.

When we use the Isilon NFS zone SmartConnect name we get an error each time:


ORA-17516: dNFS asynchronous I/O failure

Cause: The asynchronous I/O request failed due to storage server reboot.

Action: Make sure the storage server does not reboot repeatedly during database operations.

The Isilon is not rebooting at the time of backup.


We have an SR open on this but are not getting much traction.  We have gathered multiple pcaps using tcpdump while the error happens for EMC support to analyze.  Nothing yet.


Isilon is v8.0.0.4

Oracle 12.1.0.2.0


Any thoughts or personal experiences with something like this would be great!


Thanks!

252 Posts

September 19th, 2017 10:00

Hi brichtab,

I will start by saying I have zero experience with Oracle RMAN. I have, however, seen applications that don't like the fact that they may get a different IP address over the course of a job. It may have nothing to do with that, but it would be easy enough to test if you could limit the pool IP range to only one IP address. If you still get errors, then that is not the issue. However, if you don't get any errors, it could indicate that is the issue.
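A quick way to see whether the SmartConnect name hands this client more than one address is to resolve it several times and count the distinct answers. This is only a sketch: sc.example.com is a placeholder name, the function names are made up for illustration, and `getent hosts` goes through the client's normal resolver, so a local DNS cache may hide any rotation.

```shell
# Count distinct addresses read from stdin (one per line).
count_distinct_ips() {
    sort -u | grep -c .
}

# Resolve a name N times, printing the first address of each answer.
resolve_n_times() {
    name=$1
    n=$2
    i=0
    while [ "$i" -lt "$n" ]; do
        getent hosts "$name" | awk '{print $1; exit}'
        i=$((i + 1))
    done
}

# Example (placeholder name, not from this thread):
#   resolve_n_times sc.example.com 10 | count_distinct_ips
```

If the count is greater than 1, different mounts (or remounts) from the same client can land on different node IPs.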

28 Posts

September 19th, 2017 10:00

sjones5, thanks for your feedback!

We could create a test SmartConnect name, assign 1 IP and see what happens.

We did a similar test, where we had each of the Oracle servers mount a different IP (each from a different Isilon node) and that was successful.  Backup jobs ran without issue.

1 Rookie

 • 

20.4K Posts

September 19th, 2017 19:00

Have you tried not using dNFS and just mounting NFS in the operating system?

28 Posts

September 20th, 2017 07:00

It seems we are not using dNFS right now, so this is all just kernel NFS.

55 Posts

September 24th, 2017 05:00

We're having this issue now too.  We've been 8.0.0.4 since March and the Oracle hosts have been on 12.1.0.2.0 for two years.  WTH

28 Posts

October 3rd, 2017 11:00

##########################################################################################################

# --> Setup/Data Gathering

##########################################################################################################

1. On the client machine: Get environmental details

mkdir -p /tmp/$(date +%m%d%Y)/pcaps /tmp/$(date +%m%d%Y)/logs

ifconfig > /tmp/$(date +%m%d%Y)/logs/ifconfig.out

mount -v > /tmp/$(date +%m%d%Y)/logs/mount_v.out

uname -a > /tmp/$(date +%m%d%Y)/logs/uname.out

if [ -f /etc/os-release ]; then

    # freedesktop.org and systemd

    . /etc/os-release

    OS=$NAME

    VER=$VERSION_ID

elif type lsb_release >/dev/null 2>&1; then

    # linuxbase.org

    OS=$(lsb_release -si)

    VER=$(lsb_release -sr)

elif [ -f /etc/lsb-release ]; then

    # For some versions of Debian/Ubuntu without lsb_release command

    . /etc/lsb-release

    OS=$DISTRIB_ID

    VER=$DISTRIB_RELEASE

elif [ -f /etc/debian_version ]; then

    # Older Debian/Ubuntu/etc.

    OS=Debian

    VER=$(cat /etc/debian_version)

elif [ -f /etc/release ]; then

    OS=$(uname -s)

    VER=$(cat /etc/release)

else

    # Fall back to uname, e.g. "Linux ", also works for BSD, etc.

    OS=$(uname -s)

    VER=$(uname -r)

fi

echo "$OS | $VER" >> /tmp/$(date +%m%d%Y)/logs/nixVersion.out

2. On the Isilon: Make the following directories

mkdir -p /ifs/data/Isilon_Support/$(date +%m%d%Y)/pcaps /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs

3. On the Isilon: Gather starting data

date >> /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/duration.txt ;

isi_for_array 'sysctl kern.proc.all_stacks > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/debugStartSysctlKernProcAllStacks.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'ps alxwH | egrep "COMMAND|lw|rpc" > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/debugStartPs_alxwH.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'isi auth status > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartIsiAuthStatus.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array -s isi_stats_dcinfo -d 15 > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartIsi_stats_dcinfo.$(date +%m%d%Y_%H%M.%S).`hostname`.txt ;

isi_for_array 'sysctl efs.idmap.stats > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartSysctlEfsIdmapStats.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -L > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartNetstat-l.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -rn > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartNetstat-ra.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -an > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartNetstat-na.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array '/usr/likewise/bin/lwsm list > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoStartLwsmList.$(date +%m%d%Y_%H%M.%S).`hostname`.txt'

#################################################################################################################################################################

# --> Capturing Packets

#################################################################################################################################################################

4. On the Isilon: Start a packet capture on all node external interfaces looking for the client IP address and excluding SSH. This will put the captures into /ifs/data/Isilon_Support/$(date +%m%d%Y)/pcaps/ and information on packets captured/dropped into /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/ (Be sure to replace client="192.168.190.6" with the IP address of the client in question)

isi_for_array -X 'client="192.168.190.6"; for int in $(tcpdump -D | cut -d"." -f2 | egrep -v "ib0|ib1|lo0|vlan|lagg"); do tcpdump -W 3 -C 200 -w /ifs/data/Isilon_Support/$(date +%m%d%Y)/pcaps/$(uname -n).${int}.pcap -s0 -ni $int host $client and not port 22 &> /ifs/data/Isilon_Support/$(date +%m%d%Y)/pcaps/$(uname -n).${int}.log & ; done'

5. On the Client: Start a packet capture excluding SSH (This process will only work on Unix-type OSes; customers will need to generate their own command set for other OSes)

for int in $(ls /sys/class/net | grep -v "lo"); do tcpdump -W 3 -C 200 -w /tmp/$(date +%m%d%Y)/pcaps/$(uname -n).${int}.pcap -s0 -ni $int not port 22 &> /tmp/$(date +%m%d%Y)/logs/$(uname -n).${int}.pcap.log & done

#################################################################################################################################################################

# --> Issue Reproduction

#################################################################################################################################################################

6. On the Client: Reproduce the issue. If the issue is 100% reproducible on all clients, perform an unmount and remount first. PLEASE SAVE THIS OUTPUT. (The following is an example of what that might look like; the client should do whatever they were having an issue with)

client# date

Fri Sep 22 12:05:40 PDT 2017

client# umount /mnt/isi

client# mount isilonName:/ifs/ /mnt/isi

client# time ls -l /mnt/isi

total 9

-rw-rw-r-- 1 root wheel 6 Sep 19 12:12 testfile

real    0m0.031s

user    0m0.000s

sys     0m0.004s

#################################################################################################################################################################

# --> Cleanup

#################################################################################################################################################################

7. On the Client: Stop packet captures

for i in $(pgrep tcpdump); do kill -INT $i; done

8. On the Isilon: Stop packet captures

isi_for_array -X 'for i in $(pgrep tcpdump); do kill -INT $i; done'

9. On the Isilon: Confirm that all tcpdumps have completed (all nodes should return a line like "sa8010-1 exited with status 1")

isi_for_array -X 'ps auxw | grep -E "tcpdump" | grep -v grep'

################################################################################################################################################################

# --> Final Information Gather and Bundling

################################################################################################################################################################

10. On the Isilon: Gather ending data (the same commands as step 3, run a second time)

date >> /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/duration.txt ;

isi_for_array 'sysctl kern.proc.all_stacks > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/debugEndSysctlKernProcAllStacks.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'ps alxwH | egrep "COMMAND|lw|rpc" > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/debugEndPs_alxwH.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'isi auth status > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndIsiAuthStatus.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array -s isi_stats_dcinfo -d 15 > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndIsi_stats_dcinfo.$(date +%m%d%Y_%H%M.%S).`hostname`.txt ;

isi_for_array 'sysctl efs.idmap.stats > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndSysctlEfsIdmapStats.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -L > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndNetstat-l.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -an > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndNetstat-na.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array 'netstat -rn > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndNetstat-nr.$(date +%m%d%Y_%H%M.%S).`hostname`.txt' ;

isi_for_array '/usr/likewise/bin/lwsm list > /ifs/data/Isilon_Support/$(date +%m%d%Y)/logs/infoEndLwsmList.$(date +%m%d%Y_%H%M.%S).`hostname`.txt'

11. On the Client: Bundle the log files

cd /tmp

tar -zcvf $(date +%m%d%Y).client.tgz $(date +%m%d%Y)/ && rm -rf $(date +%m%d%Y)

12. On the Isilon: Bundle the log files

cd /ifs/data/Isilon_Support/

tar -zcvf $(date +%m%d%Y).cluster.tgz $(date +%m%d%Y)/ && rm -rf $(date +%m%d%Y)

13. On the Isilon: Upload the bundle and perform a full gather

isi_gather_info -f /ifs/data/Isilon_Support/$(date +%m%d%Y).cluster.tgz

14. Provide requested workflow information as well in the Service Request Summary Notes

    A) Client IP address used for testing:

    C) Username, exportname, and file/path used for testing:

    D) Note the process restart that resolved the issue, if any:

    E) isi auth mapping token --zone= for user used in testing :

    F) What is this cluster used for:

       For example: research, home directory, file storage, hosting VMs, citrix, PACS, Video.

    G) Are there any specific apps connecting to the cluster:

    J) The number of clients that connect to the cluster:

    K) Do most clients connect via smartconnect name or IP:

    L) Does the customer have Active Directory Auth?
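One thing to watch in the steps above: every command re-evaluates $(date +%m%d%Y), so if the capture runs past midnight the later steps will point at a different directory than the earlier ones. A minimal sketch of a workaround, assuming you are free to adapt the vendor commands, is to evaluate the date once and reuse it. The variable names here are my own, not part of the vendor procedure.

```shell
# Evaluate the date once so every later step uses the same directory,
# even if the capture runs past midnight.
STAMP=$(date +%m%d%Y)
BASE="/tmp/$STAMP"          # client-side base; use /ifs/data/Isilon_Support/$STAMP on the Isilon
PCAP_DIR="$BASE/pcaps"
LOG_DIR="$BASE/logs"

# Show what would be used; uncomment mkdir to actually create the directories.
printf 'pcaps: %s\nlogs:  %s\n' "$PCAP_DIR" "$LOG_DIR"
# mkdir -p "$PCAP_DIR" "$LOG_DIR"
```

The remaining commands would then reference "$PCAP_DIR" and "$LOG_DIR" in the same shell session instead of repeating the date substitution.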

28 Posts

October 3rd, 2017 11:00

Sorry to hear that GoldyGopher!   Any progress figuring out what happened?

Update on mine (nothing too exciting):

Seems I was confused and this is not RMAN but something called a DataPump.

The client servers are actually an Oracle Exadata.

EMC has come back with nothing from the previous packet captures I have sent them.   They have asked for more specific data and sent along what they want.  I will post that here in an effort to help some other poor soul in the future who is having this issue.   Seems that getting the *right* data to the vendor is half the battle!  haha

28 Posts

October 4th, 2017 06:00

So one thing I cannot get past, and maybe someone knows more about NFS on Isilon...

When you mount an Isilon NFS share on a Linux client server, whether using an IP or a DNS name (SmartConnect name), the client server makes a mount request to the Isilon. If a DNS name is used, the DNS server first resolves that name to an IP. The Isilon responds by looking for the client IP in the "Clients, Root Clients, Read Only Clients or Read Write Clients" fields of the NFS export configured on the Isilon.

If the Isilon finds the client IP in that list it will allow the mount to a mountpoint on the Linux client server.  In this case /orabackup.

That's it.  From then on, the Linux client and its application, script, datapump, whatever will just see it as a "local" filesystem.   There won't be any more calls to mount or check access.

So what I can't explain is this: mounting with either the IP or the DNS (SmartConnect) name works fine, but when the datapump runs against the SmartConnect-name mount it fails somewhere in the middle (not right away), while against the IP mount it works every time with no errors.

Am I missing something?
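One thing that can differ between the two cases is which node IP the client actually ended up on: with a DNS name the address is picked at mount time, and the Linux client records it in the mount's addr= option. A small sketch for pulling that value out, so the SmartConnect mount can be compared against the IP mounts that work (the function name is made up for illustration):

```shell
# Print the addr= value from mount-style lines on stdin.
# Handles both `mount` output, "(...,addr=IP)", and /proc/mounts, "...,addr=IP 0 0".
# mountaddr= is deliberately not matched.
nfs_mount_addr() {
    sed -n 's/.*[ (,]addr=\([^,) ]*\).*/\1/p'
}

# Example on a live client, using the /orabackup mountpoint from this thread:
#   grep ' /orabackup ' /proc/mounts | nfs_mount_addr
```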

55 Posts

October 4th, 2017 08:00

So what our problem turned out to be was the 'export' script was running as SYS instead of root. Once they changed the credentials it worked fine.

We've been at the mentioned version of Oracle for 2 years and we've been on 8.0.0.4 since March so I guess that was the difference for our problem.


28 Posts

October 6th, 2017 09:00

Glad to hear you found the issue.

Just to clarify you guys are using the Isilon SmartConnect name to mount the NFS share?

Thanks!

55 Posts

October 9th, 2017 04:00

Yes.

28 Posts

October 18th, 2017 06:00

GoldyGopher,

Would you be willing to share your client NFS mount options?

This is what we are using (or at least this is what the DBA sent me from the mount command):

:/ on / type nfs (rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600,addr= )

Additional feedback from tests run by the DBA still shows this is a very strange issue:

A metadata export with no data gives the error, while a data export of one or two tables doesn't. Certain databases always give the error and certain ones don't. But everything works fine when we use the IP address to mount the NFS share from Isilon.

55 Posts

October 19th, 2017 07:00

Sure.

This is what they were told are the correct mount options from Oracle.


smartconnect.dns.name:/exported-directory  /mountpoint rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600,actimeo=0


The only difference I see is that yours calls out the type of mount for the OS, but who knows. There was some discussion around the nointr parameter since most modern Linux distros ignore it (as well as the rsize and wsize). NFS mount options can be very frustrating to standardize on since what works for some can sometimes not work for others.  They wanted to use this since this is what Oracle had told them and until this little issue this had worked for years. One of the DBAs also put this out there in our hipchat room regarding these settings.


For RMAN backup sets, image copies, and Data Pump dump files, the "NOAC" mount option should not be specified - that is because RMAN and Data Pump do not check this option and specifying this can adversely affect performance.
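For what it's worth, comparing the two option strings posted in this thread by eye is error-prone, so here is a small sketch that lists the options present in one comma-separated list but not the other. The function name is made up, and I dropped the empty addr= from the first list before comparing.

```shell
# Print options that appear in the first comma-separated list but not the second.
opts_only_in_first() {
    for opt in $(printf '%s' "$1" | tr ',' ' '); do
        case ",$2," in
            *",$opt,"*) ;;                  # present in both lists
            *) printf '%s\n' "$opt" ;;      # only in the first list
        esac
    done
}

ours="rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,actimeo=0,vers=3,timeo=600"
theirs="rw,bg,hard,nointr,rsize=32768,wsize=32768,tcp,noac,vers=3,timeo=600,actimeo=0"

opts_only_in_first "$theirs" "$ours"
opts_only_in_first "$ours" "$theirs"
```

Run against the two lists above, the only option that differs is noac, which is the one the Oracle note quoted above says not to specify for RMAN and Data Pump files.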

28 Posts

October 23rd, 2017 07:00

Thanks Goldy!

We had a breakthrough, which I am embarrassed to admit I did not clue in on earlier...oh well.   Anyway...

All the tests we have been running so far have been from one datacenter to another: the Oracle Exadata client servers are located in DC1 and our Prod Isilon in DC2.   We ran another test late last week, this time using our DR Isilon located in DC1 (same DC as the Oracle Exadata client servers).  We mounted the NFS share using the smartconnect name and the datapump export worked perfectly.

So it seems that the issue is somewhere in the network path between datacenters.  Again, when using IP to mount, works fine every time.  Using smartconnect name to mount, it will start to work, then fail.

I am not a networking guy so no idea what is causing the problem but hopefully our network team can figure it out.

28 Posts

November 3rd, 2017 10:00

Found the issue.

The ASA firewall needed to allow the Isilon network inbound on port 2049 to the Oracle Exadata network.


ASA SYSLOG messages

%ASA-6-106015: Deny TCP (no connection) from x.x.x.x/x to x.x.x.x/x flags RST on interface someinterface


The security appliance discarded a TCP packet that has no associated connection in the security appliance connection table.

The security appliance looks for a SYN flag in the packet, which indicates a request to establish a new connection.

If the SYN flag is not set, and there is not an existing connection, the security appliance discards the packet.

Recommended Action:  None required unless the security appliance receives a large volume of these invalid TCP packets.

If this is the case, trace the packets to the source and determine the reason these packets were sent.
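If anyone else needs to "trace the packets to the source" as the message suggests, a rough sketch like the following can tally the 106015 denies by source and destination from a saved syslog file. It assumes the message layout shown above ("from SRC to DST flags ..."); the function name and the log file path in the example are placeholders.

```shell
# Tally %ASA-6-106015 denies by "source -> destination", most frequent first.
# Reads syslog lines on stdin.
tally_asa_106015() {
    grep -F '%ASA-6-106015' |
    sed -n 's/.* from \([^ ]*\) to \([^ ]*\) flags.*/\1 -> \2/p' |
    sort | uniq -c | sort -rn
}

# Example (placeholder path):
#   tally_asa_106015 < /var/log/asa.log | head
```

A pair that shows up repeatedly with port 2049 on one side would point at NFS traffic being dropped by the firewall.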
