October 18th, 2010 11:00
Performance problems (Excessive Solaris 10 pollsys calls with nsrmmd).
Hi all,
My setup is a Sun T1000 running Solaris 10 U4 with the recommended patches from 2/2010 (Generic_142900-03) and NetWorker 7.5.2, with an FC-connected Overland NEO8000 with 4 LTO-4 drives. One storage node runs on an x4500 with Solaris 10 U8 and 30TB of AFTD on ZFS. I have been struggling with performance problems when staging from the x4500 to the tape drives on the T1000. Recently I found the following:
# sar -c 1 10
SunOS legato.tau.ac.il 5.10 Generic_142900-03 sun4v 10/12/2010
02:15:02 scall/s sread/s swrit/s  fork/s  exec/s  rchar/s  wchar/s
02:15:03   21500    6207     187    0.00    0.00 73438304   540399
02:15:04   21160    6132      46    0.00    0.00 67140728   107171
02:15:05   17720    5203      37    0.00    0.00 45364104    56006
02:15:06   20566    6036      37    0.00    0.00 61809376    61430
02:15:07   28335    8159     110    0.00    0.00 93447008   519992
02:15:09   25388    7337      35    0.00    0.00 89129144    47190
02:15:10   27083    7837      71    0.00    0.00 91929992   141210
02:15:11   26471    7691      29    0.00    0.00 91041920    43699
02:15:12   26439    7656      47    0.00    0.00 90561864   111247
02:15:13   26514    7653     203    0.00    0.00 93589112   988914
Average    24111    6989      80    0.00    0.00 79720984   260102
With the following activity of nsrmmd:
# dtrace -n 'syscall:::entry /execname == "nsrmmd"/ {@[probefunc] = count(); } tick-5sec {printa(@); clear(@);}'
dtrace: description 'syscall:::entry ' matched 234 probes
CPU     ID                    FUNCTION:NAME
  6  55495                       :tick-5sec
  writev        4
  kaio       6732
  readv     21647
  pollsys   24447
  gtime     24451
  6  55495                       :tick-5sec
  writev        2
  kaio       3588
  readv     11020
  pollsys   12482
  gtime     12484
The output of truss -v pollsys -p 25398 goes like that:
/1: readv(11, 0xFFFFFFFF7FFE3660, 2) = 57828
/1: pollsys(0x100A3A088, 3, 0xFFFFFFFF7FFFC040, 0x00000000) = 1
/1:     fd=6 ev=POLLIN|POLLRDNORM rev=0
/1:     fd=8 ev=POLLIN|POLLRDNORM rev=0
/1:     fd=11 ev=POLLIN|POLLRDNORM rev=POLLIN|POLLRDNORM
/1:     timeout: 5.000000000 sec
/1: readv(11, 0xFFFFFFFF7FFE3660, 2) = 71576
/1: kaio(9, 0x100CDE1D0, 0x00020000, 0x00000000, 0x4FCB6952, 0x100B715B0) = 0
/1: kaio(12, 0x100CDDFF0, 0x00000000, 0x00000002, 0x00000096, 0x0000000A) = 0
/1: time() = 1286842696
/1: pollsys(0x100A3A088, 3, 0xFFFFFFFF7FFFC040, 0x00000000) = 1
/1:     fd=6 ev=POLLIN|POLLRDNORM rev=0
/1:     fd=8 ev=POLLIN|POLLRDNORM rev=0
/1:     fd=11 ev=POLLIN|POLLRDNORM rev=POLLIN|POLLRDNORM
/1:     timeout: 5.000000000 sec
/1: readv(11, 0xFFFFFFFF7FFEBBA0, 2) = 59612
/1: time() = 1286842696
/1: pollsys(0x100A3A088, 3, 0xFFFFFFFF7FFFC040, 0x00000000) = 1
/1:     fd=6 ev=POLLIN|POLLRDNORM rev=0
/1:     fd=8 ev=POLLIN|POLLRDNORM rev=0
/1:     fd=11 ev=POLLIN|POLLRDNORM rev=POLLIN|POLLRDNORM
/1:     timeout: 5.000000000 sec
/1: readv(11, 0xFFFFFFFF7FFE3660, 2) = 72968
/1: kaio(9, 0x100CDDFF0, 0x00020000, 0x00000000, 0x4FCB6952, 0x100B715B0) = 0
/1: kaio(12, 0x100CDE090, 0x00000000, 0x00000002, 0x00000096, 0x0000000A) = 0
/1: time() = 1286842696
/1: readv(11, 0xFFFFFFFF7FFE3660, 2) = 52764
where fd 6, 8 and 11 are sockets.
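To put the numbers above in proportion, here is a minimal Python sketch that turns the first 5-second dtrace interval into per-second rates and ratios, and works out the average bytes per read from the sar averages. The figures are copied from the output above; the helper names are made up for illustration, and the sar figures are system-wide and may not cover the same interval as the dtrace run.

```python
# Syscall counts for nsrmmd from the first 5-second dtrace interval above.
INTERVAL_SEC = 5
counts = {"writev": 4, "kaio": 6732, "readv": 21647,
          "pollsys": 24447, "gtime": 24451}

def per_second(syscall_counts, interval_sec):
    """Convert per-interval counts into calls per second."""
    return {name: n / interval_sec for name, n in syscall_counts.items()}

def ratio(syscall_counts, numerator, denominator):
    """How many `numerator` calls happen per `denominator` call."""
    return syscall_counts[numerator] / syscall_counts[denominator]

rates = per_second(counts, INTERVAL_SEC)
polls_per_read = ratio(counts, "pollsys", "readv")
reads_per_kaio = ratio(counts, "readv", "kaio")

# System-wide averages from the sar -c "Average" row above.
avg_sread = 6989        # read system calls per second
avg_rchar = 79720984    # bytes read per second
bytes_per_read = avg_rchar / avg_sread

print(f"pollsys/s: {rates['pollsys']:.0f}, readv/s: {rates['readv']:.0f}")
print(f"pollsys per readv: {polls_per_read:.2f}")
print(f"readv per kaio:    {reads_per_kaio:.2f}")
print(f"avg bytes per read (sar, system-wide): {bytes_per_read:.0f}")
```

The near-identical pollsys and gtime counts are consistent with the time()/pollsys pairs visible in the truss output, and there is roughly one pollsys for every readv on the socket.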
Is anyone aware of any Solaris/NetWorker bug or tuning that can help with this situation?
-- Yaron.


coganb
October 29th, 2010 04:00
Hi Yaron,
I don't know of any specific bug around this but you should check out the following document for tuning and testing information.
NetWorker 7.6.1 Performance Optimization Guide:
http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-011-323.pdf?
It's labeled 7.6.1 but it's also valid for 7.5.2. It gives details on how to tune and test the potential performance bottlenecks in your system, which should help at least identify the source of the problem.
-Bobby
wlee
October 29th, 2010 04:00
Tape Marked Full Prematurely during cloning on Solaris after upgrading to NetWorker 7.5 (esg101757)
By default, NetWorker uses asynchronous I/O on Solaris 8 & 9 and uses synchronous I/O on Solaris 10.
Cause
ASYNC I/O interoperability issues with Solaris systems.
On Solaris 10 systems, ASYNC I/O is disabled on the Operating System by default. For Solaris 8 & 9 systems, ASYNC I/O is enabled by default. As a result, NetWorker environment variables will need to be set to work around issues encountered with ASYNC I/O on Solaris 8 & 9, and on Solaris 10 if it has been enabled.
Symptoms
Tapes marked full immediately during cloning on Solaris
Tapes marked full prematurely during cloning on Solaris servers
Error: "posix async write: Error 0"
Resolution
Once an upgrade to 7.5 on a Solaris NetWorker server has completed, it is recommended that the NetWorker environment variables below be set prior to cloning to alleviate this issue.
Solaris 10 (with ASYNC I/O enabled on the Operating System)
1. From a command prompt edit the /etc/init.d/networker script.
2. Before the line: (echo 'starting NetWorker daemons:') > /dev/console, add the following environment variable:
SKIP_SOL10_ASYNC_FIX=YES
3. Save the file.
4. Stop the NetWorker daemons:
nsr_shutdown
5. Start the NetWorker daemons:
/etc/init.d/networker start
6. Note: if the NetWorker server is also the NetWorker Management Console server, the gstd daemon will also have to be restarted:
/etc/init.d/gst start
Solaris 8 & 9
1. From a command prompt edit the /etc/init.d/networker script.
2. Before the line: (echo 'starting NetWorker daemons:') > /dev/console, add the following environment variable:
DISABLE_SOL_ASYNC_IO=YES
3. Save the file.
4. Stop the NetWorker daemons:
nsr_shutdown
5. Start the NetWorker daemons:
/etc/init.d/networker start
6. Note: if the NetWorker server is also the NetWorker Management Console server, the gstd daemon will also have to be restarted:
/etc/init.d/gst start
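Step 2 in both procedures is the same edit: put the variable on its own line just before the "starting NetWorker daemons" echo. As a rough sketch only, the edit could be scripted like this in Python. The function name is hypothetical, it works on the script text rather than the file itself, and it inserts the assignment exactly as written in the article (whether an export is also needed is not stated there). Back up /etc/init.d/networker before changing it by any method.

```python
# Marker line quoted in step 2 of the article.
MARKER = "(echo 'starting NetWorker daemons:') > /dev/console"

def add_variable_before_marker(script_text, assignment, marker=MARKER):
    """Return script_text with `assignment` inserted on its own line
    immediately before the first line containing `marker`.

    If the assignment is already present as a line, the text is
    returned unchanged so the edit can be repeated safely.
    """
    lines = script_text.splitlines()
    if any(line.strip() == assignment for line in lines):
        return script_text
    for i, line in enumerate(lines):
        if marker in line:
            # Reuse the marker line's indentation for the new line.
            indent = line[:len(line) - len(line.lstrip())]
            lines.insert(i, indent + assignment)
            return "\n".join(lines) + "\n"
    raise ValueError("marker line not found; edit the script by hand")

# Usage on an in-memory stand-in for the init script (not the real file).
sample = "#!/bin/sh\n(echo 'starting NetWorker daemons:') > /dev/console\n"
edited = add_variable_before_marker(sample, "DISABLE_SOL_ASYNC_IO=YES")
print(edited)
```

After the edit, the stop/start steps above still apply.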
wlee
October 29th, 2010 05:00
In addition, make sure that tcp_fusion is disabled on Solaris 10.
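One way to check this, sketched in Python under an assumption: that TCP fusion is turned off with the commonly documented /etc/system entry "set ip:do_tcp_fusion = 0x0". That tunable name is not given in this thread, so verify it against the Solaris tunables documentation for your release. The function below (a hypothetical helper) only inspects the text of /etc/system, so it reflects the boot-time setting, not the running kernel.

```python
TUNABLE = "ip:do_tcp_fusion"  # assumed tunable name, not from this thread

def tcp_fusion_disabled(system_text, tunable=TUNABLE):
    """Return True if the last 'set <tunable> = <value>' entry in the
    given /etc/system text sets the value to zero."""
    disabled = False
    for raw in system_text.splitlines():
        line = raw.strip()
        # Lines starting with '*' are comments in /etc/system.
        if not line or line.startswith("*"):
            continue
        parts = line.replace("=", " = ").split()
        if len(parts) == 4 and parts[:3] == ["set", tunable, "="]:
            try:
                # base 0 accepts both "0" and "0x0"
                disabled = int(parts[3], 0) == 0
            except ValueError:
                disabled = False
    return disabled

# Usage on sample text; on a real host, read /etc/system instead.
sample = "* tuning for NetWorker host\nset ip:do_tcp_fusion = 0x0\n"
print(tcp_fusion_disabled(sample))
```

A change to /etc/system takes effect only after a reboot.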