18 Posts

October 18th, 2010 11:00

Performance problems (Excessive Solaris 10 pollsys calls with nsrmmd).

Hi all,

My setup is a Sun T1000 running Solaris 10 U4 with the recommended patches from 2/2010 (Generic_142900-03) and NetWorker 7.5.2. It has an FC-connected Overland NEO8000 with 4 LTO-4 drives. There is one storage node running on an x4500 with Solaris 10 U8 and 30TB of AFTD on ZFS. I have been struggling with performance problems when staging from the x4500 to the tape drives on the T1000. Recently I found the following:


# sar -c 1 10

SunOS legato.tau.ac.il 5.10 Generic_142900-03 sun4v    10/12/2010

02:15:02 scall/s sread/s swrit/s  fork/s  exec/s rchar/s wchar/s
02:15:03   21500    6207     187    0.00    0.00 73438304  540399
02:15:04   21160    6132      46    0.00    0.00 67140728  107171
02:15:05   17720    5203      37    0.00    0.00 45364104   56006
02:15:06   20566    6036      37    0.00    0.00 61809376   61430
02:15:07   28335    8159     110    0.00    0.00 93447008  519992
02:15:09   25388    7337      35    0.00    0.00 89129144   47190
02:15:10   27083    7837      71    0.00    0.00 91929992  141210
02:15:11   26471    7691      29    0.00    0.00 91041920   43699
02:15:12   26439    7656      47    0.00    0.00 90561864  111247
02:15:13   26514    7653     203    0.00    0.00 93589112  988914

Average    24111    6989      80    0.00    0.00 79720984  260102

With the following activity of nsrmmd:

# dtrace -n 'syscall:::entry /execname == "nsrmmd"/ {@[probefunc] = count(); } tick-5sec {printa(@); clear(@);}'

dtrace: description 'syscall:::entry ' matched 234 probes
CPU     ID                    FUNCTION:NAME
  6  55495                       :tick-5sec
  writev                                                            4
  kaio                                                           6732
  readv                                                         21647
  pollsys                                                       24447
  gtime                                                         24451

  6  55495                       :tick-5sec
  writev                                                            2
  kaio                                                           3588
  readv                                                         11020
  pollsys                                                       12482
  gtime                                                         12484
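
To put the 5-second dtrace counts into per-second terms, here is a minimal sketch that divides a count by the tick interval. The helper name per_sec is made up for illustration, and the numbers are taken from the first tick above. On Solaris 10 the stock /usr/bin/awk does not accept -v, so nawk or /usr/xpg4/bin/awk would be needed there.

```shell
# per_sec COUNT INTERVAL_SECONDS -> calls per second, one decimal place
# (hypothetical helper, not part of NetWorker or Solaris)
per_sec() {
    awk -v count="$1" -v secs="$2" 'BEGIN { printf "%.1f\n", count / secs }'
}

per_sec 24447 5   # pollsys
per_sec 21647 5   # readv
per_sec 6732 5    # kaio
```

That works out to roughly 4,900 pollsys, 4,300 readv and 1,300 kaio calls per second for this one nsrmmd process, i.e. slightly more than one pollsys per readv.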

The output of truss -v pollsys -p 25398 looks like this:

/1:     readv(11, 0xFFFFFFFF7FFE3660, 2)                = 57828
/1:     pollsys(0x100A3A088, 3, 0xFFFFFFFF7FFFC040, 0x00000000) = 1
/1:             fd=6  ev=POLLIN|POLLRDNORM rev=0
/1:             fd=8  ev=POLLIN|POLLRDNORM rev=0
/1:             fd=11 ev=POLLIN|POLLRDNORM rev=POLLIN|POLLRDNORM
/1:             timeout: 5.000000000 sec
/1:     readv(11, 0xFFFFFFFF7FFE3660, 2)                = 71576

/1:     kaio(9, 0x100CDE1D0, 0x00020000, 0x00000000, 0x4FCB6952, 0x100B715B0) = 0

/1:     kaio(12, 0x100CDDFF0, 0x00000000, 0x00000002, 0x00000096, 0x0000000A) = 0

/1:     time()                                          = 1286842696
/1:     pollsys(0x100A3A088, 3, 0xFFFFFFFF7FFFC040, 0x00000000) = 1
/1:             fd=6  ev=POLLIN|POLLRDNORM rev=0
/1:             fd=8  ev=POLLIN|POLLRDNORM rev=0
/1:             fd=11 ev=POLLIN|POLLRDNORM rev=POLLIN|POLLRDNORM
/1:             timeout: 5.000000000 sec
/1:     readv(11, 0xFFFFFFFF7FFEBBA0, 2)                = 59612
/1:     time()                                          = 1286842696
/1:     pollsys(0x100A3A088, 3, 0xFFFFFFFF7FFFC040, 0x00000000) = 1
/1:             fd=6  ev=POLLIN|POLLRDNORM rev=0
/1:             fd=8  ev=POLLIN|POLLRDNORM rev=0
/1:             fd=11 ev=POLLIN|POLLRDNORM rev=POLLIN|POLLRDNORM
/1:             timeout: 5.000000000 sec
/1:     readv(11, 0xFFFFFFFF7FFE3660, 2)                = 72968

/1:     kaio(9, 0x100CDDFF0, 0x00020000, 0x00000000, 0x4FCB6952, 0x100B715B0) = 0

/1:     kaio(12, 0x100CDE090, 0x00000000, 0x00000002, 0x00000096, 0x0000000A) = 0

/1:     time()                                          = 1286842696
/1:     readv(11, 0xFFFFFFFF7FFE3660, 2)                = 52764


where fds 6, 8 and 11 are sockets.
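
As a rough indication of how much data each socket read moves, the sketch below averages the five readv return values from the truss excerpt above. It is only five samples, so treat the result as a ballpark figure. The helper name mean_bytes is made up for illustration, and the arithmetic syntax needs ksh or bash rather than the old Solaris /bin/sh.

```shell
# mean_bytes N1 N2 ... -> integer mean of the arguments
# (hypothetical helper for illustration only)
mean_bytes() {
    total=0
    for n in "$@"; do
        total=$((total + n))
    done
    echo $((total / $#))
}

# readv return values on fd 11 from the truss excerpt
mean_bytes 57828 71576 59612 72968 52764
```

This prints 62949, so in this excerpt each readv returns about 61 KB, and most of them are preceded by one pollsys and one time() call.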


Is anyone aware of any Solaris/NetWorker bug or tuning that can help with this situation?

--

-- Yaron.

736 Posts

October 29th, 2010 04:00

Hi Yaron,

I don't know of any specific bug around this, but you should check out the following document for tuning and testing information.

NetWorker 7.6.1 Performance Optimization Guide:

http://powerlink.emc.com/km/live1/en_US/Offering_Technical/Technical_Documentation/300-011-323.pdf?

It's got 7.6.1 written on it, but it's also valid for 7.5.2. It gives details on how to tune and test the potential performance bottlenecks in your system, which should at least help identify the source of the problem.

-Bobby

263 Posts

October 29th, 2010 04:00

Tape Marked Full Prematurely during cloning on Solaris after upgrading to NetWorker 7.5 (esg101757)

By default, NetWorker uses asynchronous I/O on Solaris 8 & 9 and uses synchronous I/O on Solaris 10.

Cause

ASYNC I/O interoperability issues with Solaris systems. 

On Solaris 10 systems, ASYNC I/O is disabled on the operating system by default. On Solaris 8 & 9 systems, ASYNC I/O is enabled by default. As a result, NetWorker environment variables will need to be set to work around issues encountered with ASYNC I/O on Solaris 8 & 9, and on Solaris 10 if it has been enabled.

Symptoms

Tapes marked full immediately during cloning on Solaris

Tapes marked full prematurely during cloning on Solaris servers

Error: "posix async write: Error 0"


Resolution

Once an upgrade to 7.5 on a Solaris NetWorker server has completed, it is recommended that NetWorker environment variables be set prior to cloning to alleviate this issue.

Solaris 10 (with ASYNC I/O enabled on the Operating System)

1. From a command prompt, edit the /etc/init.d/networker script.

2. Before the line: (echo 'starting NetWorker daemons:') > /dev/console, add the following environment variable:

SKIP_SOL10_ASYNC_FIX=YES

3. Save the file.

4. Stop the NetWorker daemons:

nsr_shutdown

5. Start the NetWorker daemons:

/etc/init.d/networker start

6. Note: if the NetWorker server is also the NetWorker Management Console server, the gstd daemon will also have to be restarted:

/etc/init.d/gst start
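
For illustration, the edited part of /etc/init.d/networker from the Solaris 10 steps above might look like the sketch below. The export line is an assumption that is not in the KB text: a plain shell assignment is not passed on to child processes, so the variable presumably has to be exported for the daemons to see it. The two-statement form is used because the old Bourne shell does not accept export VAR=value.

```shell
# Added just before the "starting NetWorker daemons" line.
SKIP_SOL10_ASYNC_FIX=YES
# Assumption: exported so that the daemons started later inherit it.
export SKIP_SOL10_ASYNC_FIX

# The existing line in the script follows here unchanged:
# (echo 'starting NetWorker daemons:') > /dev/console
```

The same pattern would apply to DISABLE_SOL_ASYNC_IO on Solaris 8 & 9.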

Solaris 8 & 9

1. From a command prompt, edit the /etc/init.d/networker script.

2. Before the line: (echo 'starting NetWorker daemons:') > /dev/console, add the following environment variable:

DISABLE_SOL_ASYNC_IO=YES

3. Save the file.

4. Stop the NetWorker daemons:

nsr_shutdown

5. Start the NetWorker daemons:

/etc/init.d/networker start

6. Note: if the NetWorker server is also the NetWorker Management Console server, the gstd daemon will also have to be restarted:

/etc/init.d/gst start

263 Posts

October 29th, 2010 05:00

In addition, make sure that tcp_fusion is disabled on Solaris 10.
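
For reference, here is a sketch of how TCP fusion is usually turned off on Solaris 10. The tunable name do_tcp_fusion in the ip module is not given in this thread; it is the name commonly documented for Solaris 10, so please verify it against your own patch level. As root, the current value can be shown with echo "do_tcp_fusion/D" | mdb -k, and it can be changed in the running kernel with echo "do_tcp_fusion/W 0" | mdb -kw. To keep the setting across reboots, a line like the following goes into /etc/system:

```
* /etc/system fragment: disable TCP fusion (assumed tunable name; takes effect after a reboot)
set ip:do_tcp_fusion = 0x0
```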
