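August 14, 2013 03:00

An error on a VNX, analysis requested

I recently noticed the error below on our storage system, and I can no longer log in to the array through the SP A management port, although SP B still works. What is going on, and how can I fix it?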

Severity : Error

Domain : Local

Created : Aug 6, 2013 1:58:00 AM

Message : Unisphere can no longer manage (SP A).

Full Description : Unisphere can no longer manage the other storage processor (SP A) in this storage system. Server I/O to the storage system is not impacted by this.

Recommended Action : Verify that the storage processor has a valid management LAN connection and that the SP does not have a hardware fault.

Event Code : 0x743a

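450 Posts

August 14, 2013 06:00

You can call the 800 support line and open a case to get support.

Simple steps to restart the mgmt server:

1. Configure an IP address on a laptop, confirm that you can ping SP A, open the http://spa address/setup page, select the restart mgmt server option, and click restart.

2. If SP A cannot be pinged, connect to SP B from the command line and use naviseccli rebootpeerSP, or choose reboot SPA in the Unisphere interface.

Restarting K10 generally has to be done by a CE, who logs in through RA and restarts the K10 service process.

For reference only.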
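The choice between the two steps above can be sketched in Python. This is only an illustration, not an EMC tool: `sp_reachable` and `next_step` are hypothetical names, and the probe uses a TCP connection to port 80 (an assumption, since the setup page is served over http) instead of the ICMP ping mentioned in the post.

```python
import socket


def sp_reachable(address: str, port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the SP management address succeeds.

    Assumption: port 80 is used because the setup page is an http URL.
    This is a TCP probe, not the ICMP ping described in the post.
    """
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def next_step(spa_is_reachable: bool) -> str:
    """Map the reachability of SP A to the step suggested in the thread."""
    if spa_is_reachable:
        return "Open http://<SP A address>/setup and select 'restart mgmt server'."
    return ("Connect to SP B and run 'naviseccli rebootpeerSP', "
            "or choose reboot SPA in Unisphere.")
```

For example, `next_step(sp_reachable("<SP A address>"))` would return the setup-page step when SP A answers and the peer-reboot step when it does not.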

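1 Rookie

91 Posts

August 14, 2013 04:00

Hi born_chen,

Thank you very much for your answer.

I read the information you posted above, but I still do not understand what caused this or how to resolve it. Is there any way to troubleshoot it?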

1.8K Posts

August 14, 2013 04:00

What does event code 0x743a mean?

Notification alert with event code 0x743a ("can no longer manage SP A"), but it does not affect server I/O to the storage system.

Email alert with event code 0x743a stating SP A cannot be managed even though there is no fault on the array.

Time Stamp 03/14/12 01:17:53 (GMT) Event Number 743a Severity Error Host SPB Storage Array APMxxxxxxxx SP B Device N/A Description can no longer manage (SP A). This does not impact server I/O to the storage system. See alerts for details.

Product: CLARiiON CX4 Series
Product: Unisphere

This could be due to various reasons, for example the storage processor (SP) not being reachable because of a network issue. However, in this case, after checking the SP events, it was found that the Management Server process (CIMOM) had restarted on its own, which caused the alert with event code 0x743a to be sent.

Time Stamp 03/14/12 01:29:24 (GMT) Event Number 41004001 Severity Warning Host SPA Storage Array APMxxxxxxxx SP N/A Device N/A Description Timed out waiting for startup event from NaviCimom

A memory leak usually causes the CIMOM to hit its memory threshold and may cause a restart on its own. See 8071.
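To line up the 0x743a alert with the CIMOM startup timeout, it can help to pull the fields out of the pasted event records. The sketch below is a minimal example in Python. It assumes records follow the "Time Stamp ... Event Number ... Severity ... Host ..." layout quoted above, and the names `RECORD` and `parse_events` are made up for illustration.

```python
import re
from typing import Dict, List

# Field layout copied from the two records quoted in this post.
RECORD = re.compile(
    r"Time Stamp (?P<timestamp>\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \(GMT\) "
    r"Event Number (?P<event>[0-9a-fA-F]+) "
    r"Severity (?P<severity>\w+) "
    r"Host (?P<host>\w+)"
)


def parse_events(text: str) -> List[Dict[str, str]]:
    """Extract timestamp, event number, severity and host from pasted records."""
    # Collapse line wraps first, since pasted records are often split mid-field.
    flat = " ".join(text.split())
    return [match.groupdict() for match in RECORD.finditer(flat)]


SAMPLE = """
Time Stamp 03/14/12 01:17:53 (GMT) Event Number 743a Severity Error Host SPB
Storage Array APMxxxxxxxx SP B Device N/A Description can no longer manage (SP
A).
Time Stamp 03/14/12 01:29:24 (GMT) Event Number 41004001 Severity Warning
Host SPA Storage Array APMxxxxxxxx SP N/A Device N/A Description Timed out
waiting for startup event from NaviCimom
"""

for event in parse_events(SAMPLE):
    print(event["timestamp"], event["host"], event["event"], event["severity"])
```

Sorting the parsed records by timestamp shows which SP reported what and in which order.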

1.8K Posts

August 14, 2013 04:00

743a can no longer manage SP.

Notification alert with event code 0x743a ("can no longer manage SP A"), but it does not affect server I/O to the storage system.

Time Stamp 03/14/12 01:17:53 (GMT) Event Number 743a Severity Error Host SPB Storage Array APMxxxxxxxx SP B Device N/A Description can no longer manage (SP A). This does not impact server I/O to the storage system. See alerts for details.

Email alert with event code 0x743a stating SP A cannot be managed even though there is no fault on the array and generated many times per week.

Product: CLARiiON CX4 Series
Product: Unisphere
Product: CLARiiON AX Series
Product: CLARiiON CX Series
Product: CLARiiON CX3 Series

If the user is concerned about the agent restarts, enable "user dump upon error" on the debug page of SPA and SPB (http://ipaddress of SP/debug). The next time the agent restarts, a navi dump will be created with the name CIMOM_XXX.DMP. Once you have the dump, contact EMC customer support for further assistance and quote this solution ID.

How to gather the dump file?

1. Right-click the SP that reports the unmanaged event and select the option for 'File Transfer Manager', which will allow you to see all the files saved on the storage processor.
2. Locate the dmp file, whose name will be in a format similar to the following:
CIMOM_XXX.dmp
Example:
CIMOM_terminate.dmp
3. Move the dump from the SP to the selected directory on your workstation using the File Transfer Manager.
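If the File Transfer Manager listing is long, the dump files can be picked out by name. The following Python sketch assumes you already have the file names as a list of strings. The function name `find_cimom_dumps` and every file name other than CIMOM_terminate.dmp are hypothetical.

```python
import fnmatch
from typing import Iterable, List


def find_cimom_dumps(filenames: Iterable[str]) -> List[str]:
    """Return names matching CIMOM_XXX.dmp, ignoring upper/lower case."""
    # Lower-case the name so that both ".dmp" and ".DMP" are accepted.
    return [
        name for name in filenames
        if fnmatch.fnmatchcase(name.lower(), "cimom_*.dmp")
    ]


# Hypothetical listing; only CIMOM_terminate.dmp is taken from the post.
listing = ["CIMOM_terminate.dmp", "CIMOM_XXX.DMP", "example_spcollect.zip"]
print(find_cimom_dumps(listing))
```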

450 Posts

August 14, 2013 05:00

restart mgmt server

restart k10

450 Posts

August 14, 2013 06:00

Take a look at the documents emc290163, emc317171, and emc313029.


CIMOM restarts repeatedly on array when running VNX OE for Block  05.31.000.5.xxx release.
The "41000005 Process NaviCimom exited with return code" event is repeated in the SP Event Log indicating the CIMOM is repeatedly restarting.

In the spcollect, the SPX_cimomlog*.txt file(s) will contain the string "connect : No buffer space available".

In the spcollect, the admin_tlddump.txt file will contain the string "exception 60000120: Invalid embedded count." (not always present)
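Checking for these strings can be scripted once the relevant spcollect text files have been extracted and read into memory. The Python sketch below uses only the three strings quoted above. The dictionary keys, the function name `match_signatures`, and the sample text are illustrative assumptions.

```python
from typing import Dict

# The three strings quoted in the article above; the keys are made-up labels.
SIGNATURES = {
    "cimom_restarting": "41000005 Process NaviCimom exited with return code",
    "no_buffer_space": "connect : No buffer space available",
    "invalid_embedded_count": "exception 60000120: Invalid embedded count.",
}


def match_signatures(log_text: str) -> Dict[str, int]:
    """Count how many times each known string occurs in the given log text."""
    return {label: log_text.count(needle) for label, needle in SIGNATURES.items()}


# Hypothetical log excerpt, built only from the quoted strings.
sample = (
    "41000005 Process NaviCimom exited with return code\n"
    "connect : No buffer space available\n"
    "41000005 Process NaviCimom exited with return code\n"
)
print(match_signatures(sample))
```

A repeated count for the first string together with at least one "No buffer space available" hit would match the symptoms described here. The third string is optional, as the article notes.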

Cannot manage VNX SP's via Unisphere Manager

SP's are not manageable through Unisphere Manager.

Management Server restart does not resolve the problem.

K10 Governor service stop start sequence direct on the SP does not resolve the problem.
Product: VNX Family
EMC SW: VNX Operating Environment (OE) for Block 05.31.000.5.xxx

Product: VNX Series

Product: VNX Unified/Block

Does not apply to EMC SW: VNX Operating Environment (OE) 05.31.000.5.720 or later

Does not apply to EMC SW: VNX Operating Environment (OE) for Block 05.32

EMC SW: Unisphere Service Manager (USM)
This is a different manifestation of the TCP issue fixed in "the 248 day issue" [ETA 3238].

This problem can occur on systems that are up > 497 days. The issue is that TCP connections are remaining in a TIME_WAIT/CLOSE_WAIT state for excessively long periods of time (in some cases, indefinitely). While in these states, the particular socket pairs remain unusable and if enough accumulate it results in port exhaustion preventing the CIMOM from starting.
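To illustrate the accumulation described above, the following Python sketch tallies TCP states from netstat-style text that has already been captured. It makes several assumptions: that the state is the last column of each line, that the listed state names cover the output in question, and that the sample addresses (from the 192.0.2.0/24 documentation range) are placeholders. Whether such output can be collected from an SP is not covered in this thread.

```python
from collections import Counter

# Assumed set of state names; netstat variants spell some of these differently.
TCP_STATES = {
    "ESTABLISHED", "TIME_WAIT", "CLOSE_WAIT", "LISTENING", "LISTEN",
    "SYN_SENT", "FIN_WAIT_1", "FIN_WAIT_2", "LAST_ACK", "CLOSING",
}


def count_tcp_states(netstat_text: str) -> Counter:
    """Count connections per TCP state, reading the state from the last column."""
    counts = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        if fields and fields[-1] in TCP_STATES:
            counts[fields[-1]] += 1
    return counts


def lingering(counts: Counter) -> int:
    """Sockets in the two states the article says can remain unusable."""
    return counts["TIME_WAIT"] + counts["CLOSE_WAIT"]


# Placeholder capture; addresses and ports are invented for the example.
sample = """
  TCP    192.0.2.10:443     192.0.2.20:51001    TIME_WAIT
  TCP    192.0.2.10:443     192.0.2.20:51002    TIME_WAIT
  TCP    192.0.2.10:443     192.0.2.21:51003    CLOSE_WAIT
  TCP    192.0.2.10:443     192.0.2.22:51004    ESTABLISHED
"""
print(lingering(count_tcp_states(sample)))
```

A lingering count that keeps growing between captures is the pattern that would eventually exhaust ports.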

Workaround:

Reboot the SP that is exhibiting the symptoms.

Fix:

Upgrade to 05.31.000.5.726


Workaround:

Reboot the SP's if the problem is seen.

Permanent Fix:

The issue is fixed in R31.720 (MR2 SP3). The recommendation is to upgrade the Flare revision to R31.726, for the latest fixes and enhancements.
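Whether a given code revision falls in the affected range can be checked with a small comparison. This Python sketch assumes versions are written either in the long form (05.31.000.5.720) or the short form (R31.720 or 31.509) used in this thread. The function names are hypothetical, and the 720 threshold is taken from the text above.

```python
import re
from typing import Tuple

FIXED_BUILD = 720  # "The issue is fixed in R31.720"

# Long form such as 05.31.000.5.509, short form such as R31.720 or 31.509.
LONG_FORM = re.compile(r"\d{2}\.(\d{2})\.\d{3}\.\d+\.(\d{3})")
SHORT_FORM = re.compile(r"R?(\d{2})\.(\d{3})")


def parse_release_build(version: str) -> Tuple[int, int]:
    """Return (release, build), for example (31, 509)."""
    text = version.strip()
    match = LONG_FORM.fullmatch(text) or SHORT_FORM.fullmatch(text)
    if match is None:
        raise ValueError("unrecognised version format: %r" % version)
    return int(match.group(1)), int(match.group(2))


def is_affected(version: str) -> bool:
    """True for release 31 builds below the fixed build."""
    release, build = parse_release_build(version)
    return release == 31 and build < FIXED_BUILD


print(is_affected("31.509"), is_affected("05.31.000.5.726"))
```

The check covers only the version range. The article also ties the problem to long uptime, which this sketch does not look at.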

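I have edited this post to remove part of the content, because it is subject to higher access restrictions. You can log in to support.emc.com and look it up yourselves. Thank you. (by Yanhong)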


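450 Posts

August 14, 2013 06:00

On versions below R31.720, this can happen once the system has been running for more than 4xx days.

As for the exact cause, I am not sure either.

Upgrade the code, to R31.727.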
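The thread does not explain where the 248-day and 497-day figures come from. One common explanation for limits of this size, offered here only as an assumption and not confirmed for VNX, is a 32-bit counter that ticks every 10 ms: the signed range wraps after about 248.55 days and the unsigned range after about 497.10 days. A quick Python check of that arithmetic:

```python
SECONDS_PER_DAY = 86400


def days_until_wrap(bits: int, tick_seconds: float = 0.01) -> float:
    """Days until a counter with 2**bits values wraps at the given tick length.

    Assumption: a 10 ms tick; the thread does not state the real mechanism.
    """
    return (2 ** bits) * tick_seconds / SECONDS_PER_DAY


print(round(days_until_wrap(31), 2))  # signed 32-bit range
print(round(days_until_wrap(32), 2))  # unsigned 32-bit range
```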

1 Rookie

91 Posts

August 14, 2013 06:00

Everything else is normal and there are no hardware alerts. The code version is 31.509.

Why does this happen?

1 Rookie

91 Posts

August 14, 2013 06:00

OK, thank you very much for your answer.

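450 Posts

August 14, 2013 06:00

Restarting the mgmt server has no impact on data. Apart from not being manageable right now, is all other access through SP A still normal?

Also, what is the code version?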

1 Rookie

91 Posts

August 14, 2013 06:00

Where do I perform these two operations, and what does k10 mean?

1 Rookie

91 Posts

August 14, 2013 06:00

Will the restart mgmt server operation affect the data on the storage system?

450 Posts

August 14, 2013 19:00

It has little to do with the domain. SP A is running normally and the workloads are all OK.

It is simply unmanaged; restart the mgmt server.

2 Intern

362 Posts

August 14, 2013 19:00

This one is simple.

Either SP A is not connected, or SP A is reporting an error, or SP A is not in the domain.

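2 Intern

4K Posts

August 14, 2013 19:00

The methods above basically cover it. If an upgrade is not convenient in the near term, or your data center has a strict approval process, restart the management server as needed. In the long run, upgrading to the latest FLARE code is the once-and-for-all fix.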
