Z_Warden

1 Rookie

•

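91 messages

0

9812

August 14, 2013 03:00

An error on a VNX, requesting analysis

I recently noticed this error on the storage array, and I can no longer log in to the array through the SPA management port. SPB still works. What is going on, and how can I resolve it?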

Severity : Error

Domain : Local

Created : Aug 6, 2013 1:58:00 AM

Message : Unisphere can no longer manage (SP A).

Full Description : Unisphere can no longer manage the other storage processor (SP A) in this storage system. Server I/O to the storage system is not impacted by this.

Recommended Action : Verify that the storage processor has a valid management LAN connection and that the SP does not have a hardware fault.

Event Code : 0x743a

Replies (15)

big_lei

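450 messages

0

August 14, 2013 06:00

You can call the 800 support line and open a case to get assistance.

Simple steps to restart the mgmt server:

1. Configure an address on your laptop, confirm that you can ping SPA, log in to the http://spa address/setup page, select the restart mgmt server option, and click restart.

2. If you cannot ping SPA, connect to SPB from the command line and use naviseccli rebootpeerSP, or choose reboot SPA in the Unisphere interface.

Restarting K10 generally has to be done by a CE, who logs in through RA and restarts the K10 service process.

For reference only.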
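The decision between steps 1 and 2 above could be sketched roughly as follows. This is only an illustration: it swaps the ping for a TCP connection test, the port number 443 is an assumption, and `tcp_reachable` and `next_step` are hypothetical helper names. The `-h` address option is shown without credentials; a real naviseccli call may also need login options or a security file, and nothing here actually runs the reboot.

```python
import socket


def tcp_reachable(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Assumption: the SP management interface listens on this port.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def next_step(spa_ok: bool, spa_ip: str, spb_ip: str) -> str:
    """Suggest the next action from the post; it only returns text."""
    if spa_ok:
        # Step 1: SPA answers, so use its setup page.
        return f"Open http://{spa_ip}/setup and choose the restart mgmt server option"
    # Step 2: SPA does not answer, so go through the peer SP instead.
    return f"naviseccli -h {spb_ip} rebootpeerSP"
```

For example, `next_step(tcp_reachable(spa_ip), spa_ip, spb_ip)` returns the suggestion as a string, so an operator can review it before doing anything to the array.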

Z_Warden

1 Rookie

•

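91 messages

0

August 14, 2013 04:00

Hi, born_chen:

Thank you very much for your answer.

I read the information you posted above, but I still do not understand what caused this or how to fix it. Is there any way to troubleshoot it?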

born_chen

1.8K messages

1

August 14, 2013 04:00

Goal	What does event code 0x743a mean?


Issue	Notification alert with event code 0x743a ("can no longer manage SP A"), but it does not affect server I/O to the storage system. Email alert with event code 0x743a stating SP A cannot be managed even though there is no fault on the array.

Time Stamp 03/14/12 01:17:53 (GMT) Event Number 743a Severity Error Host SPB Storage Array APMxxxxxxxx SP B Device N/A Description can no longer manage (SP A). This does not impact server I/O to the storage system. See alerts for details.

Environment	Product: CLARiiON CX4 Series Product: Unisphere

Resolution	This could have various causes, such as the storage processor (SP) being unreachable because of a network issue. In this case, however, the SP events showed that the Management Server process (CIMOM) had restarted on its own, which triggered the alert with event code 0x743a.

Time Stamp 03/14/12 01:29:24 (GMT) Event Number 41004001 Severity Warning Host SPA Storage Array APMxxxxxxxx SP N/A Device N/A Description Timed out waiting for startup event from NaviCimom

Notes	A memory leak usually causes the CIMOM to hit its memory threshold and may cause a restart on its own. See 8071.
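When comparing several of these alerts, it can help to split the flattened event record into named fields. The following is a minimal sketch that assumes the field labels always appear in the order seen in the records quoted above; `EVENT_RE` and `parse_event` are illustrative names, not part of any EMC tool.

```python
import re

# Field labels in the order they appear in the quoted event records.
EVENT_RE = re.compile(
    r"Time Stamp\s+(?P<timestamp>.+?)\s+"
    r"Event Number\s+(?P<event_number>\S+)\s+"
    r"Severity\s+(?P<severity>\S+)\s+"
    r"Host\s+(?P<host>\S+)\s+"
    r"Storage Array\s+(?P<array>\S+)\s+"
    r"SP\s+(?P<sp>.+?)\s+"
    r"Device\s+(?P<device>.+?)\s+"
    r"Description\s+(?P<description>.+)",
    re.DOTALL,  # records may be wrapped across several lines
)


def parse_event(text):
    """Return a dict of fields for the first event record in text, or None."""
    match = EVENT_RE.search(text)
    return match.groupdict() if match else None
```

Applied to the 41004001 record above, this yields the host `SPA`, the severity `Warning`, and the description beginning with "Timed out waiting".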

born_chen

1.8K messages

1

August 14, 2013 04:00

Issue

743a can no longer manage SP.

Notification alert with event code 0x743a ("can no longer manage SP A"),
but it does not affect server I/O to the storage system.

Time Stamp
03/14/12 01:17:53 (GMT) Event Number 743a Severity Error Host SPB Storage Array
APMxxxxxxxx SP B Device N/A Description can no longer manage (SP A). This does
not impact server I/O to the storage system. See alerts for
details.

Email alert with event code 0x743a stating SP A cannot be
managed even though there is no fault on the array and generated many times per
week.


Environment	Product: CLARiiON CX4 Series Product: Unisphere Product: CLARiiON AX Series Product: CLARiiON CX Series Product: CLARiiON CX3 Series

Resolution	If the user is concerned about the agent restarts you need to enable "user dump upon error" on the debug page of SPA + SPB ( http://ipaddress of SP/debug), then the next time the agent restarts a navi dump will be created with the name : CIMOM_XXX.DMP. Once you have the dump contact EMC customer support for further assistance and quote this solution ID.

Notes	How to gather the dump file:
1. Right-click the SP that reports the unmanaged event and select the option for 'File Transfer Manager', which lets you see all the files saved on the storage processor.
2. Locate the dmp file, which will have a name in a format similar to CIMOM_XXX.dmp. Example: CIMOM_terminate.dmp
3. Move the dump from the SP to the selected directory on your workstation using the File Transfer Manager.

big_lei

450 messages

0

August 14, 2013 05:00

restart mgmt server

restart k10

big_lei

450 messages

0

August 14, 2013 06:00

Take a look at the KB articles emc290163, emc317171, and emc313029.

CIMOM restarts repeatedly on array when running VNX OE for Block 05.31.000.5.xxx release.
The "41000005 Process NaviCimom exited with return code" event is repeated in the SP Event Log, indicating the CIMOM is repeatedly restarting.
In the spcollect, the SPX_cimomlog*.txt file(s) will contain the string "connect : No buffer space available".
In the spcollect, the admin_tlddump.txt file will contain the string "exception 60000120: Invalid embedded count." (not always present).
Cannot manage VNX SPs via Unisphere Manager. SPs are not manageable through Unisphere Manager.
Management Server restart does not resolve the problem.
K10 Governor service stop/start sequence directly on the SP does not resolve the problem.
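If an spcollect has already been extracted to a local folder, the strings named in the symptoms above can be searched for with a short script. This is a sketch under assumptions: the folder layout, the choice to scan every `.txt` file, and the names `SIGNATURES`, `scan_text`, and `scan_directory` are illustrative only.

```python
from pathlib import Path

# Strings quoted in the symptom list above; matching is case-sensitive.
SIGNATURES = (
    "connect : No buffer space available",
    "Invalid embedded count",
    "Process NaviCimom exited with return code",
)


def scan_text(text):
    """Return the signatures that occur in the given text."""
    return [sig for sig in SIGNATURES if sig in text]


def scan_directory(root):
    """Map each .txt file under root to the signatures found in it."""
    hits = {}
    for path in Path(root).rglob("*.txt"):
        found = scan_text(path.read_text(errors="replace"))
        if found:
            hits[str(path)] = found
    return hits
```

A hit only shows that the log text matches the article's description; it does not confirm the diagnosis on its own.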

Environment	Product: VNX Family EMC SW: VNX Operating Environment (OE) for Block 05.31.000.5.xxx Product: VNX Series Product: VNX Unified/Block Does not apply to EMC SW: VNX Operating Environment (OE) 05.31.000.5.720 or later Does not apply to EMC SW: VNX Operating Environment (OE) for Block 05.32 EMC SW: Unisphere Service Manager (USM)

Cause	This is a different manifestation of the tcp issue fixed in the "the 248 day issue" [ETA 3238]. This problem can occur on systems that are up > 497 days. The issue is that TCP connections are remaining in a TIME_WAIT/CLOSE_WAIT state for excessively long periods of time (in some cases, indefinitely). While in these states, the particular socket pairs remain unusable and if enough accumulate it results in port exhaustion preventing the CIMOM from starting.
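The article does not say where the 248 and 497 day figures come from. One common explanation for numbers like these, which is an assumption here and not something the article states, is a 32-bit counter that advances once every 10 ms. The arithmetic under that assumption:

```python
SECONDS_PER_DAY = 86_400
TICKS_PER_SECOND = 100  # assumption: one counter tick every 10 ms


def days_until_wrap(bits):
    """Days until a counter with the given number of bits overflows."""
    return (2 ** bits) / TICKS_PER_SECOND / SECONDS_PER_DAY


signed_days = days_until_wrap(31)    # signed 32-bit counter
unsigned_days = days_until_wrap(32)  # unsigned 32-bit counter
print(f"{signed_days:.2f} days, {unsigned_days:.2f} days")
```

Under that assumption the two limits come out at roughly 248.6 and 497.1 days, which lines up with the "248 day" name and the "> 497 days" uptime quoted above.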

Resolution	Workaround: Reboot the SP that is exhibiting the symptoms.
Fix: Upgrade to 05.31.000.5.726.
Permanent fix: The issue is fixed in R31.720 (MR2 SP3). The recommendation is to upgrade the FLARE revision to R31.726 for the latest fixes and enhancements.
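The applicability rules in the Environment and Resolution fields can be written down as a small check. This sketch assumes the version is given in the full dotted form used by the article (for example 05.31.000.5.509); `parse_release` and `is_affected` are hypothetical helper names.

```python
def parse_release(version):
    """Split a dotted release such as '05.31.000.5.509' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


# The article says 05.31.000.5.720 or later is not affected.
FIXED_IN = parse_release("05.31.000.5.720")


def is_affected(version):
    """True for 05.31.000.5.xxx releases below the fixed build."""
    release = parse_release(version)
    # Same 05.31.000.5 family, and an earlier build than the fix.
    return release[:4] == FIXED_IN[:4] and release < FIXED_IN
```

With these rules, `is_affected("05.31.000.5.509")` is true, while 05.31.000.5.726 and any 05.32 release are reported as not affected.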

Notes (Employees and Partners)	I have edited this part out because access to it requires a higher permission level. You can log in to support.emc.com and look it up yourselves. Thanks. (by Yanhong)

big_lei

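450 messages

0

August 14, 2013 06:00

On releases below R31.720, this happens once the array has been running for more than 4xx days.

As for the exact cause, I am not sure either.

Upgrade the code to R31.727.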

Z_Warden

1 Rookie

•

91 messages

0

August 14, 2013 06:00

Everything else is normal and there are no hardware alarms. The code version is 31.509.

Why would this happen?

Z_Warden

1 Rookie

•

91 messages

0

August 14, 2013 06:00

OK, thank you very much for your answer.

big_lei

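450 messages

0

August 14, 2013 06:00

Restarting the mgmt server does not affect data. Apart from SPA being unmanageable right now, is all other access still working normally?

Also, what is the code version?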

Z_Warden

1 Rookie

•

91 messages

0

August 14, 2013 06:00

Where do I perform these two operations? And what does k10 mean?

Z_Warden

1 Rookie

•

91 messages

0

August 14, 2013 06:00

Will the restart mgmt server operation affect the data on the storage array?

big_lei

450 messages

0

August 14, 2013 19:00

It has little to do with the domain. SPA is running normally and the workloads are all OK.

It is only unmanaged; restart the mgmt server.

cxemc

2 Intern

•

362 messages

1

August 14, 2013 19:00

This one is simple.

Either SPA is not connected, or SPA is reporting an error, or SPA is not in the domain.

Roger_Wu

2 Intern

•

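4K messages

0

August 14, 2013 19:00

The methods above basically cover it. If an upgrade is inconvenient in the near term, or your data center's approval process is strict, restart the management server as needed. In the long run, upgrading to the latest FLARE code settles it once and for all.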
