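VNX error: analysis requested
I recently noticed the error below on our storage array, and I can no longer log in to the array through the SP A management port, although SP B still works. What is going on, and how can I fix it?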
Severity : Error
Domain : Local
Created : Aug 6, 2013 1:58:00 AM
Message : Unisphere can no longer manage (SP A).
Full Description : Unisphere can no longer manage the other storage processor (SP A) in this storage system. Server I/O to the storage system is not impacted by this.
Recommended Action : Verify that the storage processor has a valid management LAN connection and that the SP does not have a hardware fault.
Event Code : 0x743a
big_lei
August 14, 2013 06:00
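You can call the 800 support line and open a case to get support.
Simple steps for restart mgmt server:
1. Configure an IP address on a laptop, confirm that you can ping SP A, open the http://spa address/setup page, select the restart mgmt server option, and click restart.
2. If SP A cannot be pinged, connect to SP B from the command line and use naviseccli rebootpeerSP, or choose reboot SPA in the Unisphere interface.
Restarting K10 generally has to be done by a CE, who logs in through RA and restarts the K10 service process.
For reference only.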
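The choice between the two steps above can be sketched in Python. This is only an illustration, not an EMC tool: the function names, the example addresses, and the use of a single ICMP ping as the reachability test are assumptions, the ping flags are the Linux/macOS ones, and the `-h` option plus any credential options for naviseccli should be checked against your CLI version. Nothing is rebooted by this code; it only reports which step applies.

```python
import subprocess


def sp_reachable(ip: str, timeout_s: int = 2) -> bool:
    """Return True if one ICMP echo to `ip` succeeds (Linux/macOS `ping -c 1`)."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", ip],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0


def recovery_plan(spa_ip: str, spb_ip: str, spa_pingable: bool) -> str:
    """Describe which of the two steps applies; no command is executed here."""
    if spa_pingable:
        # Step 1: SP A answers, so its setup page can restart the management server.
        return f"Open http://{spa_ip}/setup and choose the restart mgmt server option."
    # Step 2: SP A is silent, so the peer reboot is issued through SP B.
    # The -h option is an assumption about the local naviseccli syntax.
    return f"Run against SP B: naviseccli -h {spb_ip} rebootpeerSP"
```

A typical call would be `recovery_plan("192.0.2.10", "192.0.2.11", sp_reachable("192.0.2.10"))`, where both addresses are placeholders. Note that step 2 reboots the whole SP, not just the management server.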
Z_Warden
August 14, 2013 04:00
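Hi born_chen,
Thank you very much for your answer.
I read the information you posted above, but I still don't understand what causes this or how to resolve it. Is there a way to troubleshoot it?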
born_chen
August 14, 2013 04:00
("can no longer manage SP A"), but it does not affect server I/O to the storage system.
Email alert with event code 0x743a stating SP A cannot be managed even though there is no fault on the array.
Storage Array APMxxxxxxxx SP B Device N/A Description can no longer manage (SP A). This does not impact server I/O to the storage system. See alerts for details.
Product: Unisphere
the storage processor (SP) is not reachable due to network issue and the like. However, in this case, after checking the SP events, it was found that the Management Server process (CIMOM) was restarted on its own due to which this alert with event code 0x743a was sent.
Host SPA Storage Array APMxxxxxxxx SP N/A Device N/A Description Timed out waiting for startup event from NaviCimom
hit its memory threshold and may cause a restart on its own. See 8071.
born_chen
August 14, 2013 04:00
but it does not affect server I/O to the storage system.
Time Stamp 03/14/12 01:17:53 (GMT) Event Number 743a Severity Error Host SPB Storage Array APMxxxxxxxx SP B Device N/A Description can no longer manage (SP A). This does not impact server I/O to the storage system. See alerts for details.
Email alert with event code 0x743a stating SP A cannot be managed even though there is no fault on the array and generated many times per week.
Product: Unisphere
Product: CLARiiON AX Series
Product: CLARiiON CX Series
Product: CLARiiON CX3 Series
restarts you need to enable "user dump upon error" on the debug page of SPA + SPB ( http://ipaddress of SP/debug), then the next time the agent restarts a navi dump will be created with the name CIMOM_XXX.DMP. Once you have the dump, contact EMC customer support for further assistance and quote this solution ID.
1. Right-click the SP that reports the unmanaged event and select the option for 'File Transfer Manager' that will allow you to see all the files saved on the storage processor.
2. Locate the dmp file that will be in the format similar to the following: CIMOM_XXX.dmp
Example: CIMOM_terminate.dmp
3. Move the dump from the SP to the selected directory on your workstation using the File Transfer Manager.
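Once the files are listed or copied, picking out the dump files from step 2 can be sketched as below. This is a minimal sketch: the function name and the sample file names other than CIMOM_terminate.dmp are made up for illustration, and the match is case-insensitive only because the text above shows both .DMP and .dmp.

```python
import fnmatch


def find_cimom_dumps(filenames):
    """Return the names that look like CIMOM_*.dmp, ignoring upper/lower case."""
    return [
        name
        for name in filenames
        if fnmatch.fnmatchcase(name.lower(), "cimom_*.dmp")
    ]


# Hypothetical directory listing from an SP.
listing = ["CIMOM_terminate.dmp", "SPA_events.txt", "CIMOM_XXX.DMP"]
dumps = find_cimom_dumps(listing)
```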
big_lei
August 14, 2013 05:00
restart mgmt server
restart k10
big_lei
August 14, 2013 06:00
Take a look at these documents: emc290163, emc317171, emc313029.
The "41000005 Process NaviCimom exited with return code" event is repeated in the SP Event Log, indicating the CIMOM is repeatedly restarting.
In the spcollect, the SPX_cimomlog*.txt file(s) will contain the string "connect : No buffer space available".
In the spcollect, the admin_tlddump.txt file will contain the string "exception 60000120: Invalid embedded count." (not always present).
Cannot manage VNX SPs via Unisphere Manager.
SPs are not manageable through Unisphere Manager.
Management Server restart does not resolve the problem.
K10 Governor service stop/start sequence run directly on the SP does not resolve the problem.
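Checking extracted spcollect text for the strings quoted above could look roughly like this. It is a sketch under assumptions: the dictionary keys and function name are invented here, the sample text is fabricated for the demonstration, and reading the actual SPX_cimomlog*.txt or admin_tlddump.txt files from disk is left out.

```python
# Signature strings are taken from the KB symptoms; the key names are made up.
SIGNATURES = {
    "cimom_restarting": "Process NaviCimom exited with return code",
    "no_buffer_space": "connect : No buffer space available",
    "tld_exception": "exception 60000120: Invalid embedded count.",
}


def match_signatures(text: str) -> dict:
    """Map each signature name to the number of times its string occurs in `text`."""
    return {name: text.count(needle) for name, needle in SIGNATURES.items()}


# Fabricated sample standing in for log contents.
sample = (
    "41000005 Process NaviCimom exited with return code\n"
    "connect : No buffer space available\n"
    "41000005 Process NaviCimom exited with return code\n"
)
hits = match_signatures(sample)
```

Repeated `cimom_restarting` hits together with at least one `no_buffer_space` hit would match the symptom pattern described; the third string is listed as not always present.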
EMC SW: VNX Operating Environment (OE) for Block 05.31.000.5.xxx
Product: VNX Series
Product: VNX Unified/Block
Does not apply to EMC SW: VNX Operating Environment (OE) 05.31.000.5.720 or later
Does not apply to EMC SW: VNX Operating Environment (OE) for Block 05.32
EMC SW: Unisphere Service Manager (USM)
This problem can occur on systems that are up > 497 days. The issue is that TCP connections are remaining in a TIME_WAIT/CLOSE_WAIT state for excessively long periods of time (in some cases, indefinitely). While in these states, the particular socket pairs remain unusable and if enough accumulate it results in port exhaustion preventing the CIMOM from starting.
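The text above does not say where the 497-day figure comes from. One explanation that fits the number, offered here as an assumption and not something the KB confirms, is an unsigned 32-bit tick counter incremented every 10 ms: it wraps after 2^32 ticks, which works out to about 497.1 days.

```python
TICK_SECONDS = 0.01   # assumed 10 ms tick; not stated in the KB text
COUNTER_BITS = 32     # assumed unsigned 32-bit tick counter
SECONDS_PER_DAY = 86_400


def wrap_days(bits: int = COUNTER_BITS, tick_s: float = TICK_SECONDS) -> float:
    """Days until a `bits`-wide counter of `tick_s`-second ticks wraps to zero."""
    return (2 ** bits) * tick_s / SECONDS_PER_DAY


def past_wrap(uptime_days: float) -> bool:
    """True if the uptime exceeds the wrap point of the assumed counter."""
    return uptime_days > wrap_days()
```

Under that assumption, an SP that has been up for 500 days is past the wrap point and one at 400 days is not, which is consistent with the "> 497 days" condition quoted above.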
Reboot the SP that is exhibiting the symptoms.
Fix:
Upgrade to 05.31.000.5.726
Workaround:
Reboot the SPs if the problem is seen.
Permanent Fix:
The issue is fixed in R31.720 (MR2 SP3). The recommendation is to upgrade the Flare revision to R31.726, for the latest fixes and enhancements.
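The applicability rules quoted above (05.31.000.5.xxx, not 05.31.000.5.720 or later, not 05.32) can be turned into a small check. This is a sketch, not an official tool: the function name is hypothetical, it assumes the revision is written in the full dotted form, and it returns False for any release line other than 05.31 simply because the KB text only covers that line.

```python
import re

FIXED_BUILD = 720  # from the KB text: does not apply to 05.31.000.5.720 or later


def affected_by_uptime_issue(revision: str) -> bool:
    """True for 05.31.000.5.xxx builds below 720; False for every other release line."""
    match = re.fullmatch(r"05\.(\d+)\.000\.5\.(\d+)", revision.strip())
    if match is None:
        raise ValueError(f"unrecognized revision string: {revision!r}")
    release, build = int(match.group(1)), int(match.group(2))
    return release == 31 and build < FIXED_BUILD
```

For example, `affected_by_uptime_issue("05.31.000.5.509")` is True, while the recommended 05.31.000.5.726 is not affected.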
big_lei
August 14, 2013 06:00
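On versions below R31.720, this happens once the system has been running for more than 400-odd days.
As for the exact cause, I'm not sure either.
Upgrade the code, to R31.727.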
Z_Warden
August 14, 2013 06:00
Everything else is normal and there are no hardware alerts. The code version is 31.509.
Why does this happen?
Z_Warden
August 14, 2013 06:00
OK, thank you very much for your answer.
big_lei
August 14, 2013 06:00
restart mgmt server has no impact on data. Apart from SP A being unmanageable right now, is all other access still working normally?
Also, what is the code version?
Z_Warden
August 14, 2013 06:00
Where do I perform these two operations, and what does k10 mean?
Z_Warden
August 14, 2013 06:00
Will the restart mgmt server operation affect the data on the array?
big_lei
August 14, 2013 19:00
It doesn't have much to do with the domain. SP A is running normally and the workloads are all OK.
It is only unmanaged; restart the mgmt server.
cxemc
August 14, 2013 19:00
This one is simple.
Either SP A has no connection, or SP A is reporting a fault, or SP A is not in the domain.
Roger_Wu
August 14, 2013 19:00
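The methods above pretty much cover it. If an upgrade is inconvenient in the near term, or your data center's approval process is strict, restart the management server as needed; in the long run, upgrading to the latest FLARE code is the once-and-for-all fix.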