dynamox, January 9th, 2015 15:00
Make sure there are no faults, make sure you stay on a supported code level, and make sure call-home/email-home is properly configured (verify the SMTP server address in the template).
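One way to act on the SMTP point is to check, from a host on the array's management network, that the relay named in the email-home template still accepts connections. This is a minimal bash sketch under stated assumptions: the relay host and port are placeholders you copy from the template yourself, and a successful TCP connection only shows the relay is reachable, not that it will accept or deliver the array's mail.

```shell
#!/usr/bin/env bash
# Sketch: confirm the SMTP relay used for email-home still accepts TCP
# connections. Host/port are placeholders; copy them from the email-home
# template yourself. Reachability does not prove mail delivery.
set -u

# Return 0 if a TCP connection to host:port opens within the timeout (seconds).
smtp_reachable() {
  local host=$1 port=${2:-25} secs=${3:-5}
  # bash's /dev/tcp redirection opens the socket; timeout bounds slow failures.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null
}

# Usage: check_smtp.sh <smtp-host> [port]  (does nothing when given no arguments)
if [[ $# -ge 1 ]]; then
  port=${2:-25}
  if smtp_reachable "$1" "$port"; then
    echo "OK: SMTP relay $1:$port accepts connections"
  else
    echo "WARNING: cannot reach SMTP relay $1:$port; email-home alerts may not be getting out" >&2
    exit 1
  fi
fi
```

For example, `./check_smtp.sh 192.0.2.25` (the script name and address are hypothetical) exits non-zero when the relay cannot be reached, so it can run from cron alongside other checks.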
Rojizo, January 9th, 2015 16:00
OK, thanks. How often do I need to check for faults, and where do I check for them?
dynamox, January 9th, 2015 17:00
Typically you don't have to check for faults manually, because the array is supposed to call home for events that require attention. But in my experience, I always like to have another set of "eyes" checking the health of my arrays. Back in the days when call-home events were actually "dial-home" events over regular phone lines, there were instances in my data center where the phone lines got disconnected because someone thought they were not in use. So you have a disk failure in your array, the array cannot dial out, and you end up with a failure you don't know about unless you happen to log in to the GUI. Nowadays notification is done via email, but that does not guarantee anything either. I have had instances where a customer changed their SMTP server IP address and the array was no longer able to call home.
So I always create little Perl/bash scripts that use naviseccli commands for CLARiiON/VNX and symcli for DMX/VMAX to verify the health of my arrays. For VNX there are other tools that help you monitor health; I can't remember the app name now, but you can add multiple VNX arrays and monitor their status from there.
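The "little scripts" approach can be sketched in bash. This is a hypothetical example, not dynamox's actual script: it assumes naviseccli is on the PATH with a security file already set up, so no credentials appear on the command line, and it assumes a healthy array answers `faults -list` with text containing "operating normally", so check that against your own arrays first. The same loop could cover DMX/VMAX by swapping in an appropriate symcli query, which is not shown here.

```shell
#!/usr/bin/env bash
# Sketch: ask each array's SP for its fault status and flag anything that
# does not look healthy. Assumptions: naviseccli on PATH with a security
# file configured; healthy output contains "operating normally".
set -u

NAVICLI=${NAVICLI:-naviseccli}   # override to point at a stub when testing

# Print a one-line status for one SP address; return 1 if it needs attention.
check_array() {
  local sp=$1 out
  if ! out=$("$NAVICLI" -h "$sp" faults -list 2>&1); then
    echo "ERROR   $sp: naviseccli failed: ${out//$'\n'/ }"
    return 1
  fi
  if grep -qi "operating normally" <<<"$out"; then
    echo "OK      $sp"
  else
    # Flatten multi-line fault text onto one line for easy scanning/mailing.
    echo "FAULT   $sp: ${out//$'\n'/ }"
    return 1
  fi
}

# Check every SP given on the command line; return non-zero if any needs attention.
main() {
  local sp rc=0
  for sp in "$@"; do
    check_array "$sp" || rc=1
  done
  return "$rc"
}

# Only run when SP addresses are passed, so the functions can be loaded on their own.
if [[ $# -gt 0 ]]; then
  main "$@"
  exit $?
fi
```

For example, `./check_faults.sh 192.0.2.11 192.0.2.21` (hypothetical script name and SP addresses, one SP per array is enough) prints one line per array and exits non-zero if any of them needs attention, which makes it easy to run from cron and mail the output.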
Jon_hope, January 9th, 2015 17:00
I manage about 8 different VNX arrays across our infrastructure, and besides the normal day-to-day alerts and monitoring, once a month I run a series of manual checks on each array. It's very quick. I've included what I run, and I'm intrigued to hear what others do.
I'm sure there are easier, more automated ways, but nothing beats hands-on assurance that things are working properly. I also just run the checks quickly from the CLI and redirect the output to a file, so it serves as a record of what the array's state was on a given day.
These are for Unified systems:
/nas/sbin/getserial
/nas/sbin/getreason
nas_storage -c -a (SP issues)
/nas/sbin/navicli -h SPA faults -list
/nas/sbin/navicli -h SPA getlun -trespass (check for trespassed LUNs)
    /nas/sbin/navicli -h SPA trespass mine (only if trespassed LUNs exist; "mine" returns the LUNs SP A owns by default, so repeat with -h SPB for SP B's LUNs)
nas_cs -info
nas_inventory -tree
nas_fs -list (check file systems)
server_df server_2 (or server_3, depending on whether the file systems live on Data Mover 2 or 3)
nas_checkup (verifies call-home/auto-transfer status)
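Jon_hope's routine of saving the output to a file can be wrapped in a small script. This is a sketch, assuming it runs on the Control Station as nasadmin with the nas_* and server_* commands on the PATH, that SPA resolves to SP A as in the list above, and that the file systems live on server_2; LOG_DIR and the report file naming are hypothetical choices. The trespass command is deliberately left out because it changes LUN ownership rather than just reporting state.

```shell
#!/usr/bin/env bash
# Sketch: run the read-only monthly checks on a VNX Unified Control Station
# and keep a dated copy of the output. LOG_DIR and file naming are hypothetical.
set -u

LOG_DIR=${LOG_DIR:-$HOME/health_checks}
# "trespass mine" is intentionally NOT here: it changes LUN ownership and
# should stay a deliberate, manual step.
CHECKS=(
  "/nas/sbin/getserial"
  "/nas/sbin/getreason"
  "nas_storage -c -a"
  "/nas/sbin/navicli -h SPA faults -list"
  "/nas/sbin/navicli -h SPA getlun -trespass"
  "nas_cs -info"
  "nas_inventory -tree"
  "nas_fs -list"
  "server_df server_2"   # add "server_df server_3" if file systems live on DM 3
  "nas_checkup"
)

# Run each command, labelling its output and recording its exit status.
run_checks() {
  local cmd
  for cmd in "$@"; do
    echo "===== $cmd ====="
    # Unquoted on purpose: each entry is a plain command line split on spaces.
    $cmd 2>&1
    echo "----- exit status: $? -----"
    echo
  done
}

# Write the report to LOG_DIR/<host>_<YYYY-MM-DD>.txt while echoing it to the terminal.
main() {
  local report
  mkdir -p "$LOG_DIR"
  report="$LOG_DIR/${HOSTNAME}_$(date +%F).txt"
  run_checks "${CHECKS[@]}" | tee "$report"
  echo "Saved to $report"
}

# Require an explicit flag so sourcing the file only defines the functions.
if [[ "${1:-}" == "--run" ]]; then
  main
fi
```

Run it with `./monthly_check.sh --run` (hypothetical file name). Recording each command's exit status next to its output makes it obvious later whether a check actually ran or simply failed to connect.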
khanz1, January 9th, 2015 18:00
Another tool you may want to consider is Unisphere Service Manager (USM). Its Diagnostics tab gives you the options "Verify Storage System" and "Capture Diagnostic Data". You also get a list of technical advisories applicable to your system, and you can generate an HTML report of the system configuration that includes a list of issues and a system fault summary. There is also an option to run a health check.
anre51801, January 19th, 2015 09:00
VNX Monitoring and Reporting ... is free with the purchase of a VNX and can be installed on CentOS 6, which is also free.
For heavier lifting there is ViPR SRM, which lets you correlate and monitor more modules: VMware, network, etc.