August 12th, 2013 04:00

Ask the Expert: How to Optimize Your VNX

YOU MAY ALSO BE INTERESTED IN THESE ATE EVENTS...

Ask the Expert: Multicore Optimization (MCx) in the New VNX

Ask the Expert: Block Deduplication and FAST Suite Tiering in the New VNX

https://community.emc.com/message/710869#710869

This Ask the Expert event will cover general guidance and tips/tricks to help you get the most out of your VNX.

The experts will discuss how to leverage key capabilities embedded within the product interface such as:

  • The VNX health check
  • Customer code upgrades
  • The CRU process for drives.

This discussion begins on August 19 and concludes on August 30. Get ready by bookmarking this page or signing up to receive email notifications.

Your Hosts:

Greg Swain has been an information technology specialist for the last 15 years. Before joining EMC in 2004, he worked as a Boston-based consultant focusing on backup, disaster recovery strategies, and networking. After joining EMC, Greg worked as a systems administrator and integrator specializing in bringing recently acquired organizations into EMC's own information technology infrastructure. The integration process typically required working with competitive technologies, managing datacenter build-outs, and deploying EMC infrastructure including mid-tier storage and archive. Currently Greg is a Systems Engineering Manager based out of Franklin, MA.


Christopher DiMartino is a Systems Engineer for EMC, focused on MidMarket GEO.

Chris’s position at EMC comes after 18+ years working in the IT field in southern New England. He now focuses on designing data center solutions that meet the client’s needs today as well as planning for future growth. Prior to joining EMC, Chris worked at several Fortune 200 companies, insurance providers, and the US Naval War College. With a primary focus on infrastructure security and compliance, Chris has had exposure to many varied environments, from companies with fewer than 10 employees all the way to redesigning the PCI scanning methodology of the current Fortune 1.


Nathaniel Fagundo is a Systems Engineer. He was originally hired into EMC as a Technical Support Engineer to support the CLARiiON array. He then moved to VNX support when the new platform was released. After a few months he moved up into the ELITE VNX support team to assist in supporting EMC's top 112 customers around the world. He is currently working at EMC as an Associate Systems Engineer, a role known until recently as Associate Technical Consultant. Nathaniel focuses on direct and indirect sales of new and emerging technologies comprising both proprietary and partnered hardware, software, consulting, and cloud-based solutions. He holds EMC Associate Certification in Information Storage Management, a VNX Specialist Certification for Platform Engineers, and a Data-center Virtualization and Cloud Infrastructure Specialist Certification.

Oliver Ames, an EMC Systems Engineer, came to EMC with over 7 years of IT experience at Florida’s largest private insurance brokerage firm, where he handled PC repair, help-desk management, systems administration, and backup administration for over 20 branch offices and the corporate headquarters.

Oliver uses his experience to educate customers on best practices and approaches that let a business grow while lowering the cost of securing and maintaining growing data sets.


Glen Kelley is an EMC Senior Hardware Support Engineer. He has spent the past 10 years at EMC in Mid-Range array support (CLARiiON and VNX). Prior to EMC, Glen worked at Digital Equipment Corporation for 10 years and Data General for 17 years.


666 Posts

August 19th, 2013 07:00

This discussion is now open for your questions. We look forward to an interesting, fun and informative Ask the Expert event.

12 Posts

August 19th, 2013 08:00

I'll start with a question posed by one of my clients recently: "What does the USM Health Check utility evaluate to determine the health of the system?"

4.5K Posts

August 19th, 2013 08:00

The tool performs the following health checks:

•Network Connectivity

•Management Service Status

•Storage Processor Status

•Hot Spare Status

•No Disk Faults

•Disk Status

•VNX OE for Block Committed

•Data Mover Status

•No Hardware Component Faults

When the health check completes, you can view all status details in the summary report at the bottom of the dialog box, or the status details of each individual rule by clicking its associated icon.

glen

5 Practitioner

 • 

274.2K Posts

August 19th, 2013 09:00

Glen is spot on.

The health check is typically the first step when trying to optimize a storage or NAS system.  

There has been some confusion about this topic: the health check does not capture performance statistics or detail ways to optimize the system.  Often customers are looking for at least some performance information.  If they have VNX Analyzer enabled, the next step is often to retrieve analytics from the system.  EMC and our partners can also help interpret those findings.

Greg-

12 Posts

August 19th, 2013 10:00

Does network connectivity help to determine impact on Mirror View sessions or other replication features that rely on network infrastructure? If not, what would you suggest as a next step?

5 Practitioner

 • 

274.2K Posts

August 19th, 2013 12:00

That is a multifaceted question; the short answer is no.  Bandwidth utilization is what's important, and that is not available via the health check.

The best practice is to give MirrorView its own SP port.  The health check provides only limited information here, mainly whether the port status is up or down.  For MirrorView, the lower the RPO, the more bandwidth you need, because there is less commonality among the changed data and less time available to transmit the changes.  A shorter RPO for MirrorView therefore equates to more SAN resource utilization.

I think the core of your question has to do with determining the bandwidth required for a given dataset.  EMC has tools to help you determine a practical bandwidth for a particular dataset and RPO.  The sole purpose of the Business Continuity Solution Design (BCSD) tool is to analyze SAN or host performance data and determine the impact and the correct replication technology to use.  Simply put, you load the BCSD tool with performance analytics gathered from the host or SAN, input some variables, and the tool tells you what you need for bandwidth.  EMC and partner engineers have access to the BCSD tool.
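To make the RPO and bandwidth relationship concrete, here is a rough Python sketch. It is not the BCSD tool and does not reproduce its method; the function name, the 8 KB block size, and the synthetic write trace are all assumptions chosen for illustration. It counts the unique blocks changed in each RPO window, on the assumption that repeated writes to the same block within a window only need to be sent once, and sizes the link so the busiest window can be shipped within one window.

```python
from collections import defaultdict


def required_mbps(writes, rpo_seconds, block_bytes=8192):
    """Estimate the link rate (Mbit/s) needed to meet a given RPO.

    writes: iterable of (timestamp_seconds, block_number) tuples.
    block_bytes: assumed block size; 8 KB is an illustrative value only.
    """
    # Group writes into RPO-sized windows, keeping only unique blocks,
    # since a block rewritten many times in a window is sent once.
    windows = defaultdict(set)
    for ts, block in writes:
        windows[int(ts // rpo_seconds)].add(block)
    if not windows:
        return 0.0
    # Size for the busiest window: its data must transfer within one window.
    peak_blocks = max(len(blocks) for blocks in windows.values())
    return peak_blocks * block_bytes * 8 / rpo_seconds / 1_000_000


# Synthetic trace (hypothetical): one write per second for an hour,
# cycling over the same 100 "hot" blocks.
trace = [(t, t % 100) for t in range(3600)]

short_rpo = required_mbps(trace, rpo_seconds=60)
long_rpo = required_mbps(trace, rpo_seconds=600)
print(f"60 s RPO:  {short_rpo:.4f} Mbit/s")
print(f"600 s RPO: {long_rpo:.4f} Mbit/s")
```

With this trace the 10-minute window sees the same 100 blocks rewritten several times, so far less data per second has to cross the link than with the 1-minute window. Real sizing should come from measured host or SAN data and the tools mentioned above.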

EMC and its partners also have services that are specifically targeted at doing full SAN/LAN/WAN analysis. These services are most often aimed at customers looking to deploy replication technologies in remote sites without a strong local or in-house IT presence.

Moderator

 • 

6.5K Posts

August 19th, 2013 22:00

Hi there

This may be a very basic question ... Can anybody tell me how to perform a health check from USM?

I only found Diagnostics>Verify Storage System> Storage System Verification wizard ...

Is this it ?

If it is ... it seems to return only Block side information ... Can USM run a health check on the VNX File side as well?

thanks !

Aya

August 20th, 2013 07:00

Hey there Aya,

You are correct in using the Verify Storage System function of USM to do a health check on the block side. The methods and procedures for running a health check differ for the Block and File operating environments. The following sections explain how to run a health check on a VNX Unified array using the following methods:

  • VNX BLOCK operating environment - Using the Verify Storage System function of USM.
  • VNX FILE operating environment -  Using the VNX nasadmin command line interface and using the Prepare for Installation function in the Unisphere USM System Software dialog.

I hope this helps,

Nathan

August 20th, 2013 09:00

Many customers may not realize that they can upgrade their VNX OE (Operating Environment) to the latest version of code by themselves. Because the VNX is fully redundant, this can be done non-disruptively, but only if your hosts have redundant connections to the VNX (i.e. 2 paths, 2 switches, etc.). It is still recommended to do the code upgrade during a maintenance window so the load on the array is minimal.

First you have to download the latest version of code. You can do that by opening up USM and going here:

"Array" > Software > Downloads > Download VNX Software Updates

Then, once you have the code file saved to disk, go to:

"Array" > Software > System Software

  • Step 1 runs a health check and verifies the system can be upgraded
  • Step 2 is the actual installation of the software

(Note: This is non-disruptive but the SPs will be rebooted one at a time. So there will be a bit of a performance hit.)

Once the process is complete your VNX will be on the latest version of code.
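Since the SPs reboot one at a time, it is worth confirming before the upgrade that every host can reach both SPs. The Python sketch below shows the idea only. It does not talk to the array or to USM; the `host_paths` dictionary is a hypothetical inventory you would fill in by hand or from your own multipathing reports, and the host names are made up.

```python
def hosts_at_risk(host_paths):
    """Return the hosts that lack a live path to both storage processors.

    host_paths: dict mapping host name -> collection of SP names
    ("SPA", "SPB") that the host can currently reach. This structure is
    an assumption for illustration, not something USM exports.
    """
    required = {"SPA", "SPB"}
    # A host is at risk if its reachable SPs do not include both SPA and SPB,
    # because it would lose access while its only SP reboots.
    return sorted(
        host for host, sps in host_paths.items() if not required <= set(sps)
    )


# Hypothetical inventory: sql01 is single-pathed to SPA only.
inventory = {
    "esx01": {"SPA", "SPB"},
    "esx02": {"SPA", "SPB"},
    "sql01": {"SPA"},
}

print(hosts_at_risk(inventory))  # ['sql01']
```

Any host this returns would see an outage during the upgrade, so fix its pathing first or schedule downtime for it.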

4.5K Posts

August 20th, 2013 09:00

In the most current version of USM, once you've logged into the array or the Control Station (using the IP address for the CS or the SPA/SPB), you'll find Health Check in the lower right corner under the Tools section. Click this to run the check. Most of the checks are for the array, and one check is for the Data Movers.

The Verify Storage System check under the Diagnostics section is another check that, depending on the type of issue, is a bit more complete.

USM Versions with Health Check:

OE Release 32

UnisphereServiceManager-Win-32-x86-en_US-1.2.26.1.0068-1

OE Release 33 (the recently released Next Generation VNX) - this is backwards compatible with the older VNX and CLARiiON arrays

UnisphereServiceManager-Win-32-x86-en_US-1.3.0.1.0015-1

glen

BTW: the Health Check is an invaluable tool for quickly determining if there are issues on the array

August 20th, 2013 10:00

As any Storage Admin knows, in this world nothing can be said to be certain, except Death, Taxes, and Drive failures.

If you notice a drive failure and would like to replace the drive yourself, there is a simple way to get this done.

First you will need to open up a USM session to your VNX with the faulted disk. Then go to:

"Array" > Hardware > Hardware Replacement > Replace Faulted Disk

(There is a shortcut if you are already in Unisphere. Just go to "Array" > System > hardware > Storage Hardware. Then under “Service Tasks” on the bottom right, select “Replace Faulted Disk”)

After you are in USM all you need to do is go through the Wizard which will walk you through all the steps to getting rid of that bad disk.

1 Rookie

 • 

20.4K Posts

August 20th, 2013 11:00

I would like to understand if anything has changed between CX/VNX/VNX2 in terms of preventing disk failures. I'll give a synopsis of what happened to our CX3-80 last year. A 5+1 R5 group was presented to Celerra, and one drive in the RAID group failed. No problem, the HS kicked in and started rebuilding. Five hours into the rebuild, another drive in the same RG started throwing media errors and went down. After many hours of troubleshooting, EMC Support declared that the LUNs that were in the middle of the rebuild were toast: data loss. So the question comes up: why did this happen? Why did your "maintenance" task (BV?) not catch the issues with drives that were still spinning but had media errors, so they could be proactively replaced?

Thanks

12 Posts

August 20th, 2013 11:00

Greg,

Thanks for your response on the HC tool and highlighting the 'Business Continuity Solution Design'. Is this a tool that can be leveraged while observing performance degradation to provide insight to the source of the problem?

5 Practitioner

 • 

274.2K Posts

August 20th, 2013 11:00

Hi Jeff,

While the tool is not designed to do that, it would highlight design flaws that often lead to network latency and performance issues.  Unisphere Analyzer would be the tool of choice for observing performance degradation; if the array is the source of the issue, you would see it there.  If the performance degradation is purely network related, a network-based tool would be best (think SolarWinds or native Cisco tools).

August 20th, 2013 12:00

Hi Dynamox,

I am sorry to hear about that. That sounds like a rough experience. Like most storage manufacturers, EMC doesn't manufacture its own hard drives. That is why we put them through extensive testing in our facilities before putting them out in customer boxes. The odds of 2 disks failing in the same RAID group are VERY low, though it has happened. That is why some applications are run on RAID 6, to protect them from that dual disk failure.

The way the VNX proactively guards against disk failures is by monitoring soft media errors on each disk. If the number of these errors on a disk reaches a designated threshold, the VNX will notify EMC and start to proactively copy the disk to a hot spare. Any bad sectors on the disk will be rebuilt from parity (in the case of RAID 5).
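The threshold logic described above can be pictured with a few lines of Python. This is only a sketch of the idea and not the array's actual algorithm: the threshold value, the disk names, and the function name are all hypothetical, and the real thresholds and error accounting are internal to the VNX.

```python
# Hypothetical value for illustration; the real threshold is not published here.
SOFT_ERROR_THRESHOLD = 50


def disks_to_proactively_copy(soft_errors, threshold=SOFT_ERROR_THRESHOLD):
    """Return the disks whose soft media error count has reached the threshold.

    soft_errors: dict mapping a disk identifier -> soft media error count.
    """
    # A disk at or above the threshold is still online, so its data can be
    # copied to a hot spare directly instead of being rebuilt from parity.
    return sorted(disk for disk, count in soft_errors.items() if count >= threshold)


# Made-up counters for three disks in one enclosure.
counters = {"0_0_4": 3, "0_0_5": 72, "0_0_6": 50}

print(disks_to_proactively_copy(counters))  # ['0_0_5', '0_0_6']
```

The point of acting at a threshold is that a copy from a still-readable disk is much less exposed than a full parity rebuild after the disk has already faulted.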

In recent years we have become much better at predicting hardware faults, to the point where I now hear that 64% of VNX drive replacements are done before the disk has fully faulted out of the array. Sadly, disk failures are nearly impossible to prevent from the storage array side. It is up to the disk suppliers of the world to make sure they are putting out a solid, reliable product, but they obviously can't catch every possible issue. In the case of EMC, we go the extra mile and do our own testing on top of the drive vendors' to further reduce the number of disk faults in our arrays. If protection against multiple disk failures matters more to you than performance, I would recommend using a RAID 6 configuration.
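For anyone wondering why a second failure during a RAID 5 rebuild is fatal, here is a small Python sketch of single XOR parity, the general principle behind RAID 5. It is a textbook illustration and not the VNX implementation; the strip contents and function name are made up. With one strip missing, XOR of the survivors returns the lost data. With two missing, the same operation only yields the two lost strips mixed together, which is why RAID 6 keeps a second, independent parity.

```python
from functools import reduce


def xor_strips(strips):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*strips))


# A 5+1 stripe: five data strips plus one parity strip (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD", b"EEEE"]
parity = xor_strips(data)

# One disk lost (strip 2): XOR of the four surviving data strips and the
# parity strip reproduces the missing strip exactly.
survivors = data[:2] + data[3:] + [parity]
rebuilt = xor_strips(survivors)
print(rebuilt == data[2])  # True

# Two disks lost (strips 1 and 2): the survivors only give the XOR of the
# two missing strips, so neither one can be recovered from single parity.
survivors_two_lost = [data[0]] + data[3:] + [parity]
mixed = xor_strips(survivors_two_lost)
print(mixed == xor_strips([data[1], data[2]]))  # True
```

This is the situation described in the question above: once a second drive in the same group stops returning good data mid-rebuild, single parity has nothing left to reconstruct from.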
