
October 25th, 2013 12:00

MCx - Under the Hood

YOU MAY ALSO BE INTERESTED IN THESE ATE EVENTS...

Ask the Expert: Best Practices for VNXe Capacity Planning

Ask the Expert: Transactional NAS and Next Generation VNX

https://community.emc.com/message/752500#752500

This Ask the Expert event will help you understand the new VNX2 platform running MCx.

The experts will discuss topics such as:

  • MCx Cache
    • Thread (array process) Scheduling
  • MCx FAST Cache
    • Warm Up Period
    • Least Recently Used Algorithm
  • MCx FAST VP
    • Best Practices
  • All things VNX2 and MCx

This discussion begins on November 11th and concludes on November 15th. Get ready by bookmarking this page or signing up to receive email notifications.

Your Hosts:


Greg Swain has been an information technology specialist for the last 15 years. Before joining EMC in 2004, he worked as a Boston-based consultant focusing on backup, disaster recovery strategies, and networking. After joining EMC, Greg worked as a systems administrator and integrator specializing in bringing recently acquired organizations into EMC's own information technology infrastructure. The integration process typically required working with competitive technologies, managing data center build-outs, and deploying EMC infrastructure, including mid-tier storage and archive. Currently, Greg is a Systems Engineering Manager based out of Franklin, MA.



Christopher DiMartino is a Systems Engineer for EMC, focused on MidMarket GEO.

Chris's position at EMC comes after 18+ years of working in the IT field in southern New England. He now focuses on designing data center solutions that meet the client's needs today while planning for future growth. Prior to joining EMC, Chris worked at several Fortune 200 companies, insurance providers, and the US Naval War College. With a primary focus on infrastructure security and compliance, Chris has had exposure to many varied environments, from companies with fewer than 10 employees all the way to redesigning the PCI scanning methodology of the current Fortune 1.



Oliver Ames, an EMC Senior Inside Sales Systems Engineer, came to EMC with over 7 years of IT experience at Florida's largest private insurance brokerage firm, where he handled PC repair, help-desk management, systems administration, and backup administration for over 20 branch offices and corporate headquarters.

Oliver uses his experience to educate customers on the best practices and approaches that can be taken to ensure businesses can enjoy growth while lowering the cost of securing and maintaining growing data sets.


2.1K Posts

November 11th, 2013 11:00

Hey guys, let's kick things off with a question that has been on my mind since the first official announcements of MCx.

What is the rationale behind the change to the sparing process? Now any appropriate unbound drive can be selected as a hot spare without having to define specific hot spares. And once a hot spare is engaged and the drive is replaced, nothing fails back to the original drive; it just becomes an unbound drive that might be used as a hot spare later.

I know that there are ways of looking at this that make it look advantageous to the customer, but I don't totally buy that the advantages outweigh the issues... especially for control freaks (like me).

So here are the advantages as I see them pitched:

  • No need to predefine and manage hot spares
  • No need to wait for a rebuild on the original drive once it is replaced
  • Performance impact of rebuilds is limited to the initial hot spare activation

Now the concerns I have with this:

  • Lumping potential hot spares with drives waiting to be configured into a pool or RAID group makes it more difficult to identify exactly what you have available for use as data drives
  • If you are planning to build a traditional RAID Group for a specific purpose there is the possibility of one of your drives being allocated permanently as a hot spare unless you immediately configure the RAID Group when the drives are installed
  • If you have stringent performance requirements you may have configured things a certain way (e.g. within/across DAEs or buses) to achieve that target, but the hot sparing process can undo that work with no easy way to get back
  • Yes I'm aware that you can manually free up a drive that was selected and allocated as a hot spare, but the only way to specify what drive you will go back to is to use the CLI and specify the target drive

I'm hoping there is more to the whole story that will allow the pros to outweigh the cons, but I haven't seen that happen yet. Can someone talk to this in more detail so we can understand why the existing process was "ripped & replaced" instead of offering it as an option that could be turned on or off? Because the way things sit, I would see NO true advantage in turning on the new way of hot sparing.

For the record, and since it sounds like I'm JUST griping here... I LOVE 90% of what I've seen so far with the changes introduced with MCx and the new VNX series. We are just deploying our first pair of VNX8000s and I'm hoping they live up to my expectations. I just want to better understand the thoughts behind the hot sparing changes so I can decide once and for all if I'm on board with it or will just have to accept it and move on.

{I'll be following up with my other point of concern a bit later once we get this one out of the way}

11 Posts

November 11th, 2013 11:00

EMC's next-generation platform: the VNX2 with the MCx (Multicore Everything) architecture. It is engineered to accommodate disruptive technologies (cloud computing, multi-core-centric applications, and flash) that enable businesses to drive higher workloads. Who has come up against application performance demands so stringent that you think, "will my current storage system, or even DAS, cope with the new demands"? Take a peek at the articles in the hyperlinks below; can anyone out there relate?

               http://talkincloud.com/emc-ceo-joe-tucci-cloud-is-most-disruptive-tech-wave-ever

               http://www.crn.com/news/cloud/229403082/emc-ceo-tucci-cloud-disruption-means-opportunity.htm

From what I've seen, customers strive to hold an edge, cultivating their business to be better than their competitors: customer retention, demanding SLAs, all the while adding new applications designed to streamline business processes. This growth is straining yesterday's IT infrastructure. The problem is that IT budgets are, for the most part, spent on firefighting and simply "keeping the lights on".

The last link here http://www.emc.com/about/news/press/2013/20130904-03.htm provides some great insight by customers using the new MCx architecture. It would be great to hear more from others who’ve chosen the VNX2 with MCx as their new foundation to move into the cloud.

1 Rookie • 20.4K Posts

November 11th, 2013 12:00

1) I am interested to know if you have introduced/improved performance counters in Unisphere Analyzer. I am particularly interested in metrics for FAST / FAST Cache at a more granular level (LUN level, for example). My customers are asking me if their LUNs are FAST Cache friendly, and for pool LUNs I can't answer that question (or maybe I don't know what I am looking at, please educate me). For example, if a LUN is receiving 100 I/Os, how many of those are being serviced out of FAST Cache, system cache, and finally disk?

2) Since we are on the topic of performance, any plans to introduce more robust performance tools, something in the same class as Unisphere for VMAX? VNX Monitoring and Reporting, while free now, does not hold a candle to the tools available in the Symmetrix area.

3) Do you know when new BP white papers for Block OE 33 will be published? I am getting a couple of 5600s, so I would like to revisit the BPs if things have changed.

8.6K Posts

November 11th, 2013 13:00

Hi Allen,

I think the rationale behind the new sparing process was to make things simpler for the customer.

By default, a customer doesn't have to manually create spares; the system will automatically "reserve" a number of drives as hot spares.

You can still have fewer spares, or no spares at all, but you have to make it a point to change the hot spare policy.

Not having to equalize a rebuilt drive back to its original slot is something a lot of customers asked for; they didn't want the work and the extra I/O it involves.

If you want to go back to your previous layout, you can still do so, either by doing a "copy back" or by just physically moving the drive.

Personally, I think this reflects a change from the old way of manually specifying where each bit goes to provisioning where the system automatically makes a good choice.

Rainer

11 Posts

November 12th, 2013 09:00

Dynamox, no changes have been made to the counters in Unisphere Analyzer to date. Currently, one may only monitor the health of the FAST Cache drives. At a very high level, LUNs that are sequentially written to or read from would not be candidates for FAST Cache, because streaming data sets are excluded by the FAST Cache algorithms; note that FAST Cache was designed for random, small-block workloads (reads and writes). It would not make sense to ingest a streaming workload (reads or writes) into FAST Cache only to turn around and re-present it to the array's storage processor cache. Also, any reads or writes of 64KB or larger are excluded from FAST Cache and are instead serviced by the array's storage processor cache.

Now, when we dig a bit deeper, we need to differentiate VNX (Classic) FAST Cache from VNX MCx FAST Cache. Classic VNX FAST Cache required that blocks be "hit" at least three times within a period of time to be promoted into EFDs as secondary cache; therefore, LUNs on a RAID Group or in a Pool might not realize the benefits of FAST Cache, depending on data access patterns. In the VNX2, LUNs in a RAID Group or Pool may benefit much more, as there is a one-hit promotion. This one-hit promotion only applies while the FAST Cache tier is below 80% capacity utilization, so customers may now experience a higher performance increase for their LUNs, be they RAID Group or Pool LUNs. Once FAST Cache passes 80% capacity utilization, promotion reverts to the three-hit policy. The new VNX also employs a cleaning algorithm to free up FAST Cache to better service dynamic workloads.
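To pull those promotion rules together in one place, here is a simplified Python sketch of the decision as described above. The hit tracking, names, and structure are purely illustrative on my part and not the actual MCx implementation; only the 64KB exclusion, the 80% threshold, and the one-hit/three-hit behavior come from the description above.

```python
# Simplified, illustrative sketch of the FAST Cache promotion rules described above.
# Not the actual MCx code; the tracking structure and names are made up.

from collections import defaultdict

LARGE_IO_BYTES = 64 * 1024   # reads/writes of 64KB or larger bypass FAST Cache
ONE_HIT_THRESHOLD = 0.80     # below 80% utilization, promote on the first hit

hit_counts = defaultdict(int)  # hits tracked per block address (illustrative)

def should_promote(block_addr: int, io_size: int, sequential: bool,
                   cache_utilization: float) -> bool:
    """Return True if this access should promote the block into FAST Cache."""
    if sequential or io_size >= LARGE_IO_BYTES:
        return False  # streaming / large-block I/O is left to the SP cache
    hit_counts[block_addr] += 1
    required_hits = 1 if cache_utilization < ONE_HIT_THRESHOLD else 3
    return hit_counts[block_addr] >= required_hits

# Example: a random 8KB read with FAST Cache 25% utilized is promoted on the
# first hit; the same read at 90% utilization would need three hits.
print(should_promote(0x1000, 8 * 1024, sequential=False, cache_utilization=0.25))  # True
```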

In regards to understanding how 100 I/Os could benefit from FAST Cache, it really comes down to data access patterns. Take, for instance, VDI (virtual desktops). A VDI deployment of, say, 100 Windows desktops results in a tremendous amount of common data (blocks). If 80 end users log in at the exact same time, or within a similar window in the morning, there is a "login storm". Typically, the end users would experience a prolonged wait until they see their desktops. Introduce FAST Cache, and the multiple "hits" on the common OS blocks cause the FAST Cache algorithm to promote those blocks into EFD, accelerating performance and reducing the latency experienced.

Here is a quick video on YouTube that demonstrates how FAST Cache can assist in this particular use case.  Boot Storm Video

There is no way to monitor the exact I/Os (blocks) that are being serviced by FAST Cache. To best determine whether a LUN can benefit from FAST Cache, one would leverage Unisphere Analyzer, monitoring a LUN's access patterns to determine whether workloads are sequential or random; random is better, as mentioned in the first paragraph. Also, keep in mind that if the read or write size is larger than 64KB, FAST Cache will not promote those blocks into EFDs. An example of this would be a backup stream: the data is read once and is generally large block, so FAST Cache simply ignores it, as promoting it would only absorb a resource best used for something like an OLTP workload.

Note: Via Unisphere Analyzer, you can in fact select LUNs that have access to FAST Cache; selecting "FAST Cache Write Hits" / "Read Hits" enables administrators to see the performance a LUN attains from being able to access the FAST Cache resource. Other metrics, such as Read and Write Misses as well as Read and Write Ratios, can be very beneficial in understanding whether a LUN is a good candidate for FAST Cache.

Please note, the above instructions are only applicable to LUNs on traditional RAID Groups, and not to Pool LUNs.
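As a rough illustration of turning those Analyzer counters into a judgment call, here is a small Python sketch. The counters mirror the metrics named above, but the 50% threshold is just an example of mine, not an EMC guideline.

```python
# Rough sketch: judging FAST Cache friendliness from Analyzer-style counters.
# The 50% threshold is an arbitrary example, not an EMC recommendation.

def fast_cache_hit_ratios(read_hits, read_misses, write_hits, write_misses):
    """Return (read_hit_ratio, write_hit_ratio) from Analyzer-style counters."""
    total_reads = read_hits + read_misses
    total_writes = write_hits + write_misses
    read_ratio = read_hits / total_reads if total_reads else 0.0
    write_ratio = write_hits / total_writes if total_writes else 0.0
    return read_ratio, write_ratio

# Example: counters sampled for one classic LUN over an Analyzer interval.
r_ratio, w_ratio = fast_cache_hit_ratios(read_hits=7200, read_misses=1800,
                                         write_hits=5100, write_misses=900)
print(f"read hit ratio {r_ratio:.0%}, write hit ratio {w_ratio:.0%}")
if min(r_ratio, w_ratio) > 0.5:   # example threshold only
    print("this LUN looks like a good FAST Cache candidate")
```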

Because FAST Cache is intuitive in nature, it truly alleviates the need for administrators to "tweak" any settings. However, if you determine that a Pool's LUNs would not benefit from FAST Cache, you have the option of disabling FAST Cache for that particular Pool. You also have the option of enabling or disabling FAST Cache per LUN on traditional RAID Groups. Again, enable or disable this feature for those LUNs that will or will not benefit; you would want to monitor a LUN's performance characteristics to determine what may be beneficial.

When speaking to the monitoring of your storage system: yes, Monitoring and Reporting does not offer the in-depth analysis that so many administrators strive to attain. However, Unisphere Analyzer (UA) is a critical asset when determining how the storage array is operating. When customers obtain the FAST Cache license, they are provided a license for Unisphere Analyzer. What's nice about UA is the fact that you may run a real-time analysis or an archival analysis (over extended periods of time).

For those customers who leverage VMware as their hypervisor, there is the option of purchasing vCOps (vCenter Operations) as well as EMC's SRM (Storage Resource Management). I've included another YouTube link that provides a quick preview of EMC SRM. This link will bring you to the EMC SRM landing page, where you can obtain a FREE trial of the monitoring suite.

Yes, the capabilities of the monitoring tools differ between the VNX and VMAX (Symmetrix) storage lines. EMC is continually developing our interfaces and analytics to work across platforms; we must also take a step back and remember that the VMAX and VNX are built to meet different customer demands. If a customer does not require the level of service a VMAX provides, they may not require the same analytics; simply put, UA (Unisphere Analyzer) would suffice. Should they require more detail and automation, they would then look to our SRM suite.

And, unfortunately, the Block Best Practices paper is not yet published. Many of the existing BBPs hold true, but of course there will be some revisions; for example, we've moved from dedicated hot spares to the array automatically using any unbound drive as a spare. A SAS drive can spare for another failing or failed SAS drive, assuming its capacity can accommodate the data to be copied.
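Just to make that selection behavior easier to picture, here is a rough Python sketch of the idea. The names, fields, and the "smallest eligible drive" preference are purely illustrative on my part, not the actual array firmware logic.

```python
# Illustrative sketch of "any compatible unbound drive can become the spare".
# Names, fields, and the smallest-drive preference are made up for this example.

from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Drive:
    drive_id: str        # e.g. "1_0_4" (bus_enclosure_slot)
    drive_type: str      # "SAS", "NL-SAS", or "FLASH"
    capacity_gb: int
    bound: bool          # already part of a RAID Group or Pool?

def pick_hot_spare(failed: Drive, drives: List[Drive]) -> Optional[Drive]:
    """Pick an unbound drive of the same type with enough capacity.

    Prefers the smallest drive that still fits (illustrative choice only),
    and returns None if no suitable unbound drive exists.
    """
    candidates = [
        d for d in drives
        if not d.bound
        and d.drive_type == failed.drive_type
        and d.capacity_gb >= failed.capacity_gb
    ]
    return min(candidates, key=lambda d: d.capacity_gb, default=None)

# Example: a failed 600 GB SAS data drive engages the smallest eligible unbound drive.
unbound = [Drive("0_1_10", "SAS", 900, False), Drive("0_1_11", "SAS", 600, False)]
spare = pick_hot_spare(Drive("1_0_4", "SAS", 600, True), unbound)
print(spare.drive_id if spare else "no spare available")   # -> 0_1_11
```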

I hope you’ve found the above information helpful.  Certainly keep the questions coming.  We can all benefit from these Ask the Expert sessions.

8.6K Posts

November 12th, 2013 09:00

That would be a roadmap question that you need to ask your local EMC contact directly.

1K Posts

November 12th, 2013 09:00

Dynamox,

Not sure if you were already aware of this wp or not: (VNX MCx Multicore Everything) https://support.emc.com/docu48786_VNX_MCx_Multicore_Everything.pdf?language=en_US

November 12th, 2013 09:00

I understand that traditional LUNs are now truly active/active (instead of active/active w/ALUA).  Is there any reason to expect that kind of functionality for Pool LUNs?

1 Rookie • 20.4K Posts

November 13th, 2013 08:00

OllieAmes wrote:

Note: Via Unisphere Analyzer, you can in fact select LUNs that have access to FAST Cache; selecting "FAST Cache Write Hits" / "Read Hits" enables administrators to see the performance a LUN attains from being able to access the FAST Cache resource. Other metrics, such as Read and Write Misses as well as Read and Write Ratios, can be very beneficial in understanding whether a LUN is a good candidate for FAST Cache.

Really? Those counters are only exposed when I look at a pool, not at individual pool LUNs.

OllieAmes wrote:

Because FAST Cache is intuitive in nature, it truly alleviates the need for administrators to "tweak" any settings. However, if you determine that a Pool's LUNs would not benefit from FAST Cache, you have the option of disabling FAST Cache for that particular Pool. You also have the option of enabling or disabling FAST Cache per LUN on traditional RAID Groups. Again, enable or disable this feature for those LUNs that will or will not benefit; you would want to monitor a LUN's performance characteristics to determine what may be beneficial.

That's still an issue. I am trying to "pool" my disk spindles, not distribute them over multiple pools. Being able to disable FAST Cache for Pool LUNs is greatly desired.

OllieAmes wrote:

Yes, the capabilities of the monitoring tools differ between the VNX and VMAX (Symmetrix) storage lines. EMC is continually developing our interfaces and analytics to work across platforms; we must also take a step back and remember that the VMAX and VNX are built to meet different customer demands. If a customer does not require the level of service a VMAX provides, they may not require the same analytics; simply put, UA (Unisphere Analyzer) would suffice. Should they require more detail and automation, they would then look to our SRM suite.

You are kidding yourself if you think that the current UA is sufficient for customers of any significant size and expertise. I have been staring at the same old Navisphere/Unisphere Analyzer counters since I started working on the Clariion FC4700. This tool is old, it's outdated, and it's cumbersome to use.

2.1K Posts

November 14th, 2013 11:00

Thanks Rainer,

That still doesn't really address my concerns. I understand that some customers may have been looking for an easier way to manage something that wasn't difficult to manage in the first place, but what about the customers who liked it the way it was and never spoke up, because they had no reason to believe others were asking for a change? That may not be a question for you so much as for the Product Managers who make the choices on feature direction...

I don't really see anything in your response beyond what I had already posted as the potential advantages. Unless I missed something about the way the system "reserves" drives for hot spares, I thought it just worked with a number to hold back from allocation, not a reservation that takes into account ideal positions for hot spares to cover the existing configuration.

I can't really agree with your comments on how "easy" it would be to go back to the original configuration if you really wanted to, either. Initiating a "copy back" is actually initiating a copy and specifying where you want to copy to; you need to do that at the command line and have to identify the source and target. Manually moving drives is a possibility now with MCx, but probably not something you'd want to do more than necessary either.

Those of us who have been dealing with these arrays for many years need some time to get completely comfortable with the shift from careful consideration of layouts to system control. I personally prefer options along the way that allow a carefully planned transition, as opposed to having the control pulled out from under me. I'm not wording it that way to be confrontational, but to convey the feeling I get when these types of changes happen unexpectedly.

2.1K Posts

November 14th, 2013 11:00

I agree completely with dynamox on these points. UA has not changed significantly since my first FC4700s and it didn't make me happy back in those days.

Also, the ability to have more granular control over FAST Cache use for Pool LUNs is pretty much a long-term requirement. We only use traditional RAID Groups now for internal array functions requiring capacity; we leverage Pools for ALL host-presented LUNs. That means host-facing features not offered for Pool-based LUNs are useless to us (e.g. Active/Active, FAST Cache control, etc.). It's great that they are coming some day, but for now those features don't mean much to us.

November 15th, 2013 12:00

Hello,

I would like to take a moment to weigh in on these questions as well, but first let me preface this by saying that I agree with many of the questions, and I would be asking them myself were I in my previous role outside of EMC. That being said, we reached out to product engineering to get some more specific reasons as to why some of these features are the way they are. I am going to place those answers inline below; engineering answers are quoted as written (originally highlighted in red), with my comments following.

OllieAmes wrote:

      

Note: Via Unisphere Analyzer, you can in fact select LUNs that have access to FAST Cache; selecting "FAST Cache Write Hits" / "Read Hits" enables administrators to see the performance a LUN attains from being able to access the FAST Cache resource. Other metrics, such as Read and Write Misses as well as Read and Write Ratios, can be very beneficial in understanding whether a LUN is a good candidate for FAST Cache.

Really? Those counters are only exposed when I look at a pool, not at individual pool LUNs.

This is due to the architecture and where FAST Cache sits in the stack.  FAST Cache is only aware of classic LUNs (the LUNs that pools are built on), so these metrics can only be retrieved on classic LUNs.

-- To expand on this statement, it is the nature of how Pool LUNs operate that makes FAST Cache metrics unavailable at the Pool LUN level at this time. Hopefully this will change with future enhancements.

OllieAmes wrote:

      

Because FAST Cache is intuitive in nature, it truly alleviates the need for administrators to "tweak" any settings. However, if you determine that a Pool's LUNs would not benefit from FAST Cache, you have the option of disabling FAST Cache for that particular Pool. You also have the option of enabling or disabling FAST Cache per LUN on traditional RAID Groups. Again, enable or disable this feature for those LUNs that will or will not benefit; you would want to monitor a LUN's performance characteristics to determine what may be beneficial.

That's still an issue. I am trying to "pool" my disk spindles, not distribute them over multiple pools. Being able to disable FAST Cache for Pool LUNs is greatly desired.

This is due to the architecture described in the first answer: FAST Cache is only aware of classic LUNs (the LUNs that pools are built on), therefore it can only be turned on/off on an entire pool or any single classic LUN.

-- I agree that it would be nice to disable FAST Cache at the Pool LUN level, but I would only ever entertain this in a limited number of situations, like a single-tier pool. I would want FAST Cache available for multi-tier pools to buffer performance if data resides on slower spindles.
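-- To illustrate why both the counters and the on/off control stop at the classic-LUN level, here is a tiny Python sketch of the layering as I understand it; the class and attribute names are illustrative only, not the actual object model.

```python
# Illustrative sketch of why FAST Cache stats/control stop at classic LUNs:
# a Pool LUN is carved out of private classic LUNs, and FAST Cache only
# tracks those private LUNs, so it has no notion of individual Pool LUNs.

class ClassicLUN:
    def __init__(self, name: str):
        self.name = name
        self.fast_cache_enabled = True   # FAST Cache toggled per classic LUN
        self.fast_cache_hits = 0         # ...and counted per classic LUN

class Pool:
    def __init__(self, name: str, private_luns: list):
        self.name = name
        self.private_luns = private_luns # classic LUNs the pool is built on

    def set_fast_cache(self, enabled: bool):
        # The pool-level switch just fans out to every private classic LUN;
        # there is no per-Pool-LUN granularity underneath it.
        for lun in self.private_luns:
            lun.fast_cache_enabled = enabled

class PoolLUN:
    def __init__(self, name: str, pool: Pool):
        self.name = name
        self.pool = pool                 # slices spread across private LUNs
        # No FAST Cache counters or switch here -- the cache never sees this
        # object, only the private classic LUNs below it.

pool = Pool("Pool0", [ClassicLUN(f"private_{i}") for i in range(4)])
vm_datastore = PoolLUN("VMFS_01", pool)
pool.set_fast_cache(False)               # coarsest control available today
```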

OllieAmes wrote:

    

Yes, the capabilities of the monitoring tools differ between the VNX and VMAX (Symmetrix) storage lines. EMC is continually developing our interfaces and analytics to work across platforms; we must also take a step back and remember that the VMAX and VNX are built to meet different customer demands. If a customer does not require the level of service a VMAX provides, they may not require the same analytics; simply put, UA (Unisphere Analyzer) would suffice. Should they require more detail and automation, they would then look to our SRM suite.

You are kidding yourself if you think that the current UA is sufficient for customers of any significant size and expertise. I have been staring at the same old Navisphere/Unisphere Analyzer counters since I started working on the Clariion FC4700. This tool is old, it's outdated, and it's cumbersome to use.

-- On this, we all agree 100%.  Luckily, this is one of the things that we were hoping would come up in this discussion, so that we can show that better monitoring and reporting is what our customers want and need.  Thank you for your questions and responses.
