February 6th, 2015 15:00

Ask the Expert: Are you ready to manage deep archiving workloads with Isilon’s HD400 node and OneFS 7.2.0? Find out more about the Data Lake Foundation products

Welcome to the EMC Isilon Community Ask the Expert conversation.

YOU MAY ALSO BE INTERESTED IN THESE ATE EVENTS...

Ask The Expert – Isilon’s New Releases: IsilonSD Edge, OneFS.NEXT and CloudPools

Ask the Experts: EMC Isilon technical content and documentation

https://community.emc.com/thread/179245

Our experts are here to answer any and all questions you may have about the Data Lake Foundation product launches from earlier this month. As a quick recap, they will cover the latest Isilon OneFS operating system (version 7.2.0), InsightIQ 3.1, and the newest storage platform for high-density, deep archiving workloads, called HD400. Additionally, we have a guest speaker, George Hamilton, to answer any questions about the Elastic Cloud Storage solutions and how they work as part of your Data Lake Foundation. This discussion is now open for questions. If you missed the virtual launch event, you can view the video on demand here.

 

Meet Your Experts:


Peter Nealy 

Director of Engineering, Platforms - EMC
Peter is a veteran storage professional with experience developing platform enablement technologies for compute and storage platforms. He has led project development as well as teams responsible for delivering next-generation technologies on a tight timeline and budget. Before joining EMC 5 years ago, Peter worked for IBM as a software engineer.

 

Consultant Product Manager - EMC
Karthik has been with Isilon for 2 years now. Prior to this, he spent a year at NetApp and before that 12 years at Veritas, first as an engineer and then running product management for their cluster file system product. Some of Karthik's areas of expertise are as follows: cluster file systems, virtualization, product management, scale-out NAS markets, competitive views, and software-defined storage.

George Hamilton

Product Marketing, ViPR, ECS - EMC

George is a Senior Manager responsible for EMC ECS Appliance and Centera and Atmos object storage platform product marketing. He has worked in the technology industry for nearly 20 years as a product marketing manager, industry analyst, and research director. As an analyst, George covered cloud computing and services, IT infrastructure, and IT management software. George has worked for small, pre-IPO firms such as LogMeIn, boutique advisory firms like Yankee Group as well as established technology vendors EMC, CA and Sybase.

 

Moderator: Niki Vecsei 

 

This discussion takes place from Feb. 19th - March 8th. Get ready by bookmarking this page or signing up for e-mail notifications.

 

Share this event on Twitter or LinkedIn:

>> Announcing #DataLake Foundation with #EMCIsilon and #EMCECS - Join the Ask the Expert discussion! http://bit.ly/1KpJxLQ #EMCATE <<

179 Posts

February 24th, 2015 09:00

Karthik and Peter Nealy, can you tell us more about why the HD400 node requires OneFS 7.2? What are the features of OneFS 7.2 that support Data Lake Foundation workloads on HD400, such as deep archiving and instant analytics?

450 Posts

February 24th, 2015 09:00

OneFS 7.2 is the first release to add support for a couple of items required for the HD400:

1. New protection levels.  Given that each HD400 has more moving mechanical parts than any node we've ever sold before, and that customers are putting even larger volumes of data in a single node, it was important to create new protection levels that account for the MTTDL of these nodes.

2. 6TB drive support. (this may have been backported to a few slightly older families, but I cannot confirm this)

3. L3 Cache in NL/HD nodes using an SSD

4. New drive layout.  The disks are no longer simply numbered 1-36; because of the top-down insertion they are arranged in a grid, which required changes in the OneFS UI and CLI to accommodate the grid orientation.

~Chris

1.2K Posts

February 24th, 2015 23:00

> In addition to Rob's answer, yes, a HD400-based cluster will spend proportionally more time running Flexprotect due to the increased number of drives and the relatively limited amount of compute in the platform.  But our modeling shows that it will be a manageable amount, maxing out at less than 1/3 the time (on a max-size cluster composed of only HD400 nodes)


I'd like to follow up on this. In my experience with NL400 (36x3TB) nodes, MultiScan after FlexProtect and drive replacement takes about an order of magnitude longer than the FlexProtect job. What total MultiScan run times can be expected with HD400 nodes? Is there any general notion of what customers will consider acceptable?


Moreover, MultiScan jobs after cluster extensions have much more rebalancing work to do than after drive replacements, and they usually take substantially longer. Same question -- is there a target range for MultiScan run times after enlarging a HD400 pool?

Curious, and slightly sceptical


-- Peter

125 Posts

February 25th, 2015 11:00

I do performance benchmarking for Isilon.  Strictly from a node comparison point of view, the HD400 is at parity with, or better than, the NL400 in most of my testing.  For instance, the HD's higher spindle count helps it in certain sequential I/O tests, and in tests like SPECsfs it can do more raw ops/s/node than the NL.  Given that you have no existing performance issues on the 72NL, I have no concerns with you moving to the HD400 platform.

As with any potential platform/OS move, though, it's good to discuss it with your Isilon SE.  He/she will be able to help you understand the changes in OneFS 7.2 (e.g. new NFS protocol server) which may have an effect on your particular workloads...

--kip

5 Posts

February 25th, 2015 11:00

We are current customers with 5 nodes of 72NL and have been really happy with it and never had any performance problems.  Our initial thought was to replace those nodes with NL400, but we really like the capacity options for the HD400 and wondered if you would recommend it as a replacement.  I am sure it depends on our workload and would be tough to answer, but the HD400 seems like a "deep archive" solution, so I'm interested to know if you could see it as a replacement for NL nodes.  Thanks!

19 Posts

February 25th, 2015 11:00

OneFS 7.2 is our most recent OneFS release to support our largest capacity file system. OneFS 7.2 introduces new protection models to support HD400. In addition, OneFS 7.2 brings to bear a number of enhancements in Hadoop to support Data Lakes: Ambari support and HDFS 2.2, 2.3 and 2.4 support come in OneFS 7.2.

5 Posts

February 25th, 2015 13:00

Thank you so much for the quick reply.  We talked a bit with our sales team and there was some hemming and hawing, so it is nice to hear something with authority.  I did see the notes on uNFS versus kNFS; it is very interesting and we are looking into how that would change our existing workflow.  We are trying the virtual appliance now to test out that very thing, thank you so much for bringing it up.

12 Posts

February 25th, 2015 13:00

Hi Peter. The performance of all LIN-based jobs (MultiScan, Collect, AutoBalance, SmartPools, FlexProtect) is determined by multiple variables such as file size distribution, filesystem age, and cache state of the system.  We expect, on average, that these types of jobs will take longer on a cluster with many HD400s, but it is very situational.  We know that job execution time is a customer concern and have made performance improvements in all recent releases, including OneFS 7.2, to ensure the job engine meets the needs of the platforms it runs on.

99 Posts

February 25th, 2015 14:00

One thing to consider - if you have a 5-node 72NL cluster, the HD400 is not a direct replacement, since one node is nearly the same capacity (raw) as your entire cluster.  OTOH, if you are looking to archive material to the tune of 2 PB or more, then the HD400 could very well be the node of choice.  I would echo the thoughts above to chat with your Isilon SE on what options you have to replace/refresh your 72NL.  Best of luck!
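The capacity comparison above can be checked with a quick back-of-the-envelope calculation. This is only a sketch: the drive counts and sizes (36 x 2 TB for a 72NL node, 59 x 6 TB data drives for an HD400 node) are assumptions not stated in this thread, and `raw_capacity_tb` is a hypothetical helper, so please confirm the real figures with your SE.

```python
def raw_capacity_tb(drives_per_node: int, drive_tb: float, nodes: int = 1) -> float:
    """Raw (pre-protection) capacity in decimal TB for a pool of identical nodes."""
    return drives_per_node * drive_tb * nodes

# Assumed configurations (not taken from this thread):
#   72NL  = 36 drives x 2 TB per node
#   HD400 = 59 data drives x 6 TB per node
cluster_72nl = raw_capacity_tb(36, 2, nodes=5)
single_hd400 = raw_capacity_tb(59, 6)

print(f"5 x 72NL : {cluster_72nl:.0f} TB raw")
print(f"1 x HD400: {single_hd400:.0f} TB raw")
print(f"ratio    : {single_hd400 / cluster_72nl:.2f}")
```

Under those assumptions a single HD400 node holds roughly as much raw capacity as the whole 5-node 72NL cluster. Usable capacity will be lower in both cases once protection overhead is applied.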

5 Posts

February 25th, 2015 16:00

Thanks for the reply, Rob.  We are looking to scale past the 2 PB mark, so this certainly looks like the best solution; whether we can afford it is another conversation entirely.  A quick side note: you came out to San Diego for a customer executive dinner event in 2012 and I still reference some of the things you said there.  It was an excellent talk that left an impression on me, so thank you!

1.2K Posts

February 25th, 2015 20:00

Thanks Peter, I know "it depends". Would be great to have some hallmarks for concrete scenarios though. Can you quantify  EMC's understanding of the "needs of the platforms they run on", specifically for HD400?

-- Peter

51 Posts

February 26th, 2015 00:00

Hello,

I work with the backup division, and in the past we used to have a lot of problems performing NDMP restores with Isilon. The backup operation runs perfectly and very fast, but when we have to restore using NDMP, the performance is not so good.

In the latest version, with snapshot backups, this has been "solved", but would we still have problems if we continue using common NDMP?

Regards.

12 Posts

February 26th, 2015 12:00

Hi Pablo.  I had to get some consultation to answer this, so hopefully I do it justice. There are multiple use cases here that can affect the answer, and the dataset plays a key part. Three-way NDMP restores are inherently slow since the data goes over the front-end network. Next, if there are small files in the dataset to restore, the restore will be slower, since our writes of many small files don't perform as well as writes of larger files (three-way NDMP worsens this). One of the things we did in OneFS 7.1.1 was introduce parallel restores, which are multi-threaded restores. Prior to OneFS 7.1.1 we were doing single stream backups. Some performance evaluations have shown about a 2x performance increase with local restores using backup accelerators and the parallel restore feature.

1.2K Posts

March 1st, 2015 20:00

Would it become economical in certain scenarios to push data from Isilon to ECS via the upcoming (and already demoed in 2014) Isilon CloudPools feature?

Thank you

-- Peter

23 Posts

March 2nd, 2015 09:00

Does 7.2 bring any new reporting options, especially around SmartQuotas?
