June 25th, 2014 04:00

Ask the Expert: EMC Isilon Scale-out Data Lake

Welcome to the EMC Ask the Expert discussion on Isilon following the EMC Redefine Possible announcement.

You may also be interested in these ATE events:

Ask the Expert: Are you ready to manage deep archiving workloads with Isilon’s HD400 node and OneFS 7.2.0? Find out more…

Ask The Expert – Isilon’s New Releases: IsilonSD Edge, OneFS.NEXT and CloudPools

https://community.emc.com/thread/216708

EMC Isilon provides an enterprise-grade scale-out data lake to help protect, manage, and secure all unstructured data. We are reinforcing our data lake with new announcements that include two new platforms, new solutions, new access methods, and SmartFlash (flash as cache). Join us to learn more about our strategy and what’s new at Isilon.

Your hosts:

Nicholas Kirsch is the Chief Technology Officer and Vice President of the Isilon Storage Division at EMC. His primary focus is extending EMC Isilon's lead in Scale-Out NAS products, technologies and market solutions. Nick is currently responsible for both product and technology strategy for Isilon's integrated storage appliance and the OneFS distributed file system. He also drives advanced development and strategic acquisitions.

Nick joined Isilon Engineering in 2002 as a Software Engineer for OneFS before serving as the Director of Software Engineering through 2007. He built and led Isilon's Product Management organization as Senior Director of Product Management through 2012. Nick holds Bachelor of Science degrees in Computer Science and Mathematics from the University of Puget Sound and a Master's degree in Computer Science from the University of Washington.

This discussion will take place July 8 - 25.

Share this event on Twitter:

>> Join Ask the Expert, EMC Isilon scale-out Data Lake http://bit.ly/1mGPlqD #EMCATE <<

July 8th, 2014 11:00

The term "data lake" is new to me. Can you briefly describe it in the context of the storage industry in general, and specifically in the context of Isilon scale-out NAS? Thanks!

July 9th, 2014 09:00

Hello Jim, I just noticed a Data Lake white paper that was posted yesterday:

Hope that helps.

-Michael

July 13th, 2014 20:00

Jim,

(I copied this from the RSVP thread - hopefully you saw it there earlier!)

My apologies for the delay; we just finished the Seoul, Korea edition of the MegaLaunch and had a wonderful time redefining possible. As I see it, in the most general sense, a data lake is a shared storage infrastructure that enables many different applications and workloads to interact seamlessly. This applies primarily to unstructured data (since storage systems can have knowledge of this information) and demands scalable technologies (since scaling both performance and capacity is a critical requirement).

In the context of Isilon’s scale-out data lake, we indeed provide a large, scalable storage infrastructure for unstructured data. More specifically, we provide seamless, shared access to applications that communicate via NFS, SMB (Windows), FTP/HTTP, HDFS (Hadoop), and (coming soon) OpenStack Swift. This is an extremely powerful combination of access methods: it enables applications designed and written for a variety of purposes to coexist peacefully or, more interestingly, to interact without data movement or additional copies. Imagine, for example, logging network access via traditional UNIX applications over NFS, using Hadoop MapReduce to find potential intrusion points, and generating graphical reports that can be viewed on a Windows workstation.
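To make the NFS-to-Hadoop-to-Windows example concrete, here is a minimal, hypothetical sketch of that pipeline in plain Python. It assumes a log format of "src_ip port action" and uses a local temporary directory as a stand-in for one shared directory on the cluster; the map and reduce steps only mimic what a MapReduce job would do, and none of the names here come from Isilon or Hadoop. The point it illustrates is that all three stages read and write the same files, so no copy or transfer step appears anywhere.

```python
import csv
import os
import tempfile
from collections import Counter
from functools import reduce

# Hypothetical stand-in for one directory on the cluster that a UNIX logger
# writes over NFS, a Hadoop job reads over HDFS, and a Windows user browses
# over SMB. Here it is just a local temporary directory.
SHARED_DIR = tempfile.mkdtemp()


def write_logs(path, lines):
    """Step 1: the logger writes records in an assumed 'src_ip port action' format."""
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")


def map_denied(line):
    """Map: emit a count of 1 for the source IP of every denied connection."""
    src_ip, _port, action = line.split()
    return Counter({src_ip: 1}) if action == "DENY" else Counter()


def find_suspects(path, threshold):
    """Reduce: sum the per-IP counts and keep sources at or above the threshold."""
    with open(path) as f:
        mapped = (map_denied(line) for line in f if line.strip())
        totals = reduce(lambda a, b: a + b, mapped, Counter())
    return {ip: n for ip, n in totals.items() if n >= threshold}


def write_report(path, suspects):
    """Step 3: write a CSV report that a Windows workstation could open."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["src_ip", "denied_attempts"])
        for ip, n in sorted(suspects.items()):
            writer.writerow([ip, n])


log_path = os.path.join(SHARED_DIR, "access.log")
write_logs(log_path, [
    "10.0.0.5 22 DENY",
    "10.0.0.5 22 DENY",
    "10.0.0.7 80 ALLOW",
    "10.0.0.5 23 DENY",
    "10.0.0.9 22 DENY",
])
suspects = find_suspects(log_path, threshold=3)
write_report(os.path.join(SHARED_DIR, "report.csv"), suspects)
print(suspects)  # {'10.0.0.5': 3}
```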

An added benefit of Isilon’s approach to the data lake is that enterprise capabilities around storage, security, performance, and information management can all be applied uniformly to any or all of the application data. A few fun examples include the ability to provide secure multi-tenancy for Hadoop applications, de-dupe between OpenStack Swift objects and NFS files, and see advanced performance details on simultaneous access (across all these protocols) to the same files!

I could go on, but I hope this explanation provides a strong foundation and sparks more curiosity.

All the best,

Nick

July 14th, 2014 15:00

Thanks, Michael!

July 14th, 2014 15:00

Thanks, Nick. Very helpful.

July 15th, 2014 10:00

Hi Nick!

With Isilon's advancements in data lake infrastructure, HDFS, and the upcoming OpenStack Swift support, it sounds like unstructured data is a real growth area. Does this introduce any security complexities? What features does Isilon offer to help in this area?

July 15th, 2014 11:00

What new features in OneFS 7.1.1 are customers most excited about and which will help them swim, not sink, in the Data Lake?

July 15th, 2014 11:00

Hi Nick,

What kind of performance improvements can we expect from the SmartFlash feature? Also, in what situations would GNA be a better choice for performance over SmartFlash and vice versa?

July 15th, 2014 13:00

How can customers best leverage the OneFS API to automate system configuration tasks?

July 15th, 2014 16:00

Hi Nick,

What is the preferred mechanism to back up and restore data lakes in case of a hardware failure: NDMP or cloud backup?

Thanks.

July 15th, 2014 19:00

Hello!

One of the many amazing things that only Isilon can do is provide a unified security context for these next-generation protocols (such as HDFS and OpenStack Swift).

This means that authentication services are provided by proven Enterprise frameworks (such as Active Directory, LDAP/Kerberos, or NIS) and permissions models are shared across emerging (HDFS/Swift) and traditional NAS (NFS/SMB) protocols.
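The idea of one permission model shared by every protocol can be sketched as follows. This is a hypothetical illustration only, not how OneFS implements authorization: it assumes identities have already been resolved by a directory service, keeps a single ACL table per path, and routes requests from every protocol front end through the same check. The "/ifs/..." path and the group name are illustrative.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Identity:
    """A user as resolved by a shared directory service (e.g. AD or LDAP)."""
    name: str
    groups: frozenset = frozenset()


@dataclass
class AclStore:
    """One permission table per path, shared by every protocol front end."""
    entries: dict = field(default_factory=dict)  # path -> {principal: set of ops}

    def grant(self, path, principal, *ops):
        self.entries.setdefault(path, {}).setdefault(principal, set()).update(ops)

    def allowed(self, who, path, op):
        acl = self.entries.get(path, {})
        principals = {who.name} | set(who.groups)
        return any(op in acl.get(p, set()) for p in principals)


def handle(protocol, who, path, op, acls):
    """Every protocol (NFS, SMB, HDFS, Swift) funnels into the same check.

    In a real system the protocol would only affect request parsing; the
    authorization decision here deliberately ignores it.
    """
    return acls.allowed(who, path, op)


acls = AclStore()
acls.grant("/ifs/data/logs", "security-team", "read")
alice = Identity("alice", frozenset({"security-team"}))
bob = Identity("bob")

results = {p: handle(p, alice, "/ifs/data/logs", "read", acls)
           for p in ("NFS", "SMB", "HDFS", "Swift")}
print(results)  # same answer for every protocol
```

Because there is exactly one `allowed` method, a permission change made through any protocol is seen identically by all the others, which is the property the answer above describes.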

In addition, OneFS easily extends support for multi-tenancy and encryption (via self-encrypting drives, or SEDs, and local key management) to these emerging environments.

This makes it easy not only to expand the set of applications and workloads and to create shared pipelines and workflows, but also to do so in a way that will please even the most stringent security administrator.

I hope that helps!

Nick

@nkirsch

July 15th, 2014 19:00

Backup!

One of my favorite topics... Have you heard the one about the backup administrator and the bartender? Perhaps I'll leave that for Stephen Manley of EMC's DPAD group...

Back to your question. The vast majority of Isilon customers pursue a multi-pronged approach to protecting their environment. First, it is important to note that OneFS provides industry-leading protection capabilities, both through advanced Reed-Solomon encoding and through end-to-end referential integrity. This protects customers from a variety of local hardware faults, including up to four simultaneous node failures.
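As a worked illustration of why Reed-Solomon encoding is attractive for tolerating several node failures, here is the generic N+M erasure-coding arithmetic. This is textbook arithmetic, not a description of OneFS protection levels or stripe widths; the 12+4 stripe below is an assumed example. A stripe of N data units plus M parity units can be rebuilt from any N of its units, so it survives M failures while giving up only M/(N+M) of raw capacity, whereas mirroring needs M+1 full copies for the same tolerance.

```python
from fractions import Fraction


def protection_profile(data_units: int, parity_units: int) -> dict:
    """Describe an N+M erasure-coded stripe (generic Reed-Solomon arithmetic).

    An N+M Reed-Solomon stripe can rebuild its data from any N of its N+M
    units, so it survives the loss of up to M units (here, M nodes).
    """
    if data_units < 1 or parity_units < 0:
        raise ValueError("need at least one data unit and non-negative parity")
    width = data_units + parity_units
    return {
        "stripe_width": width,
        "failures_tolerated": parity_units,
        "efficiency": Fraction(data_units, width),  # usable / raw capacity
        "overhead": Fraction(parity_units, width),  # fraction spent on parity
    }


def mirroring_efficiency(copies: int) -> Fraction:
    """For comparison: n-way mirroring tolerates copies - 1 failures at 1/copies efficiency."""
    return Fraction(1, copies)


profile = protection_profile(12, 4)  # assumed 12+4 stripe, for illustration
print(profile["failures_tolerated"], float(profile["efficiency"]))  # 4 0.75
print(float(mirroring_efficiency(5)))  # 0.2 (5 copies also tolerate 4 failures)
```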

Customers then take advantage of SnapshotIQ to maintain nearly limitless local snapshots for fast recovery from application or user errors, as well as from security- or virus-related incidents.

I often see customers leverage SyncIQ in conjunction with a secondary Isilon cluster, either for disaster recovery or for business continuity. This protects customers from a complete site failure while still enabling them to quickly get back to business. SyncIQ's unique design allows that secondary environment to be used for both failover and additional production purposes, and it requires neither the same hardware nor the same version of OneFS at both sites.

Finally, customers can deploy traditional NDMP backup, either 2-way or 3-way, and OneFS works with all major backup providers, such as EMC NetWorker, CommVault Simpana, or Symantec NetBackup (to name a few).

That is a full spectrum of "backup" choices, which can be combined in clever ways. For example, replicate data with SyncIQ to a secondary site and take SnapshotIQ snapshots or NDMP backups from there.

Did I mention that all of these capabilities are available regardless of protocol: NFS, SMB, HDFS, or Swift?

Wow!

Nick

@nkirsch

July 15th, 2014 19:00

Perhaps the better question about the OneFS API is: what can it not do?

So far, my list includes laundry, my favorite espresso, and weeding the garden.

The OneFS API, which is a modern, versioned RESTful interface, provides both access capabilities and management capabilities. It is the underlying control layer for nearly all of the Isilon web management and command-line tools.

The possibilities range from simple tasks, such as provisioning SMB/NFS shares, to more complex tasks, such as managing per-directory SmartPools file policies. In addition, the access methods allow for directory creation and permissions management, as well as authenticated file retrieval.
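Here is a minimal sketch of what driving the RESTful interface from a script might look like, using only Python's standard library. It builds (but does not send) two requests: one to provision an SMB share and one to retrieve a file. The endpoint layout is an assumption based on how the OneFS API is commonly described (a management "platform" API under `/platform/<version>/` and a file "namespace" API under `/namespace/`, on port 8080, with HTTP basic authentication); the host name, credentials, share name, and paths are hypothetical. Check the exact paths, version number, and JSON field names against the OneFS API reference guide for your release before relying on them.

```python
import base64
import json
import urllib.request


def _auth_header(user: str, password: str) -> str:
    """HTTP basic authentication header value."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return f"Basic {token}"


def build_create_smb_share(cluster, user, password, name, path):
    """Build (but do not send) a POST that provisions an SMB share.

    The URL layout and the 'name'/'path' fields are assumptions to verify
    against the API reference for your OneFS version.
    """
    url = f"https://{cluster}:8080/platform/1/protocols/smb/shares"
    body = json.dumps({"name": name, "path": path}).encode()
    req = urllib.request.Request(url, data=body, method="POST")
    req.add_header("Content-Type", "application/json")
    req.add_header("Authorization", _auth_header(user, password))
    return req


def build_read_file(cluster, user, password, ifs_path):
    """Build (but do not send) a GET that retrieves a file via the namespace API.

    ifs_path is expected to start with '/ifs/' (assumed layout).
    """
    url = f"https://{cluster}:8080/namespace{ifs_path}"
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", _auth_header(user, password))
    return req


# Hypothetical cluster name and credentials, for illustration only.
req = build_create_smb_share("cluster.example.com", "admin", "secret",
                             "projects", "/ifs/data/projects")
read_req = build_read_file("cluster.example.com", "admin", "secret",
                           "/ifs/data/report.csv")
print(req.get_method(), req.full_url)
# urllib.request.urlopen(req) would actually send the request.
```

Keeping request construction separate from sending makes it easy to log or review each call a self-service portal would make before pointing it at a real cluster.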

I've had the pleasure of watching Isilon customers build complete self-service portals for their end-users leveraging nothing more than a web scripting language and the OneFS API.

We have a complete reference guide available if you are interested.

What will you build?

Nick

@nkirsch

July 15th, 2014 20:00

Performance improvements always come with a big caveat ("it depends"), but I will give my best answer.

In terms of raw performance numbers, a read request that could have taken as long as 7 milliseconds to service from disk can now be serviced from flash in roughly 200 microseconds. That's more than 30 times faster!

That said, the value of SmartFlash shouldn't be measured in disk versus flash access times but rather in the amount of the application working set that can easily (and automatically) fit into the scalable, cluster-coherent cache.
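Why the working set matters more than raw access time can be shown with the standard expected-latency formula for a cache: with hit ratio h, average read latency is h × t_flash + (1 − h) × t_disk. The sketch below uses the 7 ms disk figure quoted above and an assumed flash read latency of 200 microseconds; both are illustrative inputs, not measured SmartFlash results. Even at a 90% hit ratio, the remaining disk reads dominate the average, so the benefit depends heavily on how much of the working set fits in cache.

```python
def effective_read_latency(hit_ratio, flash_latency_s, disk_latency_s):
    """Expected per-read latency when a fraction `hit_ratio` of reads hit flash."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * flash_latency_s + (1.0 - hit_ratio) * disk_latency_s


DISK = 7e-3     # 7 ms disk read, the figure quoted above
FLASH = 200e-6  # assumed 200 microsecond flash read, for illustration only

for h in (0.0, 0.5, 0.9, 0.99):
    t = effective_read_latency(h, FLASH, DISK)
    print(f"hit ratio {h:.0%}: {t * 1e3:.2f} ms average, speedup {DISK / t:.1f}x")
```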

SmartFlash can be scaled from a few hundred gigabytes of flash to nearly a petabyte! In addition, due to the Isilon scale-out architecture, you aren't just adding flash, you are adding CPU cores and network ports to service that flash - all while maintaining a single file system and single namespace.

We expect Isilon customers to jump at the opportunity to cost-effectively and simply scale their entire application and user experience using SmartFlash.

You asked another question about GNA (short for "global namespace acceleration"). I will simplify that to a comparison of flash for read caching with flash for metadata acceleration. SmartFlash is only for reads, but it can accelerate both metadata (directory and attribute information) and data (file contents). Metadata acceleration, on the other hand, can accelerate both reads and writes, but only for information about files (rather than the files themselves).
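The distinction above (read cache: reads only, but data and metadata; metadata acceleration: reads and writes, but metadata only) can be written down as a rough rule of thumb. This is a hypothetical helper, not an EMC sizing tool: the 50% thresholds are arbitrary assumptions, and real decisions should rest on measured workload data from tools such as InsightIQ.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    read_fraction: float      # share of operations that are reads (0..1)
    metadata_fraction: float  # share of operations that touch only metadata (0..1)


def suggest_flash_use(w: Workload) -> str:
    """Rule of thumb encoding the distinction above (thresholds are assumptions).

    - Read cache (SmartFlash): reads only, but covers data and metadata.
    - Metadata acceleration (GNA): reads and writes, but metadata only.
    """
    metadata_heavy = w.metadata_fraction >= 0.5
    write_heavy = w.read_fraction < 0.5
    if metadata_heavy and write_heavy:
        return "metadata acceleration"  # only option that speeds metadata writes
    if not metadata_heavy and not write_heavy:
        return "read cache"  # only option that caches file contents
    if metadata_heavy:
        return "either; compare with measurements"  # metadata reads: both help
    return "neither accelerates data writes"  # data-heavy writes


print(suggest_flash_use(Workload(read_fraction=0.9, metadata_fraction=0.1)))
```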

Clearly those are fairly different use cases, and we have some great tools (such as InsightIQ) and white papers to help you make the right choice for your environment.

Vroom vroom!

Nick

@nkirsch

July 15th, 2014 20:00

I'm not much of a swimmer myself, but luckily OneFS can be a speedboat, a cargo ship, and a cruise liner all at the same time.

The things about OneFS 7.1.1 that seem to have excited customers the most are:

1) Performance! 7.1.1 includes significant boosts to both IOPS (as measured by the SPEC benchmark) and throughput (large-block sequential). These performance improvements apply to existing platforms, but really shine when combined with our brand-new S210 and X410 flagship products.

2) SmartFlash: a general-purpose, scalable read-caching capability that dramatically expands the working set, taking full advantage of our scale-out CPU and networking combined with the performance improvements above. SmartFlash is also available for both new and existing clusters.

3) New access methods! Of these, SMB multichannel is shipping today with 7.1.1 and provides a significant throughput boost for single-stream-oriented applications. In some cases, we are able to exceed the bandwidth needed for 4K streaming (which is most excellent for our media and entertainment customers). As part of the MegaLaunch, we also discussed a new version of HDFS (2.2) and our initial support for OpenStack Swift! Both of these protocols will be available in a follow-on release later this year, but we were so excited we couldn't help but talk about them.

The best news? OneFS 7.1.1 is available as a free upgrade for customers on current maintenance contracts and is a "rolling upgrade" from OneFS 7.1.0, which minimizes interruption and should ease adoption.
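The core idea of a rolling upgrade can be sketched as a loop that takes down one node at a time, so the rest of the cluster keeps serving throughout. This is a generic, hypothetical model, not the actual OneFS upgrade procedure: it ignores client failover, health checks, and rollback, and the `Node` type and version strings are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    version: str
    online: bool = True


def rolling_upgrade(nodes, target, serving_log):
    """Upgrade one node at a time so the rest of the cluster keeps serving.

    serving_log records how many nodes were still online while each node
    was being upgraded.
    """
    for node in nodes:
        if node.version == target:
            continue  # already upgraded (e.g. a resumed run)
        node.online = False  # take only this node out of service
        serving_log.append(sum(n.online for n in nodes))
        node.version = target
        node.online = True  # rejoin before touching the next node


cluster = [Node(f"node{i}", "7.1.0") for i in range(1, 5)]
serving = []
rolling_upgrade(cluster, "7.1.1", serving)
print(serving)  # [3, 3, 3, 3]: three of four nodes serving at every step
```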

I would love to hear about anyone adopting 7.1.1 and what their favorite new capabilities are - and more importantly, what impact this release makes for their business and their lives.

Thanks!

Nick

@nkirsch
