July 8th, 2014 11:00

The term "data lake" is new to me. Can you briefly describe it in the context of the storage industry in general, and specifically in the context of Isilon scale-out NAS? Thanks!

July 9th, 2014 09:00

Hello Jim, I just noticed a Data Lake white paper that was posted yesterday:

Hope that helps.

-Michael

July 13th, 2014 20:00

Jim,

(I copied this from the RSVP thread - hopefully you saw it there earlier!)

My apologies for the delay; we just finished the Seoul, Korea edition of the MegaLaunch and had a wonderful time "redefining possible." In the most general sense, as I see it, a data lake is a shared storage infrastructure that enables many different applications and workloads to interact seamlessly. This applies primarily to unstructured data (since storage systems can have knowledge of this information) and demands scalable technologies (since scaling both performance and capacity is a critical requirement).

In the context of Isilon’s Scale-Out Data Lake, we do provide a large, scalable storage infrastructure for unstructured data. More specifically, we provide seamless, shared access to applications that communicate via NFS, SMB (Windows), FTP/HTTP, HDFS (Hadoop), and (coming soon) OpenStack Swift. This is an extremely powerful combination of access methods, as it enables applications designed and written for a variety of purposes to co-exist peacefully or, more interestingly, to interact without data movement or additional copies. Imagine, for example, logging network access via traditional UNIX applications over NFS, using Hadoop MapReduce to find potential intrusion points, and generating graphical reports that can be viewed on a Windows workstation.
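To make the "no data movement" idea concrete, here is a minimal sketch of that three-stage workflow, assuming a single shared directory that every stage can see. It runs in one process: a temporary directory stands in for one path in the cluster namespace, and the file names (`access.log`, `suspects.csv`) and the log format are hypothetical. The map and reduce functions only mimic the MapReduce pattern; they are not the Hadoop API.

```python
import tempfile
from collections import Counter
from pathlib import Path


def write_access_log(shared_dir: Path, lines: list[str]) -> Path:
    """Stage 1: a UNIX tool appends access records (over NFS in the real workflow)."""
    log = shared_dir / "access.log"  # hypothetical file name
    with log.open("a", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")
    return log


def map_failures(line: str):
    """Map step: emit (source_ip, 1) for each failed-login record."""
    ip, status = line.split()
    if status == "FAIL":
        yield ip, 1


def reduce_counts(pairs) -> Counter:
    """Reduce step: sum the emitted counts per source IP."""
    totals = Counter()
    for ip, n in pairs:
        totals[ip] += n
    return totals


def find_suspects(log: Path, threshold: int) -> dict[str, int]:
    """Stage 2: MapReduce-style scan of the SAME file (over HDFS in the real workflow)."""
    with log.open(encoding="utf-8") as f:
        pairs = (kv for line in f if line.strip() for kv in map_failures(line))
        totals = reduce_counts(pairs)
    return {ip: n for ip, n in totals.items() if n >= threshold}


def write_report(shared_dir: Path, suspects: dict[str, int]) -> Path:
    """Stage 3: a CSV a Windows workstation could open (over SMB in the real workflow)."""
    report = shared_dir / "suspects.csv"  # hypothetical file name
    rows = ["ip,failures"] + [f"{ip},{n}" for ip, n in sorted(suspects.items())]
    report.write_text("\n".join(rows) + "\n", encoding="utf-8")
    return report


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        shared = Path(d)  # stands in for one directory in the shared namespace
        log = write_access_log(shared, ["10.0.0.5 FAIL", "10.0.0.5 FAIL",
                                        "10.0.0.7 OK", "10.0.0.5 FAIL", "10.0.0.9 FAIL"])
        print(write_report(shared, find_suspects(log, threshold=3)).read_text())
```

The point of the design is that every stage opens the same path; nothing is exported, copied, or re-ingested between the UNIX, Hadoop, and Windows steps.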

An added benefit of Isilon’s approach to the data lake is that enterprise capabilities around storage, security, performance, and information management can be applied uniformly to any or all of the application data. A few fun examples: secure multi-tenancy for Hadoop applications, deduplication between OpenStack Swift objects and NFS files, and detailed performance visibility into simultaneous access to the same files across all these protocols!

I could go on, but I hope this explanation provides a strong foundation and sparks more curiosity.

All the best,

Nick

July 14th, 2014 15:00

Thanks, Michael!

July 14th, 2014 15:00

Thanks, Nick. Very helpful.

July 15th, 2014 11:00

What new features in OneFS 7.1.1 are customers most excited about and which will help them swim, not sink, in the Data Lake?

July 15th, 2014 11:00

Hi Nick

What kind of performance improvements can we expect from the SmartFlash feature? Also, in what situations would GNA be a better choice than SmartFlash for performance, and vice versa?

July 15th, 2014 13:00

How can customers best leverage the OneFS API to automate system configuration tasks?

July 15th, 2014 16:00

Hi Nick,

What is the preferred mechanism to back up and restore data lakes in case of a hardware failure: NDMP or cloud backup?

Thanks.

July 15th, 2014 19:00

Hello!

One of the many amazing things that only Isilon can do is provide a unified security context for these next-generation protocols (such as HDFS and OpenStack Swift).

This means that authentication services are provided by proven Enterprise frameworks (such as Active Directory, LDAP/Kerberos, or NIS) and permissions models are shared across emerging (HDFS/Swift) and traditional NAS (NFS/SMB) protocols.

In addition, OneFS easily brings support for multi-tenancy and encryption (via SEDs and local key management) to these emerging environments.

This makes it easy not only to expand the set of applications and workloads and create shared pipelines and workflows, but to do so in a way that will please even the most stringent security administrator.

I hope that helps!

Nick

@nkirsch

July 15th, 2014 19:00

Backup!

One of my favorite topics... Have you heard the one about the backup administrator and the bartender? Perhaps I'll leave that for Stephen Manley of EMC's DPAD group...

Back to your question. The vast majority of Isilon customers take a multi-pronged approach to protecting their environment. First, OneFS provides industry-leading protection through both advanced Reed-Solomon encoding and end-to-end referential integrity. This protects customers from a variety of local hardware faults, including up to four simultaneous node failures.
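The trade-off behind Reed-Solomon protection can be sketched with simple N+M arithmetic: a stripe of N data units plus M protection units can be rebuilt after losing any M units, and N/(N+M) of the raw capacity holds user data. This is an illustrative model only, assuming each stripe unit sits on a distinct node; actual OneFS layouts depend on cluster size and the protection level you request, and the function names are hypothetical.

```python
from fractions import Fraction


def protection_efficiency(data_units: int, parity_units: int) -> Fraction:
    """Fraction of raw capacity holding user data in an N+M stripe."""
    if data_units < 1 or parity_units < 0:
        raise ValueError("need at least one data unit and non-negative parity")
    return Fraction(data_units, data_units + parity_units)


def survives(parity_units: int, failed_nodes: int) -> bool:
    """With one stripe unit per node, an N+M stripe tolerates up to M node losses."""
    return failed_nodes <= parity_units


# A 16+4 stripe: 80% of raw capacity is usable, and it survives 4 failed nodes.
print(float(protection_efficiency(16, 4)))  # 0.8
print(survives(4, 4), survives(4, 5))       # True False
```

Wider stripes raise efficiency for the same M, which is one reason protection overhead tends to fall as a scale-out cluster grows.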

Customers then use SnapshotIQ to keep nearly limitless local snapshots for fast recovery from application or user errors, as well as from security- or virus-related incidents.

I often see customers use SyncIQ with a secondary Isilon cluster, for either disaster recovery or business continuity. This protects customers from a complete site failure while still letting them get back to business quickly. SyncIQ's unique design allows the secondary environment to be used for both failover and additional production purposes, and it doesn't require the same hardware or the same version of OneFS at both sites.

Finally, customers can deploy traditional NDMP backup, either 2-way or 3-way, and OneFS works with all major backup providers, such as EMC NetWorker, CommVault Simpana, or Symantec NetBackup (to name a few).

A full spectrum of "backup" choices, which can be combined in clever ways: for example, use SyncIQ to replicate data to a secondary site, then take SnapshotIQ snapshots or NDMP backups there.

Did I mention that all of these capabilities are available regardless of which protocol - NFS, SMB, HDFS, or Swift?

Wow!

Nick

@nkirsch

July 15th, 2014 19:00

Perhaps the better question around the OneFS API is what can it not do?

So far, my list includes laundry, my favorite espresso, and weeding the garden.

The OneFS API, which is a modern, versioned RESTful interface, provides both access capabilities and management capabilities. It is the underlying control layer for nearly all of the Isilon web management and command-line tools.

The possibilities range from simple tasks, such as provisioning SMB/NFS shares, to more complex ones, such as managing per-directory SmartPools file policies. In addition, the access methods allow for directory creation and permissions management, as well as authenticated file retrieval.
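As a starting point for automation, the sketch below builds (but does not send) an HTTPS request that provisions an SMB share through the RESTful API. The endpoint path `/platform/1/protocols/smb/shares`, port 8080, and the `name`/`path` body fields reflect my understanding of the OneFS Platform API and should be treated as assumptions to verify against the reference guide for your OneFS release; the cluster name, credentials, and share are hypothetical.

```python
import base64
import json
import urllib.request


def build_smb_share_request(cluster: str, user: str, password: str,
                            share_name: str, ifs_path: str) -> urllib.request.Request:
    """Build a POST that would create an SMB share (assumed Platform API endpoint)."""
    # ASSUMPTION: endpoint layout and port; check the OneFS API reference guide.
    url = f"https://{cluster}:8080/platform/1/protocols/smb/shares"
    body = json.dumps({"name": share_name, "path": ifs_path}).encode("utf-8")
    # HTTP Basic authentication; only ever send this over HTTPS.
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json",
                 "Authorization": f"Basic {token}"},
    )


# Hypothetical cluster and share; nothing is sent until you call urlopen().
req = build_smb_share_request("cluster.example.com", "admin", "secret",
                              "projects", "/ifs/data/projects")
print(req.get_method(), req.full_url)
```

To actually submit it, pass the request to `urllib.request.urlopen(req, context=ctx)`, where `ctx` is an `ssl.SSLContext` that trusts your cluster's certificate. Keeping request construction separate from sending makes the same helper easy to reuse in a self-service portal or a test harness.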

I've had the pleasure of watching Isilon customers build complete self-service portals for their end-users leveraging nothing more than a web scripting language and the OneFS API.

We have a complete reference guide available if you are interested.

What will you build?

Nick

@nkirsch

July 15th, 2014 20:00

Performance improvements will always have a big caveat - "it depends" - but I will give my best answer.

In terms of raw performance numbers, a read request that could have taken as long as 7 milliseconds to service from a disk can now be serviced in almost 200 nanoseconds. That's 30 times faster!

That said, the value of SmartFlash shouldn't be measured in disk versus flash access times but rather in the amount of the application working set that can easily (and automatically) fit into the scalable, cluster-coherent cache.
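Why working-set fit matters more than raw device speed can be seen with a simple expected-latency model: if a fraction h of reads hits flash, the average read costs h·t_flash + (1 − h)·t_disk. The sketch below uses the 7 ms disk figure and the 200-microsecond flash figure (the corrected value from later in this thread). It is a deliberately simplified model that ignores RAM caching, network hops, and queueing, and the function name is hypothetical.

```python
def effective_read_latency_us(hit_ratio: float, flash_us: float, disk_us: float) -> float:
    """Expected read latency when a fraction hit_ratio of reads is served from flash."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * flash_us + (1.0 - hit_ratio) * disk_us


DISK_US = 7000.0   # 7 ms per disk read
FLASH_US = 200.0   # 200 microseconds per flash read

for h in (0.5, 0.9, 0.99):
    print(f"hit ratio {h:.2f}: ~{effective_read_latency_us(h, FLASH_US, DISK_US):.0f} us")
```

Moving from a 50% to a 99% hit ratio cuts the average from about 3,600 to about 268 microseconds, which is why a cache large enough to hold the working set matters so much.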

SmartFlash can be scaled from a few hundred gigabytes of flash to nearly a petabyte! In addition, due to the Isilon scale-out architecture, you aren't just adding flash, you are adding CPU cores and network ports to service that flash - all while maintaining a single file system and single namespace.

We expect Isilon customers to jump at the opportunity to cost-effectively and simply scale their entire application and user experience using SmartFlash.

You also asked about GNA ("global namespace acceleration"). To simplify, I will compare flash for read caching with metadata acceleration. SmartFlash is only for reads and can accelerate both metadata (directory and attribute information) and data (file contents). Metadata acceleration, on the other hand, can accelerate both reads and writes, but only for information about files, not the file contents themselves.

Clearly those are fairly different use cases, and we have some great tools (such as InsightIQ) and white papers to help you make the right choice for your environment.

Vroom vroom!

Nick

@nkirsch

July 15th, 2014 20:00

I'm not much of a swimmer myself, but luckily OneFS can be a speedboat, a cargo ship, and a cruise liner all at the same time.

The things about OneFS 7.1.1 that seem to have excited customers the most are:

1) Performance! 7.1.1 includes significant boosts to both IOPS (as measured by the SPEC benchmark) and throughput (large-block sequential). These performance improvements apply to existing platforms, but really shine when combined with our brand-new S210 and X410 flagship products.

2) SmartFlash - a general-purpose, scalable read-caching capability that dramatically expands the working set, taking full advantage of our scale-out CPU and networking along with the performance improvements above. SmartFlash is available for both new and existing clusters.

3) New access methods! Of these, SMB multi-channel is shipping today with 7.1.1 and provides a significant throughput boost for single-stream-oriented applications. In some cases, we are able to exceed the bandwidth needed for 4K streaming (which is most excellent for our media and entertainment customers). As part of the MegaLaunch, we also discussed a new version of HDFS (2.2) and our initial support for OpenStack Swift! Both of these protocols will be available in a follow-on release later this year, but we were so excited we couldn't help talking about them.

The best news? OneFS 7.1.1 is available as a free upgrade for customers on current maintenance contracts and is a "rolling upgrade" from OneFS 7.1.0, which minimizes interruption and should increase adoption.
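The general idea of a rolling upgrade is that nodes are upgraded one at a time while the rest of the cluster keeps serving clients. The sketch below is a generic illustration of that one-node-at-a-time pattern, not the actual OneFS upgrade procedure; the node names and the function are hypothetical.

```python
def rolling_upgrade(versions: dict[str, str], target: str) -> list[int]:
    """Upgrade nodes one at a time; return how many nodes kept serving at each step."""
    online = set(versions)
    history = []
    for node in sorted(versions):
        if versions[node] == target:
            continue                  # already on the target release
        online.discard(node)          # take only this node out of service
        history.append(len(online))   # everyone else keeps serving clients
        versions[node] = target       # upgrade (and, in reality, reboot) the node
        online.add(node)              # rejoin before touching the next node
    return history


cluster = {"node1": "7.1.0", "node2": "7.1.0", "node3": "7.1.0", "node4": "7.1.1"}
print(rolling_upgrade(cluster, "7.1.1"))  # [3, 3, 3]
```

At no point are more than one node's worth of resources offline, which is what keeps the interruption small.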

I would love to hear about anyone adopting 7.1.1 and what their favorite new capabilities are - and more importantly, what impact this release makes for their business and their lives.

Thanks!

Nick

@nkirsch

July 16th, 2014 05:00

Nick - I believe you meant to say 200 microseconds, which does give a roughly 30x (more precisely 35x) quicker response. Compared to 7 ms, 200 nanoseconds would be 35,000x.
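The arithmetic behind that correction is easy to check by working in integer nanoseconds, so there is no floating-point rounding to worry about:

```python
DISK_READ_NS = 7_000_000    # 7 ms disk read, as quoted in the thread
FLASH_READ_NS_US = 200_000  # 200 microseconds, the corrected flash figure
FLASH_READ_NS = 200         # 200 nanoseconds, as originally written


def speedup(slow_ns: int, fast_ns: int) -> int:
    """How many times faster the fast path is (exact here: both divide evenly)."""
    return slow_ns // fast_ns


print(speedup(DISK_READ_NS, FLASH_READ_NS_US))  # 35
print(speedup(DISK_READ_NS, FLASH_READ_NS))     # 35000
```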
