June 28th, 2013 14:00

What benefits does Isilon bring over HDFS running on compute hosts?

September 24th, 2013 13:00

Isilon has a FANTASTIC Hadoop story! 

As the other responders point out, Hadoop is designed to duplicate data so that it can be processed by multiple servers at the same time in an effort to deal with Big Data: the old divide-and-conquer approach. Some time ago, Isilon developed a very efficient and robust file system called OneFS. In a moment of remarkable insight, Isilon engineers turned the activities of HDFS (Hadoop Distributed File System) into a protocol. That’s right, Isilon sees HDFS as a protocol like CIFS, NFS, or FTP. OneFS is already an amazing file system, so if it could speak Hadoop, you would have the best of both worlds: the power of Hadoop with a very resilient, time-tested scale-out file system.

 

Essentially, Isilon satisfies Hadoop’s need to see multiple copies of the same data without having to actually copy it. That one "simple" innovation is very significant because the data can be analyzed immediately without the need to replicate it. That means servers do not need to be purchased for the sole purpose of hosting the DAS instances. In other words, with Hadoop on Isilon, you basically only need to buy the servers required to analyze the data, not to house the replicas. In addition, because you no longer analyze copies of the data, you can analyze active data quickly and frequently. We call it Analyze-in-place or Hadoop-in-place. PetersonRyan likes to say that Isilon "Dedupes Hadoop". Actually he says "Hadoop Dedupe" a lot, but we are trying not to encourage him.

 

Isilon has developed some early Total Cost of Ownership tools that help reveal the significant benefits of HDFS on Isilon. The savings in electricity alone are staggering. Not only that, if you already own Isilon, virtualizing Hadoop (vHadoop) and getting started is fast, easy, and FREE. Check out EMC’s Hadoop Starter Kit: https://community.emc.com/docs/DOC-26892

I could go on for days about the benefits of Isilon for Hadoop, such as geo-replication and high availability, but I hope this little taste helps.

B. Scott...

June 28th, 2013 14:00

I wanted to use an example here, as I recently ran across a customer who liked HDFS because they could build a large-scale storage solution at low cost. However, in discussions with them (they are a Tier 2 site for CERN), we discovered three things. Strike one: failure of their name node had actually caused downtime. Strike two: they had to scale compute to scale storage even when CPU was not actually saturated. Strike three: they were maintaining three copies of data for some sets, with 3TB raw working out to 900GB usable, and in some instances had 5 copies of data. If these apply to you, Isilon addresses the name node single point of failure, allows independent scaling of storage OR compute, and finally, with N+2:1 protection, provides much better utilization percentages while allowing higher protection on a granular basis!
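
To put rough numbers on the utilization point, here is a minimal sketch of the arithmetic. It is not an Isilon sizing tool: the function names are made up for illustration, and the 8 data + 2 parity stripe is purely an assumed layout (the real overhead of a parity protection level depends on cluster size and policy). It also ignores any other overhead, which is presumably why three copies of 3TB was quoted above as 900GB rather than the ideal 1TB.

```python
def usable_replicated(raw_tb: float, copies: int) -> float:
    """Ideal usable capacity when every block is stored `copies` times."""
    return raw_tb / copies


def usable_parity(raw_tb: float, data_units: int, parity_units: int) -> float:
    """Ideal usable capacity when each stripe holds `data_units` of data
    plus `parity_units` of parity (an assumed, simplified layout)."""
    return raw_tb * data_units / (data_units + parity_units)


raw_tb = 3.0
three_copies = usable_replicated(raw_tb, 3)   # 3x replication
five_copies = usable_replicated(raw_tb, 5)    # 5x replication
parity_8_2 = usable_parity(raw_tb, 8, 2)      # hypothetical 8+2 stripe

for label, usable in [("3 copies", three_copies),
                      ("5 copies", five_copies),
                      ("8+2 parity (assumed)", parity_8_2)]:
    print(f"{label}: {usable:.2f} TB usable, {usable / raw_tb:.0%} of raw")
```

Under these assumptions, replication tops out at one third (or one fifth) of raw capacity, while the parity layout keeps most of it usable.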

July 2nd, 2013 11:00

...and one additional advantage that is often overlooked: In traditional HDFS environments, the entire data set must be ingested (and three or more copies of each block made) BEFORE any computations may begin, and results must be exported afterward. With Isilon HDFS, the entire data set is available immediately for computation, and the results are also available immediately to NFS and SMB clients.

September 10th, 2013 07:00

In addition to the previous responses, on Isilon you can access the same data using HDFS 1.0 and HDFS 2.0. So if your organization has multiple Hadoop distributions using both protocols, this could be very useful.

September 23rd, 2013 12:00


Another key benefit of Isilon with HDFS is the Enterprise feature set, since HDFS does not have replication, SmartPools, or other data management capabilities. You can leverage these features for async replication for disaster recovery, or use SmartPools to archive old job runs or data that the customer may need to reference later to a lower tier. Since it will take some time for the HDFS/Hadoop community to develop these features, this gives us a great advantage.

September 25th, 2013 01:00

Hello Scott,

Is there a compatibility matrix for Isilon + various HDFS clients?

Just remembering this guy having issues with HBase:

Problem while running HBase over Isilon

-- Peter

September 26th, 2013 18:00

Also of note: one of the benefits of using Isilon with Hadoop is the flexibility you get with the data. Backing up a Hadoop cluster can be an impossible task in some traditional environments. In fact, a lot of places just rely on the resiliency built into the Hadoop cluster as being good enough. In an Isilon world, that data is accessible via NFS or SMB too, so you can just back it up with a traditional file backup if you want.

I'm not saying that works in every instance, but from my past experience it would definitely have been an option for us.
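
As a rough illustration of what "just back it up like files" can look like, here is a minimal sketch that copies a directory tree with nothing but the Python standard library. The `backup_tree` helper and any mount paths you pass to it are hypothetical; it simply assumes the Hadoop data is visible as an ordinary directory (for example over an NFS mount) and is not a substitute for a real backup product.

```python
import shutil
from pathlib import Path


def backup_tree(source: Path, dest_root: Path, label: str) -> Path:
    """Copy everything under `source` into dest_root/label and return that path.

    shutil.copytree refuses to overwrite an existing target, so each
    backup run needs its own label (a date stamp, for instance).
    """
    target = dest_root / label
    shutil.copytree(source, target)
    return target
```

A call might look like `backup_tree(Path("/mnt/isilon/hadoop/results"), Path("/backup"), "2013-09-26")`, where both paths are placeholders for wherever the export is mounted and wherever backups live.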

November 14th, 2013 15:00

Hey Peter,

Yes, there is. The distributions are listed on page 10 of the Supportability and Compatibility Guide:

https://support.emc.com/docu44518_Isilon_Supportability_and_Compatibility_Guide_.pdf?language=en_US
