RobertoAraujo1
3 Cadmium

Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

Welcome to this EMC Ask the Expert session. On this occasion we'll answer questions on EMC Big Data Solutions such as Isilon, ViPR, and the ECS Appliance.


EMC has always been about data; storage is just the means to keep, access, protect, and use it. Big Data is the latest data management challenge. That’s why EMC was so excited to be at Hadoop World and showcase our storage and data management solutions for Big Data. EMC does not just tackle storage problems; we solve data management challenges. Our experts are getting ready to take all of your questions on this topic.

Here are your Subject Matter Experts:



George Hamilton is a senior product marketing manager for EMC ViPR Global Data Services and the EMC Centera and Atmos object storage platforms. George has nearly 20 years of technology industry experience. Prior to joining EMC, he was an industry analyst and research director for Yankee Group covering cloud computing and services, IT infrastructure, and IT management software. Connect with George on Twitter.


Ryan M Peterson, MBA, is an internationally recognized industry expert with repeated success in the design, development, and delivery of ground-breaking, high-performance technology solutions. He is a technology thought leader and pioneer with a “can do” attitude who finds workable, technologically advanced solutions to complex issues. Ryan currently directs the efforts of EMC Isilon’s Solutions Architecture organization, focusing on industry integration with best-of-breed applications and technologies. He is also recognized as a thought leader in Big Data analytics applications such as Hadoop and enjoys discussing the future of technology and its positive impact on the world. Connect with Ryan on Twitter.


This event will take place from October 27 to November 7, 2014.


Share this event on Twitter:

>> Join our Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions http://bit.ly/1CYVHrg #EMCATE <<

12 Replies
RobertoAraujo1
3 Cadmium

Re: Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

Welcome everyone, this ATE session has begun. Our experts are now ready to answer any question you post on this thread. Enjoy!

helghareeb
1 Copper

Re: Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

Hello Everyone,

Thanks for the opportunity. I have more than one question; I hope this is not a problem.

  • Big Data Questions:
    • Assuming we are using Hadoop for Big Data analytics, what are the advantages of using OneFS over Hadoop HDFS?
    • Are there any alternatives to Hadoop coming in the near future? Are there any upcoming advances to MapReduce?
  • Object Storage Questions:
    • Up to now, I have not been able to understand how Object Storage works. For example, I understand that NTFS can store larger files than FAT because of the large iNode size. For Object Storage, there are no iNodes. So, how is a file written to disk? I mean, what exactly is the change that made it possible to write large files?
    • Besides, if Object Storage calculates the ID from the content and stores authorization data within the stored object, that means everyone with access to Object Storage can find the object, but only authorized users can access it. Is that true?
    • If what I am thinking is true, isn't this a security leak?

Sorry for the multiple questions. I understand I may need to learn more about these technologies. However, I find this an excellent chance to get replies from the experts. Replies may include references to external resources. Thanks again.

Nikschen
3 Argentium

Re: Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

Hi Haitham,

I will let the experts reply to your questions, but let me recommend the latest October blog posts on the Isilon Community, which cover OneFS and Hadoop, and Splunk as an alternative to Hadoop.

Isilon

Happy reading.

Niki

PetersonRyan
1 Copper

Re: Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

helghareeb
1 Copper

Re: Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

Hello Ryan Peterson

Honestly, those are the answers I was looking for. You really have no idea how much you helped me. Thank you for the thorough answer and for the time and effort you put into it. Actually, I have more questions; I hope you don't mind.

  • How does OneFS provide protection, if not through multiple writes (3x in the case of HDFS)?
  • Is there any technical documentation that compares OneFS techniques with those of other file systems? I know OneFS is a proprietary file system, but I hope you have a blog or something similar that covers such topics.
  • I hope you can share more of your thoughtful ideas about "Object Storage". They are really helpful. If you can shed light on the differences between OneFS and Swift (OpenStack), that would be awesome. Even if not, your answer really helped me.

Sorry for all the questions. I am in academia, and I am starting my research career in "File Systems", a topic I find very useful. Sorry for any inconvenience, and thanks for your generosity.

eghamilton
2 Bronze

Re: Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

Hi Haitham. Thanks for your question. Let me see if I can address the object portion.

Regardless of whether data is stored as an object, a file, or a block, it is ultimately stored as blocks of data on a disk. File/NAS storage and object storage are simply abstractions above that layer. As far as handling large files, that is precisely what object storage is designed for. Rather than using a file system with a hierarchical structure, object storage packages a file's metadata and raw data together as a unique object. This object is then stamped with a unique identifier and placed in a non-hierarchical bucket.

It seems as though you are referring to Content Addressed Storage (CAS). Centera is an example of CAS. With Centera, the application requests to create a new file and the app server sends the file to Centera. Centera performs the content address calculation using a proprietary hash and sends the address back to the application. The application database stores the content address for future reference. The content address is a unique digital fingerprint that guarantees content authenticity and immutability. When an application needs to access the file, it only needs to know the content address. The authorization data is not stored within the object. That is governed by the application and the user's privileges at the application layer.
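To make the content-address flow above concrete, here is a minimal sketch in Python. It is not Centera's implementation: the hash is proprietary, so SHA-256 stands in for it, and the `ContentAddressedStore` class and the `app_database` dictionary are hypothetical names used only for illustration.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory content-addressed store (illustrative only)."""

    def __init__(self):
        self._blobs = {}  # content address -> stored bytes

    def put(self, data: bytes) -> str:
        # Assumption: SHA-256 stands in for the platform's proprietary hash.
        address = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hashing on read shows how the address doubles as a fingerprint.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data


store = ContentAddressedStore()

# The application sends the file and keeps only the returned address.
address = store.put(b"quarterly report")
app_database = {"report-2014-q3": address}

# Later, the application needs nothing but the address to read the file back.
content = store.get(app_database["report-2014-q3"])
```

Note that the store itself makes no authorization decision here, which mirrors the point that access control lives at the application layer.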

Other object platforms work similarly but use different methods of creating the unique object ID which is stored in an index.
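As a rough sketch of that alternative, the Python below assigns each object a random identifier instead of deriving it from the content, and keeps the metadata and data together in a flat index. The `Bucket` and `StoredObject` names and the use of UUIDs are assumptions for illustration, not how any particular product generates its IDs.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class StoredObject:
    data: bytes                                   # raw payload
    metadata: dict = field(default_factory=dict)  # travels with the data


class Bucket:
    """Toy flat (non-hierarchical) bucket backed by an in-memory index."""

    def __init__(self):
        self._index = {}  # object ID -> StoredObject

    def put(self, data: bytes, **metadata) -> str:
        # Assumption: a random UUID serves as the unique object ID.
        object_id = uuid.uuid4().hex
        self._index[object_id] = StoredObject(data, dict(metadata))
        return object_id

    def get(self, object_id: str) -> StoredObject:
        return self._index[object_id]


bucket = Bucket()
first_id = bucket.put(b"sensor readings", content_type="text/csv")
second_id = bucket.put(b"sensor readings")  # same bytes, different object
```

Unlike a content address, two uploads of identical bytes get two different IDs here, so the ID says nothing about what the object contains.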

As far as security, each operation is individually authenticated. So, if a user is not authenticated, they will not have permission to access a file. Again, this is done at the application and access control layer.

Access to object storage is via an API, most often a RESTful API such as Amazon S3, EMC Atmos, or OpenStack Swift.

For a more detailed explanation, here are a few resources for you:

ViPR Services Storage Engine Architecture White Paper

EMC Atmos Cloud Storage Architecture

PetersonRyan
1 Copper

Re: Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

Hi Haitham,

Here is a link to the Isilon community page, where you can learn all about Isilon erasure codes and find other Isilon documentation: https://community.emc.com/community/products/isilon

Basically speaking, Isilon doesn't look to protect the underlying disks in a system, but instead manages data protection by distributing data blocks and parity blocks across the cluster. Although it is a little more complicated than this, imagine you have three Isilon nodes: two hold the data and the third holds the parity. This allows the loss of physical disks or even entire nodes without loss of data. You will see terminology such as n+1 or n+3:1 thrown around when talking to Isilon folks. What they are saying is that you can set the protection level differently for each file, directory, or the whole system (the single volume). n+3:1, for example, tells the system to take those files and write enough extra parity to be able to simultaneously lose three disks in the cluster or a single node. n+3:2 would mean 3 disks or 2 nodes. n+1 would mean 1 disk or 1 node. We typically suggest n+2:1 as a default. Not to overly confuse anything, but technically we also have a data protection scheme that allows for data mirroring, similar to the way Hadoop does replicas, but we seldom use it.
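The three-node picture can be sketched in a few lines of Python. This is a simplification under stated assumptions: plain XOR parity stands in for Isilon's actual erasure codes, which are more involved, and the block contents and function names are made up for illustration.

```python
def xor_blocks(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length blocks."""
    if len(a) != len(b):
        raise ValueError("blocks must be the same length")
    return bytes(x ^ y for x, y in zip(a, b))


# Two nodes hold data; the third holds parity computed from both.
node1 = b"HADOOP__"
node2 = b"ONEFS___"
node3 = xor_blocks(node1, node2)

# If node1 is lost, its block is rebuilt from the two survivors.
recovered = xor_blocks(node2, node3)


def raw_bytes_per_data_byte(data_blocks: int, extra_blocks: int) -> float:
    """Raw capacity consumed per byte of user data."""
    return (data_blocks + extra_blocks) / data_blocks


parity_overhead = raw_bytes_per_data_byte(2, 1)       # 2 data + 1 parity
replication_overhead = raw_bytes_per_data_byte(1, 2)  # 1 copy + 2 replicas (3x)
```

In this toy layout the parity scheme consumes 1.5 bytes of raw capacity per byte of data, versus 3 bytes for 3x replication, while still surviving the loss of any one of the three nodes.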

For OpenStack Swift, you are talking about an object API that allows access to data in the form of metadata and data wrapped together, as George discussed in his response. Isilon is releasing access to the underlying file data through Swift in the upcoming release slated for next week. This will allow you to use OpenStack Swift APIs to get to the same data you could reach using SMB, NFS, or otherwise. Think of Swift as using GET() and PUT() on the data.
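As a rough illustration of the GET/PUT idea, the Python sketch below builds (but does not send) the two HTTP requests using the standard OpenStack Swift URL layout of `/v1/{account}/{container}/{object}` and the `X-Auth-Token` header. The endpoint, account, container, object name, and token are all hypothetical, and how an Isilon cluster exposes its Swift endpoint is not covered in this thread.

```python
from urllib.parse import quote
from urllib.request import Request


def swift_object_url(endpoint: str, account: str, container: str, obj: str) -> str:
    """Build an object URL following the OpenStack Swift path convention."""
    return f"{endpoint}/v1/{quote(account)}/{quote(container)}/{quote(obj)}"


def build_put(url: str, token: str, data: bytes) -> Request:
    # PUT uploads (or overwrites) the object at this URL.
    return Request(url, data=data, method="PUT", headers={"X-Auth-Token": token})


def build_get(url: str, token: str) -> Request:
    # GET downloads the object stored at this URL.
    return Request(url, method="GET", headers={"X-Auth-Token": token})


# All values below are placeholders, not real endpoints or credentials.
url = swift_object_url("https://swift.example.com", "AUTH_demo", "analytics", "day1.log")
put_req = build_put(url, "EXAMPLE_TOKEN", b"first line of log data\n")
get_req = build_get(url, "EXAMPLE_TOKEN")
```

Passing either request to `urllib.request.urlopen` would perform the call against a real Swift endpoint; the sketch stops short of that so it runs without a cluster.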

I hope that helps. Please keep the questions coming; they are quite good ones!

Ryan

JamieD73
2 Iron

Re: Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

I understand that one of the questions consistently asked at the Hadoop World booth was "Which Hadoop distribution should I choose?"  

How do our Big Data experts respond to that?

helghareeb
1 Copper

Re: Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

Ryan Peterson, George,

Thank you so much for the shared answers. You are truly the experts. I appreciate your time.

It is really difficult to find great answers online without directly talking to the experts.

Can't wait for the next "Ask the Expert" to take questions to the next level. 

Just after I finish checking the useful resources you have provided.

Thanks

Haitham