RobertoAraujo1

2 Intern

•

718 Posts

2

8783

October 9th, 2014 14:00

Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions

Welcome to this EMC Ask the Expert session. On this occasion we'll answer questions on EMC Big Data Solutions such as Isilon, ViPR, and the ECS Appliance.

EMC has always been about data; storage is just the means to keep, access, protect and use it. Big Data is the latest data management challenge. That’s why EMC was so excited to be at Hadoop World and showcase our storage and data management solutions for Big Data. EMC does not just tackle storage problems; we solve data management challenges. Our experts are getting ready to take all of your questions on this topic.

Here are your Subject Matter Experts:

George Hamilton is a senior product marketing manager for EMC ViPR Global Data Services and EMC Centera and Atmos object storage platforms. George has nearly 20 years of technology industry experience. Prior to joining EMC, he was an industry analyst and research director for Yankee Group covering cloud computing and services, IT infrastructure and IT management software. Connect with George on Twitter.

Ryan M Peterson, MBA, is an internationally recognized industry expert with repeated success in the design, development, and delivery of ground-breaking, high-performance technology solutions. He is a technology thought leader and pioneer with a “can do” attitude, who finds workable, technologically advanced solutions to complex issues. Ryan currently directs the efforts of EMC Isilon’s Solutions Architecture organization, focusing on industry integration with best-of-breed applications and technologies. He has also become recognized as a thought leader in Big Data analytics applications such as Hadoop and enjoys discussing the future of technology and its positive impact on the world. Connect with Ryan on Twitter.

This event will take place from October 27th to November 7th, 2014.

Share this event on Twitter:

>> Join our Ask the Expert: Store Everything, Analyze Everything, and Build What You Need with EMC Hadoop Storage Solutions http://bit.ly/1CYVHrg #EMCATE <<


RobertoAraujo1

2 Intern

•

718 Posts

0

October 27th, 2014 08:00

Welcome everyone, this ATE session has begun. Our experts are now ready to answer any question you post on this thread. Enjoy!

helghareeb

1 Rookie

•

19 Posts

0

October 27th, 2014 23:00

Hello Everyone,

Thanks for the opportunity. I have more than one question; I hope this is not a problem.

Big Data Questions:
- Assuming we are using Hadoop for Big Data Analytics, what are the advantages of using OneFS over Hadoop HDFS?
- Are there any alternatives to Hadoop coming in the near future? Are there any upcoming advances to MapReduce?
Object Storage Questions:
- Up till now, I have not been able to understand how Object Storage works. For example, I understand that NTFS stores larger files than FAT because of the large iNode size. For Object Storage, there are no iNodes. So, how is a file written to disk? I mean, what exactly is the change that made it possible to write large files?
- Besides, if Object Storage calculates the ID from the content and stores authorization data within the stored object, that means everyone with access to Object Storage can find the object but only authorized users can access it. Is that true?
- If what I am thinking is true, isn't this a security leak?

Sorry for the multiple questions. I understand I may need to learn more about the technologies. However, I find this an excellent chance to get replies from the experts. Replies may include references to external resources. Thanks again.

Nikschen

179 Posts

0

October 28th, 2014 09:00

Hi Haitham,

I will let the Experts reply to your questions, but let me recommend the latest October blog posts on the Isilon Community to you about OneFS and Hadoop, and Splunk as an alternative to Hadoop.

Isilon

Happy reading.

Niki

helghareeb

1 Rookie

•

19 Posts

0

October 30th, 2014 11:00

Hello Ryan Peterson

Honestly, those are the answers I was looking for. You really have no idea how much you helped me with your answer. Thank you for the thorough answer, and the effort and time you took to provide it. Actually, I have more questions; I hope you don't mind.

How does OneFS provide protection, if not through multiple writes (3x in the case of HDFS)?
Is there any technical documentation about OneFS techniques compared to other file systems? I know OneFS is a proprietary file system, but I hope you have a blog or something that covers such topics.
I hope you can share any of your thoughtful ideas about "Object Storage". They are really helpful. If you can shed light on the differences between OneFS and Swift (OpenStack), that would be awesome. Even if not, your answer really helped me.

Sorry for the questions. I am in academia, and I am starting my research career in "File Systems", a topic I find very useful. Sorry for any inconvenience and thanks for your generosity.

eghamilton

28 Posts

0

October 30th, 2014 15:00

Hi Haitham. Thanks for your question. Let me see if I can address the object portion.

Regardless of whether data is stored as an object, a file, or a block, it is ultimately stored as blocks of data on a disk. File/NAS storage and object storage are simply abstractions above that process. As for handling large files, that is precisely what object storage is designed for. Rather than using a file system with a hierarchical structure, object storage packages a file's metadata and raw data together as a single object. This object is then stamped with a unique identifier and placed in a non-hierarchical bucket.

It seems as though you are referring to Content Addressed Storage (CAS). Centera is an example of CAS. With Centera, the application requests to create a new file and the app server sends the file to Centera. Centera performs the content address calculation using a proprietary hash and sends the address back to the application. The application database stores the content address for future reference. The content address is a unique digital fingerprint that guarantees content authenticity and immutability. When an application needs to access the file, it only needs to know the content address. The authorization data is not stored within the object. That is governed by the application and the user's privileges at the application layer.
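To make the content-address flow described above concrete, here is a minimal in-memory sketch. It is not Centera's implementation: Centera's hash is proprietary, so SHA-256 is used purely as a stand-in, and the names `ContentAddressedStore` and `app_db` are hypothetical.

```python
import hashlib


class ContentAddressedStore:
    """Toy in-memory content-addressed store (illustration only)."""

    def __init__(self):
        self._blobs = {}  # content address -> raw bytes

    def put(self, data: bytes) -> str:
        # The address is derived purely from the content.
        # SHA-256 is an assumption; the real hash is proprietary.
        address = hashlib.sha256(data).hexdigest()
        self._blobs[address] = data
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Re-hashing on read shows how the address doubles as an integrity check.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data


store = ContentAddressedStore()
app_db = {}  # stands in for the application database

# The application keeps only the returned address for future reference.
app_db["invoice-2014-10"] = store.put(b"invoice body")
retrieved = store.get(app_db["invoice-2014-10"])
```

Because the address depends only on the bytes, storing the same content twice yields the same address, and any change to the content yields a different one. Note that the store holds no authorization data; in this sketch, as in the description above, that would be the application's job.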

Other object platforms work similarly but use different methods of creating the unique object ID, which is stored in an index.

As far as security, each operation is individually authenticated. So, if a user is not authenticated, they will not have permission to access a file. Again, this is done at the application and access control layer.
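As a rough illustration of the two points above (an object ID kept in an index, and a check on every individual operation), here is a small sketch. The `Bucket` class, the random UUID as object ID, and the simple allow list are all assumptions made for the example; real platforms use their own ID schemes and access control mechanisms.

```python
import uuid


class Bucket:
    """Toy flat-namespace object bucket (illustration only)."""

    def __init__(self, allowed_users):
        self._index = {}  # object ID -> (metadata, data)
        self._allowed = set(allowed_users)

    def _check(self, user: str) -> None:
        # Every operation is checked individually; nothing is cached per session.
        if user not in self._allowed:
            raise PermissionError(f"{user} is not permitted on this bucket")

    def put(self, user: str, data: bytes, metadata: dict) -> str:
        self._check(user)
        object_id = uuid.uuid4().hex  # ID is independent of the content
        self._index[object_id] = (dict(metadata), data)
        return object_id

    def get(self, user: str, object_id: str):
        self._check(user)
        return self._index[object_id]


bucket = Bucket(allowed_users={"alice"})
object_id = bucket.put("alice", b"sensor readings", {"type": "csv"})
metadata, data = bucket.get("alice", object_id)

# A user who is not on the allow list is rejected, even with a valid object ID.
try:
    bucket.get("mallory", object_id)
    denied = False
except PermissionError:
    denied = True
```

Knowing an object ID is therefore not enough on its own: the access control layer still has to accept the caller for each request.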

Access to object storage is via an API, most often a RESTful API such as Amazon S3, EMC Atmos, or OpenStack Swift.

For a more detailed explanation, here are a few resources for you:

ViPR Services Storage Engine Architecture White Paper

EMC Atmos Cloud Storage Architecture

JamieD73

32 Posts

0

November 5th, 2014 13:00

I understand that one of the questions consistently asked at the Hadoop World booth was "Which Hadoop distribution should I choose?"

How do our Big Data experts respond to that?

helghareeb

1 Rookie

•

19 Posts

0

November 9th, 2014 20:00

Ryan Peterson George

Thank you so much for the shared answers. You are truly the experts. I appreciate your time.

It is really difficult to find great answers online without directly talking to the experts.

I can't wait for the next "Ask the Expert" to take my questions to the next level, right after I finish checking the useful resources you have provided.

Thanks

Haitham

eghamilton

28 Posts

0

November 10th, 2014 06:00

Which Hadoop distribution should you use? In the case of ViPR HDFS, EMC gives you the option to choose the Hadoop distribution that best fits your needs. ViPR Services is an object-based unstructured storage engine. ViPR Services supports access to the underlying data via Object APIs such as S3, OpenStack Swift and EMC Atmos. It also provides an HDFS interface to an object bucket. ViPR presents an HDFS-compatible file system. ViPR HDFS provides a client library (ViPR-HDFS Client) that is installed on all the data nodes that run MR jobs on the customer’s Hadoop cluster. As such, the customer can use the distribution of their choice.

When a task running on the data node needs to read a file, the request goes to the ViPR-HDFS client (the customer points to viprfs:// as their data source), and the ViPR client communicates with the HDFS head on the ViPR data node. The ViPR client passes in an authN token that identifies the user to the HDFS head.

The HDFS head in the ViPR data node receives requests from the ViPR-HDFS client. The HDFS head then verifies the user’s identity by authenticating against the KDC. Once authN and authZ succeed, it talks to the ViPR Services engine and the controller process running on the node to fetch the requested data.
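The read path described above can be sketched as a small simulation. This is only a model of the sequence (client, token, HDFS head, KDC check, then data fetch), not ViPR code: the classes `Kdc`, `HdfsHead`, and `ViprHdfsClient`, the token format, the ACL dictionary, and the exact URI layout after `viprfs://` are all hypothetical.

```python
class Kdc:
    """Stand-in for a Kerberos KDC: maps tokens to user names."""

    def __init__(self, tokens):
        self._tokens = dict(tokens)

    def authenticate(self, token):
        return self._tokens.get(token)  # None when the token is unknown


class HdfsHead:
    """Stand-in for the HDFS head on a storage data node."""

    def __init__(self, kdc, acl, objects):
        self._kdc = kdc
        self._acl = acl          # path -> set of users allowed to read
        self._objects = objects  # path -> bytes; stands in for the storage engine

    def read(self, token, path):
        user = self._kdc.authenticate(token)        # authN
        if user is None:
            raise PermissionError("authentication failed")
        if user not in self._acl.get(path, set()):  # authZ
            raise PermissionError("authorization failed")
        return self._objects[path]                  # fetch only after both succeed


class ViprHdfsClient:
    """Stand-in for the client library installed on the Hadoop nodes."""

    SCHEME = "viprfs://"

    def __init__(self, head, token):
        self._head = head
        self._token = token

    def open(self, uri):
        if not uri.startswith(self.SCHEME):
            raise ValueError("expected a viprfs:// URI")
        return self._head.read(self._token, uri[len(self.SCHEME):])


kdc = Kdc({"token-123": "analyst"})
head = HdfsHead(
    kdc,
    acl={"bucket/logs/day1.txt": {"analyst"}},
    objects={"bucket/logs/day1.txt": b"line1\nline2\n"},
)
client = ViprHdfsClient(head, token="token-123")
data = client.open("viprfs://bucket/logs/day1.txt")
```

The point of the sketch is the ordering: the Hadoop task only ever talks to the client library, and no data is returned until the head has both identified the user and confirmed they may read the path.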

Bottom line, the goal of ViPR HDFS is to extend analytic capabilities to additional data sources, for example, a large, PB-scale archive for metadata querying, etc. But you can use your existing Hadoop distribution.

eghamilton

28 Posts

0

November 10th, 2014 06:00

Thank you Haitham. We appreciate your participation. The community is only as valuable as the participants!

RobertoAraujo1

2 Intern

•

718 Posts

0

November 10th, 2014 06:00

This ATE event has ended. We would like to thank all those who participated in this discussion, with special thanks to our experts, who took time from their busy schedules to answer our users' questions.

Cheers.
