corey_oconnor
2 Bronze

ViPR HDFS


How does ViPR HDFS work?


Accepted Solutions
SALAMN
2 Iron

Re: ViPR HDFS


ViPR HDFS provides an HDFS-compatible file system. A ViPR client (a .jar file) is installed on the data nodes of an existing Hadoop cluster and registers the ViPR file system (ViPRFS) as a file system available to MapReduce jobs as well as Pig and Hive queries. Existing storage arrays managed by ViPR can then be made accessible via ViPR HDFS.

ViPR Services create a unified pool (bucket) of data. As with ViPR Object, users create buckets that can span file shares and grow or shrink on demand. Data is distributed across the arrays according to how the virtual storage pool is configured. A bucket exposes an HDFS interface or, optionally, both an object (S3) and an HDFS interface. In this way, the compute portion of an existing Hadoop cluster talks to ViPR HDFS, using existing data (added to the HDFS bucket) as the target for big data applications and queries.
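The client registration described above is, in broad strokes, the standard Hadoop mechanism for plugging an alternate file system into a cluster: the client .jar goes on the classpath, and core-site.xml maps a URI scheme to an implementation class. A minimal sketch, assuming a viprhdfs:// scheme and a hypothetical implementation class and bucket URI (the exact property values come from the ViPR HDFS documentation, not this post):

```xml
<!-- core-site.xml: register ViPRFS as a Hadoop file system.
     Illustrative only; the class name and URI below are assumptions. -->
<configuration>
  <property>
    <!-- Hadoop convention: fs.<scheme>.impl names the FileSystem class -->
    <name>fs.viprhdfs.impl</name>
    <value>com.emc.hadoop.fs.vipr.ViPRFileSystem</value> <!-- hypothetical class -->
  </property>
  <property>
    <!-- optional: make the ViPR bucket the default file system so
         unqualified paths resolve to it -->
    <name>fs.defaultFS</name>
    <value>viprhdfs://mybucket.mynamespace.mysite/</value> <!-- hypothetical URI -->
  </property>
</configuration>
```

With a mapping like this in place, jobs and shell commands can address the bucket directly, e.g. `hadoop fs -ls viprhdfs://mybucket.mynamespace.mysite/`.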

2 Replies

Re: ViPR HDFS


HDFS is another access protocol/API for the ViPR Services storage engine (in addition to S3, Swift, Atmos, etc.). The Hadoop compute nodes can mount the viprhdfs:// file system and access the objects stored in ViPR Services directly.

What specifically would you like to know?
