PowerScale OneFS HDFS Configuration Guide

How Hadoop is implemented on OneFS

In a Hadoop implementation on a PowerScale cluster, PowerScale OneFS serves as the file system for Hadoop compute clients. OneFS supports the Hadoop Distributed File System (HDFS) as a protocol, which Hadoop compute clients use to access data on the storage layer.

Hadoop compute clients can access data stored on a PowerScale cluster by connecting to any node over the HDFS protocol. All nodes that are configured for HDFS provide both NameNode and DataNode functionality, as shown in the following illustration.

Figure 1. PowerScale OneFS Hadoop Deployment

Each node added to the cluster boosts performance and expands the cluster's capacity. For Hadoop analytics, the PowerScale scale-out distributed architecture minimizes bottlenecks, serves large data sets rapidly, and optimizes performance.

How a PowerScale OneFS Hadoop implementation differs from a traditional Hadoop deployment

A Hadoop implementation with OneFS differs from a typical Hadoop implementation in the following ways:

  • The Hadoop compute and HDFS storage layers are on separate clusters instead of the same cluster.
  • Instead of storing data within a Hadoop distributed file system, the storage layer functionality is fulfilled by OneFS on a PowerScale cluster. Nodes on the PowerScale cluster function as both a NameNode and a DataNode.
  • The compute layer is established on a Hadoop compute cluster that is separate from the PowerScale cluster. The Hadoop MapReduce framework and its components are installed on the Hadoop compute cluster only.
  • Instead of serving as a storage layer, HDFS is implemented on OneFS as a native, lightweight protocol layer between the PowerScale cluster and the Hadoop compute cluster. Clients from the Hadoop compute cluster connect over HDFS to access data on the PowerScale cluster.
  • In addition to HDFS, clients from the Hadoop compute cluster can connect to the PowerScale cluster over any protocol that OneFS supports, such as NFS, SMB, FTP, and HTTP. PowerScale OneFS is the only non-standard implementation of HDFS offered that allows for multi-protocol access. PowerScale is an ideal alternative storage system to native HDFS because it combines HDFS services with enterprise-grade data management features.
  • Hadoop compute clients can connect to any node on the PowerScale cluster that functions as a NameNode, rather than being routed through a single NameNode.
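
The last point in the list above, that any HDFS-enabled node can act as the entry point for a client, can be illustrated with a minimal Python sketch. This is a toy model and not part of OneFS or Hadoop: the host names, the client names, the helper functions `namenode_uri` and `assign_clients`, and the round-robin policy are all hypothetical, and port 8020 is an assumption that should be checked against the cluster's actual HDFS settings. How connections are distributed in a real deployment is outside what this section describes.

```python
from itertools import cycle
from typing import Dict, Iterable, List

# Assumption: 8020 is used here as the HDFS port; confirm the value for your cluster.
HDFS_PORT = 8020


def namenode_uri(node: str, port: int = HDFS_PORT) -> str:
    """Return the hdfs:// URI a client would use to reach one node."""
    return f"hdfs://{node}:{port}"


def assign_clients(clients: Iterable[str], nodes: List[str]) -> Dict[str, str]:
    """Spread clients across nodes in round-robin order.

    Every HDFS-enabled node provides NameNode functionality, so any
    node is a valid entry point for any client. Round-robin is only an
    illustrative policy chosen for this sketch.
    """
    if not nodes:
        raise ValueError("at least one HDFS-enabled node is required")
    rotation = cycle(nodes)
    return {client: namenode_uri(next(rotation)) for client in clients}


if __name__ == "__main__":
    # Hypothetical host names, for illustration only.
    nodes = ["node1.example.com", "node2.example.com", "node3.example.com"]
    clients = ["compute-a", "compute-b", "compute-c", "compute-d"]
    for client, uri in assign_clients(clients, nodes).items():
        print(f"{client} -> {uri}")
```

In a traditional deployment, every client in this model would map to the same single NameNode address. Here the mapping spans all nodes, which is the difference the list describes.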
