I understand that if a file is smaller than 128 KB, it is mirrored between the nodes.
For files of 129 KB or larger, I assume the space consumed is the data plus the parity data,
which means the file size plus the protection overhead is used.
When a 128 KB file is written to Isilon with protection level N+2:1 or N+3:1, how is the data stored across the nodes?
Is it mirrored on 3 nodes with N+2:1, and on 4 nodes with N+3:1?
Any ideas, or is there documentation that describes this? Please advise.
Thanks in advance,
The OneFS 6.5.5 User Guide (I have the 2011 version at hand)
has a nice chapter on OneFS data protection starting on page 64.
Have a look at the comprehensive table of overhead rates for
the various protection schemes in the section "N+M data protection" (pp. 65-66).
From the Advanced Administrations Isilon Course:
OneFS stripes data across nodes and disks. During a write, the system breaks data into
smaller logical sections called stripes and then logically places the data in a stripe unit. As
the system lays data across the cluster, it fills the stripe units until the maximum width of
the cluster is reached. Each OneFS block is 8 KB, and a stripe unit consists of
16 blocks, for a total of 128 KB per stripe unit.
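The block and stripe-unit arithmetic above can be checked with a small sketch. It assumes a simplified model in which a file is cut into consecutive 128 KB stripe units and only the last one may be partially filled; the function names are hypothetical and this is not how OneFS itself is implemented.

```python
import math

# Sizes quoted in the course text above.
BLOCK_SIZE = 8 * 1024                                   # one OneFS block: 8 KB
BLOCKS_PER_STRIPE_UNIT = 16                             # 16 blocks per stripe unit
STRIPE_UNIT_SIZE = BLOCK_SIZE * BLOCKS_PER_STRIPE_UNIT  # 128 KB


def split_into_stripe_units(file_size: int) -> list[int]:
    """Return how many bytes land in each 128 KB stripe unit.

    Simplified model (an assumption): consecutive stripe units,
    only the last one may be partially filled.
    """
    if file_size <= 0:
        return []
    full, rest = divmod(file_size, STRIPE_UNIT_SIZE)
    units = [STRIPE_UNIT_SIZE] * full
    if rest:
        units.append(rest)
    return units


def blocks_used(file_size: int) -> int:
    """Number of 8 KB blocks needed for the file's data (protection not included)."""
    return math.ceil(file_size / BLOCK_SIZE)


if __name__ == "__main__":
    for kb in (100, 128, 129, 1000):
        size = kb * 1024
        print(f"{kb} KB -> {len(split_into_stripe_units(size))} stripe unit(s), "
              f"{blocks_used(size)} block(s)")
```

In this model a 128 KB file fills exactly one stripe unit, while a 129 KB file needs a second, mostly empty one.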
OneFS uses advanced data layout algorithms to determine data layout for maximum
efficiency and performance. Data is evenly distributed across nodes as it is written. The
system can continuously reallocate data and make storage space more usable and efficient.
Efficiency depends on the file size and the stripe width (determined by the number of nodes):
as the cluster grows, the system stores large files more efficiently.
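Why wider clusters store large files more efficiently can be seen from the FEC overhead of a full-width N+M stripe, M / (N + M). The sketch below assumes one stripe unit per node, so N + M equals the node count; real OneFS layouts may impose additional limits (for example on maximum stripe width) that are not modelled here.

```python
from fractions import Fraction


def protection_overhead(nodes: int, m: int) -> Fraction:
    """Fraction of a full-width N+M stripe spent on FEC.

    Simplified model (an assumption): the stripe spans every node with one
    stripe unit per node, so N = nodes - m data units plus m FEC units.
    """
    if nodes <= m:
        raise ValueError("need more nodes than FEC units")
    return Fraction(m, nodes)


if __name__ == "__main__":
    for nodes in (5, 10, 20):
        ov = protection_overhead(nodes, m=2)
        print(f"{nodes} nodes, N+2: {float(ov):.0%} overhead")
```

Using Fraction keeps the ratios exact; the printout is only rounded for display.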
Within the cluster, every disk within each node is assigned both a unique GUID and logical
drive number and is subdivided into 32 MB cylinder groups made up of 8 KB blocks. Each
cylinder group uses a bitmap to track whether its blocks are used for data, inodes or
other metadata constructs. The combination of node number, logical drive number and
block offset forms a block or inode address and falls under the control of the
aptly named Block Allocation Manager (BAM).
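A toy model of the addressing described above: a (node, drive, block offset) triple, and a per-cylinder-group bitmap over its 32 MB / 8 KB = 4096 blocks. The class names are hypothetical, and the bitmap keeps only a single used/free bit per block, whereas the real structure also distinguishes data, inodes and other metadata.

```python
from dataclasses import dataclass

BLOCK_SIZE = 8 * 1024                             # 8 KB blocks
CYLINDER_GROUP_SIZE = 32 * 1024 * 1024            # 32 MB cylinder groups
BLOCKS_PER_CG = CYLINDER_GROUP_SIZE // BLOCK_SIZE  # 4096 blocks per group


@dataclass(frozen=True)
class BlockAddress:
    """Hypothetical block address: (node number, logical drive number, block offset)."""
    node: int
    drive: int
    offset: int  # block offset on the drive

    def cylinder_group(self) -> int:
        # Index of the 32 MB cylinder group on the drive that holds this block.
        return self.offset // BLOCKS_PER_CG


class CylinderGroup:
    """Toy allocation bitmap: one used/free bit per block (simplification)."""

    def __init__(self) -> None:
        self.bitmap = bytearray(BLOCKS_PER_CG // 8)

    def is_used(self, block: int) -> bool:
        return bool(self.bitmap[block // 8] & (1 << (block % 8)))

    def mark_used(self, block: int) -> None:
        self.bitmap[block // 8] |= 1 << (block % 8)

    def mark_free(self, block: int) -> None:
        # Mask to 0xFF so the result stays a valid byte value.
        self.bitmap[block // 8] &= ~(1 << (block % 8)) & 0xFF
```

For example, block offset 5000 on a drive falls in cylinder group 1, because each group holds 4096 blocks.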
There are four variables that combine to determine how data is laid out. This makes the
possible outcomes almost unlimited when trying to understand how the system will work.
The number of nodes in the cluster affects the data layout because data is laid out vertically
across all nodes in the cluster, so the number of nodes determines how wide the stripe will be.
In N+M, N is the number of nodes minus the protection level, and M is the
protection level. The protection level also affects data layout because you can change the
protection level of your data down to the file level, and the protection level of an
individual file changes how it is striped across the cluster. The file size also affects data
layout because the system uses different layout options for larger files than for smaller
files to maximize efficiency and performance. The disk access pattern modifies both
prefetching and data layout settings within an individual node. The disk access pattern can be
set at the file or directory level, so you are not ‘locked’ into any one pattern for the whole
cluster.
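A rough sketch of how node count, protection level and file size combine, under a deliberately simplified N+M model (an assumption, not OneFS's real layout algorithm): 128 KB stripe units, N = nodes - M data units per stripe, M FEC units per stripe, and narrower stripes for small files. The hybrid N+2:1 and N+3:1 schemes and the access pattern are not modelled.

```python
import math

STRIPE_UNIT_SIZE = 128 * 1024  # 128 KB, from the course text


def layout(file_size: int, nodes: int, m: int) -> dict:
    """Estimate data/FEC stripe units for a file under a simplified N+M model.

    Assumptions: one stripe unit per node per stripe, N = nodes - m,
    m FEC units per stripe, and a stripe with fewer than N data units
    is simply written narrower.
    """
    n = nodes - m
    if n < 1:
        raise ValueError("cluster too small for this protection level")
    # Even a tiny file occupies one stripe unit in this model.
    data_units = max(1, math.ceil(file_size / STRIPE_UNIT_SIZE))
    stripes = math.ceil(data_units / n)
    fec_units = stripes * m
    return {"data_units": data_units, "stripes": stripes, "fec_units": fec_units}


if __name__ == "__main__":
    print(layout(128 * 1024, nodes=5, m=2))
    print(layout(1024 * 1024, nodes=5, m=2))
```

In this model a file that fits in one stripe unit gets 1 data unit plus M protection units, the same space cost as M+1 full copies, which is consistent with the mirroring of small files mentioned in the question.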
Ultimately the system’s job is to lay data out in the most efficient, economical, highest
performing way possible. You can manually define some aspects of how it determines what
is “best”, but the process is designed to be automated.
At any point, an administrator can use the web administration interface or the CLI to tune
the layout decisions made by OneFS to better suit the workflow.
Note that the Streaming data access pattern is generally used to increase prefetching.
Spreading a file across as many disks as possible really benefits only smaller clusters of 3-5
nodes. Most large files written to a cluster of more than 5 nodes will already be striped
across 15+ disks, even with Concurrency selected. This is because the file is large enough
to need multiple stripes, and the drive rotation algorithm selects multiple drives rather than
preferring a single drive per node, because of the need to keep drives roughly the same
The access pattern can be set from the web administration interface or the command line.
From the command line, the disk access pattern can be set separately from the data layout pattern:
isi set -a default|streaming|random -l concurrency|streaming|random
The process of striping spreads all write operations from a client across the nodes of a
cluster. A file is broken down into chunks, which are then striped across disks in the
cluster along with forward error correction (FEC).
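To make the idea of striping with FEC concrete, here is a minimal sketch that cuts data into fixed-size chunks, groups N chunks per stripe, and adds one XOR parity chunk per stripe. This is a swapped-in, simpler technique: XOR parity is an N+1 code that survives only one lost chunk per stripe, whereas OneFS's FEC supports more protection units per stripe. Function names are hypothetical.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))


def stripe_with_parity(data: bytes, unit: int, n: int) -> list[list[bytes]]:
    """Cut data into `unit`-byte chunks (last one zero-padded), group n chunks
    per stripe and append one XOR parity chunk to each stripe (N+1 code)."""
    chunks = [data[i:i + unit].ljust(unit, b"\0") for i in range(0, len(data), unit)]
    stripes = []
    for i in range(0, len(chunks), n):
        group = chunks[i:i + n]
        parity = reduce(xor_bytes, group)
        stripes.append(group + [parity])
    return stripes


def rebuild(stripe: list[bytes], lost: int) -> bytes:
    """Recover the chunk at index `lost` by XOR-ing all surviving chunks."""
    survivors = [c for i, c in enumerate(stripe) if i != lost]
    return reduce(xor_bytes, survivors)
```

Note that a stripe holding a single data chunk gets a parity chunk identical to it, so XOR parity degenerates into plain mirroring for data that fits in one chunk.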
Even though a client is connected to only one node, when that client saves data to the
cluster, the write operation takes place across multiple nodes in the cluster. The same is true
for read operations. A client is connected to only one node at a time; however, when that client
requests a file from the cluster, the node to which the client is connected generally does not have
the entire file locally on its drives. The client's node retrieves and rebuilds the file over the
back-end InfiniBand network.
OneFS is designed to withstand multiple simultaneous component failures (currently four)
while still affording unfettered access to the entire file system and dataset. Data protection
is implemented at the file system level and, as such, is not dependent on any hardware
RAID controllers. This provides many benefits, including the ability to add new data protection
schemes as market conditions or hardware attributes and characteristics evolve. Since
protection is applied at the file level, a OneFS software upgrade is all that's required in
order to make new protection and performance schemes available.
OneFS also supports several hybrid protection schemes. These include N+2:1 and N+3:1,
which protect against two drive failures or one node failure, and three drive failures or one
node failure, respectively. These protection schemes are particularly useful for high-density
node configurations, where each node contains up to thirty-six multi-terabyte SATA drives.
Here, the probability of multiple drives failing far surpasses that of an entire node failure. In
the unlikely event that multiple devices fail simultaneously, such that the file is
“beyond its protection level”, OneFS will re-protect everything possible and report errors
for the individual affected files in the cluster's logs.
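The rule for the hybrid schemes, as stated above (N+2:1 survives two drive failures or one node failure, N+3:1 three drive failures or one node failure), can be expressed as a small check. This is a simplified interpretation: failures are given as hypothetical (node, drive) pairs, a failed node is represented by its failed drives, and real per-stripe placement is ignored.

```python
def within_hybrid_protection(failed_drives: list[tuple[int, int]], m: int) -> bool:
    """Check a set of failures against an N+M:1 scheme (simplified).

    Tolerated: up to m failed drives anywhere in the cluster, or any number
    of failed drives as long as they all sit in a single node (a node failure).
    failed_drives: hypothetical (node, drive) pairs.
    """
    failed = set(failed_drives)
    if len(failed) <= m:
        return True
    nodes_hit = {node for node, _ in failed}
    return len(nodes_hit) == 1
```

For instance, with m=2, losing drives on three different nodes falls outside the guarantee, while losing four drives that all belong to one node does not.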
Interestingly, on Linux NFSv3 clients, (GNU) du reports sizes including the protection overhead.
Use du --apparent-size ... to see the "file" sizes on Linux,
the equivalent of du -A ... on the Isilon cluster itself.
Apple OS X clients seem to report the "file" sizes by default...
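The two numbers du can show come from standard stat fields, so they can also be compared from a script. A minimal sketch for Unix-like clients, assuming Linux semantics where st_blocks counts 512-byte units; st_size corresponds to du --apparent-size, while st_blocks * 512 corresponds to plain du, which on an Isilon NFS mount may include protection overhead.

```python
import os
import sys


def sizes(path: str) -> tuple[int, int]:
    """Return (apparent_size, allocated_bytes) for a file.

    apparent_size is st_size (what `du --apparent-size` reports);
    allocated_bytes is st_blocks * 512 (what plain `du` reports on Linux).
    st_blocks is not available on Windows.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    for p in sys.argv[1:]:
        apparent, allocated = sizes(p)
        print(f"{p}: apparent {apparent} B, allocated {allocated} B")
```

Run it with one or more file paths on the NFS mount as arguments to see both values side by side.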