BBTower_masahir

March 21st, 2013 09:00

How much space does Isilon consume when writing a 128 KB file with N+2:1 and N+3:1?

Hi guys,

I understand that if the file size is under 128 KB, the file is mirrored between the nodes.

For files of 129 KB or larger, I imagine space is consumed for both the data and the parity data,

which means the file size plus the protection overhead is used.

When writing a 128 KB file to Isilon with a protection level of N+2:1 or N+3:1, how is the data stored or written to the nodes?

Is it mirrored across 3 nodes with N+2:1, and across 4 nodes with N+3:1?

Any ideas, or is there any documentation that describes this? Please advise.

Thanks in advance,

Masahiro Uchida

Responses(4)

Peter_Sero

March 24th, 2013 23:00

The OneFS 6.5.5 User Guide (I have the 2011 version at hand)

has a nice chapter on OneFS data protection starting on page 64.

Have a look at the comprehensive table of overhead rates for

the various protection schemes in the chapter N+M data protection (pp. 65-66).

Peter

AMMSCL

March 27th, 2013 06:00

From the Advanced Administrations Isilon Course:

OneFS stripes data across nodes and disks. During a write, the system breaks data into

smaller logical sections called stripes and then logically places the data in a stripe unit. As

the system lays data across the cluster, it fills the stripe units until the maximum width of

the cluster is reached. Each OneFS block is 8 KB, and a stripe unit consists of

16 blocks, for a total of 128 KB per stripe unit.
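As a quick check of the arithmetic above, here is a minimal Python sketch. The constant and function names are illustrative, not OneFS identifiers, and the sketch only counts stripe units needed for the file's data; it does not model protection overhead.

```python
import math

BLOCK_SIZE_KB = 8            # OneFS block size, per the course text
BLOCKS_PER_STRIPE_UNIT = 16  # blocks per stripe unit, per the course text
STRIPE_UNIT_KB = BLOCK_SIZE_KB * BLOCKS_PER_STRIPE_UNIT  # 8 * 16 = 128 KB


def data_stripe_units(file_size_kb: int) -> int:
    """Number of stripe units needed to hold the file's data (no parity)."""
    if file_size_kb <= 0:
        return 0
    return math.ceil(file_size_kb / STRIPE_UNIT_KB)


for size_kb in (64, 128, 129, 1024):
    print(size_kb, "KB ->", data_stripe_units(size_kb), "data stripe unit(s)")
```

A 128 KB file fits in exactly one stripe unit, while a 129 KB file already needs a second one.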

OneFS uses advanced data layout algorithms to determine data layout for maximum

efficiency and performance. Data is evenly distributed across nodes as it is written. The

system can continuously reallocate data and make storage space more usable and efficient.

Depending on the file size and the stripe width (determined by the number of nodes), as

the cluster size increases, the system stores large files more efficiently.

Within the cluster, every disk within each node is assigned both a unique GUID and logical

drive number and is subdivided into 32MB cylinder groups comprised of 8KB blocks. Each

cylinder group is responsible for tracking, via a bitmap, whether its blocks are used for

data, inodes or other metadata constructs. The combination of node number, logical drive

number and block offset comprises a block or inode address and falls under the control of the

aptly named Block Allocation Manager (BAM).
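The numbers in that paragraph can be worked through in a short Python sketch. The `BlockAddress` type and `locate` helper are hypothetical, not the OneFS on-disk format, and the sketch assumes block offsets are counted in 8 KB blocks with cylinder groups laid out back to back on a drive.

```python
from typing import NamedTuple, Tuple

BLOCK_SIZE = 8 * 1024                   # 8 KB blocks, per the course text
CYLINDER_GROUP_SIZE = 32 * 1024 * 1024  # 32 MB cylinder groups, per the course text
BLOCKS_PER_GROUP = CYLINDER_GROUP_SIZE // BLOCK_SIZE  # 4096 blocks per group


class BlockAddress(NamedTuple):
    """Illustrative (node, drive, block offset) address."""
    node: int
    drive: int
    block_offset: int  # assumption: counted in 8 KB blocks from the start of the drive


def locate(addr: BlockAddress) -> Tuple[int, int]:
    """Return (cylinder group index, block index within that group)."""
    return divmod(addr.block_offset, BLOCKS_PER_GROUP)


addr = BlockAddress(node=3, drive=7, block_offset=10_000)
print(BLOCKS_PER_GROUP)  # 4096
print(locate(addr))      # (2, 1808)
```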

There are four variables that combine to determine how data is laid out. This makes the

possible outcomes almost unlimited when trying to understand how the system will work.

The number of nodes in the cluster affects the data layout because data is laid out vertically

across all nodes in the cluster, so the number of nodes determines how wide the stripe will be.

In N+M, N is the number of nodes minus the protection level, and M is the

protection level. The protection level also affects data layout because you can change the

protection level of your data down to the file level, and the protection level of that

individual file changes how it will be striped across the cluster. The file size also affects data

layout because the system employs different layout options for larger files than for smaller

files to maximize efficiency and performance. The disk access pattern modifies both

prefetching and data layout settings within an individual node. Disk access pattern can be

set at a file or directory level so you are not ‘locked’ into any one pattern for the whole

system.
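To make the N+M relationship concrete, here is a rough Python sketch of the parity overhead for a plain N+M stripe. It follows the course text's definition (N is the node count minus the protection level) and makes several assumptions: the function name is made up, any cap on stripe width is ignored, and neither the hybrid N+2:1 and N+3:1 schemes nor the mirroring of small files is modeled. The overhead tables in the User Guide remain the authoritative source.

```python
from typing import Tuple


def n_plus_m_layout(nodes: int, m: int) -> Tuple[int, float]:
    """Return (N, parity overhead fraction) for a full-width N+M stripe.

    Assumes one stripe unit per node, so N = nodes - m data units
    and m parity (FEC) units per stripe.
    """
    if m < 1 or nodes <= m:
        raise ValueError("need at least one data unit and one parity unit")
    n = nodes - m
    overhead = m / (n + m)  # share of each stripe that holds parity
    return n, overhead


for nodes, m in ((3, 1), (5, 1), (6, 2), (10, 2)):
    n, overhead = n_plus_m_layout(nodes, m)
    print(f"{nodes} nodes at +{m}: {n}+{m}, overhead {overhead:.0%}")
```

Under these assumptions the overhead shrinks as the cluster grows, which matches the statement above that larger clusters store large files more efficiently.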

Ultimately the system’s job is to lay data out in the most efficient, economical, highest

performing way possible. You can manually define some aspects of how it determines what

is “best”, but the process is designed to be automated.

At any point, an administrator can use the web management interface or the CLI to optimize

the layout decisions made by OneFS to better suit the workflow.

Note that Streaming Data Access Pattern is generally used to increase the prefetching

setting. Striping across as many disks as possible only really benefits a smaller cluster of 3-5

nodes. Most large files written to a cluster greater than 5 nodes will already be striped

across 15+ disks even with Concurrency selected. This is due to the file being large enough

to need multiple stripes and the drive rotation algorithm selecting multiple drives vs.

preferring a single drive per node, because of the need to keep drives roughly the same

capacity.

Access can be set from the web administration interface or the command line. From the

command line, the disk access pattern can be set separately from the data layout pattern:

isi set -a default|streaming|random -l concurrency|streaming|random

The process of striping spreads all write operations from a client across the nodes of a

cluster. The example in this animation demonstrates how a file is broken down into chunks,

after which it is striped across disks in the cluster along with forward error correction (FEC).

Even though a client is connected to only one node, when that client saves data to the

cluster, the write operation occurs in multiple nodes in the cluster. This is also true for read

operations. A client is connected to only one node at a time; however, when that client

requests a file from the cluster, the node to which the client is connected will not have the

entire file locally on its drives. The client’s node retrieves and rebuilds the file using the

backend InfiniBand network.
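The striping and rebuild idea can be illustrated with a toy Python sketch. This is not OneFS's algorithm: simple XOR parity stands in for the real forward error correction, so it tolerates only one lost unit per stripe, and all function names are hypothetical.

```python
from functools import reduce
from typing import List

STRIPE_UNIT = 128 * 1024  # bytes per stripe unit, per the course text


def xor_units(units: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized units."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*units))


def stripe_file(data: bytes, data_nodes: int, unit: int = STRIPE_UNIT) -> List[List[bytes]]:
    """Split data into stripes of `data_nodes` units plus one XOR parity unit.

    Each inner list holds one unit per node; the last entry is the parity.
    Short units are zero-padded so every unit in a stripe has the same length.
    """
    chunks = [data[i:i + unit].ljust(unit, b"\0") for i in range(0, len(data), unit)]
    stripes = []
    for i in range(0, len(chunks), data_nodes):
        units = chunks[i:i + data_nodes]
        # Pad the final stripe with zero-filled units so it is full width.
        units += [bytes(unit)] * (data_nodes - len(units))
        stripes.append(units + [xor_units(units)])
    return stripes


def rebuild(stripe: List[bytes], lost: int) -> bytes:
    """Recover the unit at index `lost` from the surviving units of one stripe."""
    survivors = [u for i, u in enumerate(stripe) if i != lost]
    return xor_units(survivors)


# Tiny 4-byte units keep the demo readable; a real stripe unit is 128 KB.
data = b"hello isilon cluster!"
stripes = stripe_file(data, data_nodes=3, unit=4)

# Pretend the node holding unit 1 of the first stripe is unavailable.
recovered = rebuild(stripes[0], lost=1)
print(recovered == stripes[0][1])

# Reading the file back means gathering the data units from every node.
reassembled = b"".join(u for s in stripes for u in s[:-1])[:len(data)]
print(reassembled == data)
```

The point of the sketch is only that no single node holds the whole file, and that a missing unit can be recomputed from the rest of its stripe.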

OneFS is designed to withstand multiple simultaneous component failures (currently four)

while still affording unfettered access to the entire file system and dataset. Data protection

is implemented at the file system level and, as such, is not dependent on any hardware

RAID controllers. This provides many benefits, including the ability to add new data protection

schemes as market conditions or hardware attributes and characteristics evolve. Since

protection is applied at the file level, a OneFS software upgrade is all that’s required in

order to make new protection and performance schemes available.

OneFS also supports several hybrid protection schemes. These include N+2:1 and N+3:1,

which protect against two drive failures or one node failure, and three drive failures or one

node failure, respectively. These protection schemes are particularly useful for high density

node configurations, where each node contains up to thirty-six multi-terabyte SATA drives.

Here, the probability of multiple drives failing far surpasses that of an entire node failure. In

the unlikely event that multiple devices have simultaneously failed, such that the file is

“beyond its protection level”, OneFS will re-protect everything possible and report errors on

the individual files affected to the cluster’s logs.

BBTower_masahir

March 28th, 2013 23:00

Thanks!

I tried checking the cluster's space usage with the du command from an NFS client.

Peter_Sero

March 31st, 2013 06:00

Interestingly, on Linux NFS3 clients (GNU) du reports sizes including the protection overhead.

Use du --apparent-size ... to see the "file" sizes on Linux.

Like du -A ... on the Isilon cluster itself.

Apple OSX clients seem to report the "file" sizes by default...
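The two numbers can also be compared from a script. Below is a small Python sketch for a Unix-like client, with a made-up helper name: st_size is the apparent "file" size, and st_blocks * 512 is the allocated size that plain du is based on. The sketch does not itself verify how an Isilon NFS mount fills in st_blocks; that part is the observation above.

```python
import os
import tempfile
from typing import Tuple


def sizes(path: str) -> Tuple[int, int]:
    """Return (apparent_bytes, allocated_bytes) for a file.

    apparent_bytes is the file length (what du --apparent-size shows);
    allocated_bytes is st_blocks * 512 (st_blocks is Unix-only).
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


# Demo on a local temporary file; point sizes() at a file on the NFS mount instead.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (128 * 1024))
    demo_path = f.name

apparent, allocated = sizes(demo_path)
print(f"apparent={apparent} allocated={allocated}")
os.remove(demo_path)
```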

Peter
