October 29th, 2015 03:00

File size, parity overhead and used space

Hi Isilon community,

I am currently looking into the Isilon topics of file size, used space, parity overhead and so on. I have been working with Isilon for a few years, but I do not really understand this topic and the relationship between all the variables involved. For better understanding, here is an example from a small cluster I recently installed and configured:

Cluster: 7 Node X410 with OneFS 7.2.0

Protection level: N+2n

As I understand it, with N+2 on a seven-node cluster I have a maximum of 5 data stripe units and exactly 2 parity stripe units per stripe, right?

Let's start with a simple example in which the stripe depth is only one, because only one stripe unit is stored on each node: a 512 KB file.

A 512 KB file has four 128 KB data stripe units and two 128 KB parity stripe units, so I would expect a total space consumption of 768 KB.

Node    Stored data
Node 1  128 KB data
Node 2  128 KB data
Node 3  128 KB data
Node 4  128 KB data
Node 5  128 KB parity
Node 6  128 KB parity
Node 7  -
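As a quick sanity check of this expected raw consumption, a back-of-the-envelope calculation in the shell (assuming full 128 KB stripe units):

  # 4 data units + 2 parity units, 128 KB each
  echo $(( (4 + 2) * 128 ))   # 768 (KB)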

To check this, I created a dummy file of 512 KB with MS Windows: >> fsutil createnew dummy 524288.

The MS Windows properties of this file on the Isilon cluster show me:

  • Size: 512 KB (524,288 bytes)
  • Size on disk: 808 KB (827,392 bytes)

The Isilon OneFS filesystem explorer shows me the following:

  • File size: 512 KB (524288 bytes)
  • Space used: 512 KB (524288 bytes)

Why does the file consume 808 KB from the Windows point of view, and not 768 KB? And why only 512 KB from the OneFS point of view?

The next example is a bit more complicated, because the file is too big to store only one stripe unit on each node: 1024 KB.

A 1024 KB file consists of eight 128 KB data stripe units. Because of N+2 we still have two parity stripe units per stripe.

Node    Stored data
Node 1  128 KB data   | 128 KB data
Node 2  128 KB data   | 128 KB data
Node 3  128 KB data   | 128 KB data
Node 4  128 KB data   | 128 KB parity
Node 5  128 KB data   | 128 KB parity
Node 6  128 KB parity | -
Node 7  128 KB parity | -

In total I should have 12 x 128 KB = 1536 KB which has to be stored on the cluster. But the MS Windows properties and the Isilon OneFS filesystem explorer show different values.
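The same back-of-the-envelope check for this case:

  # 8 data units + 2 parity units per stripe x 2 stripes = 12 units
  echo $(( (8 + 2*2) * 128 ))   # 1536 (KB)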

MS Windows

  • Size: 1.00 MB (1,048,576 bytes)
  • Size on disk: 1.18 MB (1,241,088 bytes)

Isilon OneFS filesystem explorer:

  • File size: 1.00 MB (1048576 bytes)
  • Space used: 1.00 MB (1048576 bytes)

And, additionally, the well-known chart with the parity overhead (found here or here) does not really match. The overhead with N+2 on a 7-node cluster should be roughly 29% (5+2). But 768 KB for 512 KB is 50% overhead, and 1536 KB for 1024 KB is 50% too.
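For what it's worth, the two percentages also seem to measure slightly different things, as a quick awk check shows (this is my reading of the numbers above, so treat it as an assumption):

  awk 'BEGIN { printf "%.1f%%\n", 100*2/7 }'     # 28.6%: parity share of a full 5+2 stripe
  awk 'BEGIN { printf "%.1f%%\n", 100*256/512 }' # 50.0%: extra space relative to the 512 KB file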

Did I make a mistake in these calculations? (The KiB/KB conversion is insignificant here.)

Maybe I'm making a structural mistake.


October 30th, 2015 06:00

The OneFS filesystem explorer shows bogus disk usage; it accounts only for rounding up to 8 KB blocks but ignores the more important protection overhead.
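A minimal sketch of that 8 KB block rounding (hypothetical byte count, just for illustration):

  # round a byte count up to whole 8 KB blocks: 512 KB + 1 byte needs 65 blocks
  echo $(( ((524289 + 8191) / 8192) * 8192 ))   # 532480 bytes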

For the larger file in your example, note that protection is added to *each* stripe, so there should be 2 x (5 + 2) = 14 stripe units on disk.
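In shell terms (assuming 5 data + 2 parity units per stripe and 128 KB maximum unit size):

  fileKB=1024
  (( stripes = (fileKB + 5*128 - 1) / (5*128) ))   # ceil(1024/640) = 2
  echo $(( stripes * (5 + 2) ))                    # 14 stripe units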

I can't comment on what Windows is reporting. Better to start with "du" right on the Isilon cluster for reference, and don't expect things to be too simple when file sizes grow from KB to some MB. To illustrate this, run this small script (it creates a test.dat file in the local folder, which should be under /ifs) and plot the results:

=========

du-test.zsh

=========

#!/bin/zsh
# Write test files of increasing size and compare the apparent size
# (du -A) with the blocks actually used on disk (du).
for i in {1..100}
do
  (( sizeKB = 32 * i ))
  dd if=/dev/zero of=test.dat bs=${sizeKB}k count=1 2>/dev/null
  sync
  # fields: $1 = apparent size in KB, $3 = on-disk size in KB
  echo $(du -A -k test.dat) $(du -k test.dat) |
  awk ' { fileKB=$1; diskKB=$3;
          if (fileKB==0) { print "Error: File size 0KB, exiting"; exit; }
          ratio=diskKB/fileKB;
          overhd=100*(ratio-1);
          # crude ASCII plot: one column per 3% of overhead
          graphval=int(overhd/3);
          printf "%5dKB file, %5dKB disk, %6.3f ratio, +%5.1f%% overhd, graph: |%*.*s* \n",
                 fileKB, diskKB, ratio, overhd, graphval, graphval, ""; }'
done
rm -f test.dat

========
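To try it, copy the script to a directory on the cluster and run it from there (the path is just an example):

  cd /ifs/data/test
  zsh du-test.zsh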

For your convenience the script already makes a (lousy) attempt at plotting the overhead percentage, which "basically" drops from around +200% down towards +50% on my 3-node pool with +2n:1 protection. See (below) how bumpy the curve is? Each bump indicates another stripe, and I think the chart explains more than an exact formula... The tables give the asymptotic values for very large files.

Cheers

-- Peter

[Attachment: OneFS-Overhead.jpg]


October 30th, 2015 07:00

Hi Phil,

The simple answer to your question is that you're forgetting about metadata. At N+2, assuming you haven't changed the default, OneFS will protect metadata one 'level' higher, i.e. N+3.

Get your Isilon SE to give you the deep dive on erasure coding and how it works.  BTW the actual space consumed also depends on factors such as OneFS revision and GNA/metadata policy.
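If you want to see what protection is actually applied to a file and its metadata, the cluster-side isi get command shows the requested and actual protection settings (output details vary by OneFS revision; the path here is just an example):

  isi get /ifs/data/dummy
  # more detail, including the per-stripe layout:
  isi get -D /ifs/data/dummy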

Cheers

Rob


October 30th, 2015 07:00

Hi Peter,

thank you for the hints.

But one thing I still don't understand:

> For the larger file in your example, note that protection is added to *each* stripe, so there should be 2 x (5 + 2) = 14 stripe units on disk.

I have given two parity stripe units per stripe in the second table above. But following your explanation: why do I need a total of 10 data stripe units although eight are enough to make up the whole 1024 KB file?

I tried it again with a 1-byte file. For that file Windows shows the same storage size as for a 64 or 128 KB file:

  • Size: 1 byte (1 byte)
  • Size on disk: 404 KB (413,696 bytes)

I will do the test with your script.

EDIT: I ran your script. Here is the output:

[Attachment: overhead_graph.png]

My manually calculated file sizes are not the same, but quite similar. I suppose the published "protection overhead" table from EMC was done with big files. With smaller files these values will not be reached.


October 31st, 2015 23:00

> I have given two parity stripe units per stripe in the second table above. But following your explanation: why do I need a total of 10 data stripe units although eight are enough to make up the whole 1024 KB file?

Small stripes are built from 3 replicated units.

From a certain size on, the full stripe width of the pool is used, even when the stripe units have not yet reached 128 KB each. As more data is written to the file, the units fill up uniformly. That gives a different picture from what you might have in mind, like adding one full stripe unit at a time while growing (which would lead to repeatedly re-striping the full active stripe).
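To make that concrete, here is a rough model of the expected on-disk size, in the same script style as above (a sketch under the assumptions of a 5+2 stripe, 128 KB maximum units, 8 KB blocks, and no metadata or small-file mirroring; real numbers will differ):

=========

#!/bin/zsh
# rough model: on-disk size for a given file size at 5+2 protection
fileKB=${1:-1024}
width=5; parity=2; unitKB=128; blockKB=8
(( stripes   = (fileKB + width*unitKB - 1) / (width*unitKB) ))  # ceil
(( dataUnits = stripes * width ))
# data is spread uniformly over all data units of the active stripes
(( fillKB = (fileKB + dataUnits - 1) / dataUnits ))             # ceil
# each unit is rounded up to whole 8 KB blocks; parity units match
(( unitDiskKB = ((fillKB + blockKB - 1) / blockKB) * blockKB ))
(( totalKB = (dataUnits + stripes * parity) * unitDiskKB ))
echo "$fileKB KB file -> about $totalKB KB on disk (model only)"

=========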

That was the nutshell version; as Rob suggested, becoming familiar with erasure coding helps...

Cheers

-- Peter
