January 21st, 2015 11:00

Hi Dave,

What you're referring to is small files on Isilon. Isilon uses an 8KB block size and a 128KB stripe size. If a file is less than 1 stripe wide, protecting it with FEC doesn't make any sense, so here is how we protect it. We look at the configured protection level for the cluster/node pool in question. The default for all clusters over 18TB in size (pretty much all clusters these days) is N+2:1 (that's the pre-7.2 nomenclature, for those checking). That means I can sustain 2 disk failures or 1 node failure.

In order to meet that level of protection, I have to make sure that I have at least 1 copy more than this protection level calls for. If I were to just mirror a small file, it would end up on 2 spindles, most likely on separate nodes, which means that a double disk fault on just the right disks could cause data loss. So for that reason we protect small files at 1 level greater than the configured protection level, and because the overhead from the FEC calculations isn't worth the CPU cycles when we don't even fill up 1 stripe, we simply mirror the file at 3x.

What does that mean for overall storage usage? A good example I like to use is a 50KB file. One copy of a 50KB file on Isilon would consume 56KB of space, because we align to the next 8KB block boundary. With N+2:1 protection we have to 3x mirror the file, so that is 168KB of space for a 50KB file. Isilon is far more efficient when it comes to protecting larger files.
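The 50KB arithmetic can be checked with a short Python sketch. It only restates the numbers given in this post (8KB blocks, 3x mirroring); the function name and parameters are made up for illustration and are not part of any Isilon tool.

```python
import math

BLOCK_SIZE_KB = 8  # block size quoted in the post


def mirrored_footprint_kb(file_size_kb, mirrors=3, block_kb=BLOCK_SIZE_KB):
    """Return (space for one copy, total space) in KB for a small mirrored file.

    Hypothetical helper: rounds the file up to the next block boundary,
    then multiplies by the number of mirrored copies.
    """
    blocks = math.ceil(file_size_kb / block_kb)
    one_copy_kb = blocks * block_kb
    return one_copy_kb, one_copy_kb * mirrors


one_copy, total = mirrored_footprint_kb(50)
print(one_copy, total)  # 56 168
```

So a 50KB file occupies 7 blocks per copy, and three copies come to 168KB, a bit more than 3.3x the logical file size.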

So your question is about how the file grows and at what point do we change from mirroring to striping, right?

That's actually a pretty easy answer: we change from mirroring to striping when the file exceeds 1 stripe, or 128KB, in size. So every time the file is saved, we'll take 3 writes: 1 to the primary file, and then 2 to the mirrored copies spread across the cluster. When a write makes the new file size greater than 128KB, upon close/save we'll perform an FEC calculation and protect the file with erasure coding rather than mirroring, because of the significantly lower overhead.

As to your second question, I would suggest you open an SR with EMC Support. If this is reproducible, it's likely that a tcpdump or some other analysis will be needed to figure out what the client is trying to do that is failing, and whether it's related to Isilon itself or to the client OS.

I hope this helps. If so, please mark your question as answered in the ECN forum.

Thanks,

Chris R. Klosterman

Senior Solution Architect, EMC Isilon O&E Team

email: chris.klosterman@emc.com

twitter: @croaking

January 21st, 2015 12:00

> So for that reason we protect small files at 1 level greater than the configured protection level

Slight clarification here (the phrase "1 level greater" to me implies something that I don't think you're intending). OneFS protects small files at the best *equivalent* protection (to the defined protection) possible, given the number of writeable nodes in the cluster or node pool. If your protection policy is set at +2:1, or +2d:1n in 7.2 language, a small file (small meaning <= 128 KiB) of, say, 1 KiB is still FEC protected -- on a 3 node cluster the file will have a FEC layout of 4+2/2 (N+M/b in our terminology). However, because the file only has 1 KiB of content, OneFS effectively mirrors the file by creating just 3 stripe units, allocating these stripe units on separate nodes and/or drives such that the defined fault tolerance is still achievable (e.g. 1 stripe unit may go on node 1, containing the 8 KiB block of file contents; 1 stripe unit might go on node 2, containing an 8 KiB block of protection data; and the third stripe unit might go on node 3, containing the other 8 KiB block of protection data). This is an effective 3x mirror, if you will, but it still provides protection from 2 drive failures or 1 node failure, just like the defined policy of +2:1 demands. It's the best equivalent layout that OneFS could use.

Since there is no real mirroring happening, as the file grows OneFS simply fills in the (in my example) 4+2/2 layout with additional stripe units, and creates additional stripes when needed (as the file exceeds 512 KiB in raw size).
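As a rough illustration of how that layout fills in, here is a simplified Python sketch. It assumes 128 KiB stripe units (inferred from the 128 KiB and 512 KiB figures above) and M protection units for every stripe, including a partly filled one; the function and its field names are hypothetical and this is not how OneFS itself computes layouts.

```python
import math

STRIPE_UNIT_KIB = 128  # assumed stripe-unit size, inferred from the thread


def stripe_units(file_kib, data_per_stripe=4, protection_per_stripe=2):
    """Estimate stripe-unit counts for a file in an N+M layout (default 4+2).

    Simplified model: one data stripe unit per started 128 KiB of content,
    and a full set of protection units for every stripe that is started.
    """
    data_units = max(1, math.ceil(file_kib / STRIPE_UNIT_KIB))
    stripes = math.ceil(data_units / data_per_stripe)
    return {
        "data_units": data_units,
        "stripes": stripes,
        "protection_units": stripes * protection_per_stripe,
    }


print(stripe_units(1))    # 1 data unit + 2 protection units in 1 stripe
print(stripe_units(512))  # 4 data units, still 1 stripe
print(stripe_units(513))  # 5 data units, so a second stripe is started
```

Under these assumptions the 1 KiB file gets the 3 stripe units described above, and a second stripe appears only once the file grows past 512 KiB.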

As a side note, directories in OneFS are always protected 1 level higher than the defined protection policy.  So in my +2:1 example, the directory that the file lives in (assuming it has the same protection as the file), would essentially be protected at 4x.

> When a file is less than a certain size (the actual size escapes me right now), the file is written to a single node and then copied to other nodes for data integrity.

That's not quite how it works, as this implies there's a potential race condition that isn't there.  Files are never committed to the filesystem on a single node, ever.  OneFS uses a fairly involved two-phase commit procedure, in conjunction with the software journal and hardware NVRAM, to ensure that files are committed to all the nodes they need to be (to satisfy the defined protection) before being acknowledged back to the clients.

> when they are trying to grep the file, the content is blank, zip, nadda content.

This sounds very much like a client caching issue, although I'd need more specifics to know for sure.

--kip

January 27th, 2015 12:00

OK, so I don't think the issue at hand has anything to do with the size of the file when it is created and then continuously updated (growing in size).

The issue appears to be with a user on a Windows client who is using the command prompt to grep the file while it continues to be updated by a Linux server that sends Cisco switch logs to the network share. Files below a certain size (we don't know what that is yet) are searchable. At a certain file size (I'm hearing 1.2G), grepping the file returns nothing.

If the file is copied to the local machine, and thus becomes static, it is searchable, but this is not the desired action.

January 27th, 2015 14:00

That's suspiciously close to 2 GB, which could indicate some sort of variable overflow or similar in the client's 'grep' utility.
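To illustrate the kind of overflow I mean, here is a small Python sketch of what happens when a file offset is kept in a signed 32-bit integer. Whether the client's grep actually does this is purely an assumption at this point; the helper below is hypothetical and only mimics the wraparound.

```python
INT32_MAX = 2**31 - 1  # 2147483647, one byte short of 2 GiB


def as_int32(n):
    """Reinterpret an integer as a signed 32-bit value (two's-complement wrap)."""
    return (n + 2**31) % 2**32 - 2**31


print(as_int32(INT32_MAX))      # 2147483647, last offset that still fits
print(as_int32(INT32_MAX + 1))  # -2147483648, the offset wraps negative
```

A tool that hits a negative size or offset like this might well treat the file as empty or fail to read it, which would look like a blank result.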

You say this is a Windows client -- how is the grep being done? With an actual 'grep' utility via something like Cygwin? Also, what's the Win client OS?

--kip
