4 Operator

 • 

1.2K Posts

November 18th, 2015 05:00

Sad to see it hasn't worked out for you yet. If I recall correctly, your original SSDs weren't fully utilized with GNA, so using the L3 cache to cover data as well was suggested as worth considering.

Now you have increased the SSD capacity, use it for L3 metadata cache only, and find it to be overused. That's certainly strange and there should be some explanation.

When you ran SmartPools, how did the NL410 SSDs fill over time, i.e. did you see them fill up way before the job finished?

Quick notes on the data analysis shown:

The cache age is for data in the L2 cache -- looks normal to me, given that it's just the RAM-sized cache. Nothing to worry about from the metadata side I'd say.

The L3 metadata hits... are percentages of the L3 cache read attempts. And we don't know how many L3 attempts there are.

Can you look at the absolute L2 and L3 metadata read attempts (=starts) and hits? As discussed in an earlier thread with sluetze, if your L2 hit rate happens to be high already, L3 starts = L2 misses are comparatively few, and L3 misses even less (i.e. as percentage of all ORIGINAL requests as measured by L2 starts). Seeing the bigger picture will help.

Or -- if you manage to get an exception for using SSD for GNA with 6TB equipped nodes, let us know...

-- Peter

2 Intern

 • 

205 Posts

November 18th, 2015 08:00

But really, even if it were big enough to hold all the metadata, it's still not pre-populated. So first access will always be slow. And since a percentage of the files are by definition old, when the user goes to look at them... or worse yet at a folder that contains them, it's gonna be slow.

Basically, my premise is that if combined with regular old metadata SSD patterns (GNA, metadata-read, metadata-write), L3 would make a hell of a lot of sense. Use the leftover unused space for cache. Awesome. But without it, it just results in a lot of pain for first access. I suppose if the workflows involved rarely encountered first access situations (I'm not sure what would), it wouldn't be a big deal, but as it stands, L3 becomes a very bad solution, despite how it's being pushed.

2 Intern

 • 

205 Posts

November 18th, 2015 08:00

Peter_Sero wrote:

Sad to see it hasn't worked out for you yet. If I recall correctly, your original SSDs weren't fully utilized with GNA, so using L3 cache covering also data was suggested as worthwhile considering.

Now you have increased the SSD capacity, use it for L3 metadata cache only, and find it to be overused. That's certainly strange and there should be some explanation.

We still use GNA for all of our node pools except the NL410s (by OneFS requirement). It is still wildly underused. But since L3 is a nodepool-only construct, it's vastly overused on the NL410s with a ratio of 0.4% SSD to HDD. Our average metadata SSD usage is 0.46%, so not really tooooo far off--but probably off enough, especially since our total SSD ratio not counting the L3 SSDs or L3-enabled nodes is ~1.2% (yes, we have an exception for GNA).

Ah, I didn't realize the cached data age was L2 only. That does make more sense.

As far as hits vs. misses...

[Attached screenshot: Screen Shot 2015-11-18 at 10.58.01 AM.png]

It's ooglay.

I am not certain how the NL410 SSDs filled; that information seems to be pretty well hidden.

4 Operator

 • 

1.2K Posts

November 19th, 2015 00:00

> I am not certain how the NL410 SSDs filled; that information seems to be pretty well hidden.

Not sure about InsightIQ (waiting for our FREE license...), but in the CLI on a node with SSD:

isi statistics drive  --node --type SSD --long

There is also some historical information retained in every cluster; let me know if you're interested in using

isi statistics history ...

for that.

Have you considered populating the L3 cache with some manual treewalks? Sounds terrible, but as long as there is this gap of cache warming for internally migrated data, what else can one do? BTW, new clusters with data ingested via NFS or SMB will obviously not see this issue.
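For what it's worth, a manual treewalk can be as dumb as stat'ing everything under a subtree; reading each inode should pull its metadata through the cache on first access. This is just a sketch (the /ifs path is a placeholder, and on a real cluster you'd probably parallelize across nodes):

```python
import os

def warm_metadata(root):
    """lstat every entry under root; returns the number of inodes touched."""
    touched = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))  # forces an inode read
                touched += 1
            except OSError:
                pass  # entry vanished mid-walk; ignore
    return touched

print(warm_metadata("/ifs/data"))  # hypothetical path
```

`find /ifs/data -ls > /dev/null` would do much the same from a shell.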

To slightly augment your wishlist:

(Dear Santa...)

- metadata on SSD as traditional

- spare SSD capacity used for read and write-back(!) cache

- SSDs on PCIe rather than sharing SAS with the HDDs

Cheers

-- Peter

PS: At least one of the 26762 sysctl's in 7.2.0.4 speaks of L3 write-back (which of course does not necessarily imply that such function will actually materialize)

2 Intern

 • 

205 Posts

November 19th, 2015 03:00

The output is suspiciously similar to that of the HDDs in those nodes:

dm11-58# isi statistics drive --node 33,38,43,44,46,59 --type SSD --long

   Drive Type OpsIn BytesIn SizeIn OpsOut BytesOut SizeOut TimeAvg Slow TimeInQ Queued Busy Used Inodes
LNN:bay        N/s     B/s      B    N/s      B/s       B      ms  N/s      ms           %    %
    33:1  SSD   1.0     11K    11K    6.8      55K    8.1K     0.4  0.0     0.0    0.0  0.2 66.3   5.2M
    38:1  SSD   1.4     13K   9.4K    6.8      54K    8.0K     0.4  0.0     0.0    0.0  0.2 66.3   5.2M
    43:1  SSD   1.4     13K   9.4K    6.8      54K    8.0K     0.4  0.0     0.0    0.0  0.0 66.3   5.2M
    44:1  SSD   2.4     26K    11K    8.4      69K    8.2K     0.4  0.0     0.0    0.0  0.2 66.3   5.2M
    46:1  SSD   0.8    6.6K   8.2K    7.0      57K    8.2K     0.5  0.0     0.0    0.0  0.1 66.3   5.2M
    59:1  SSD   1.4     11K   8.2K   12.6     102K    8.1K     0.5  0.0     0.0    0.0  0.4 66.4   5.2M

vs

dm11-58# isi statistics drive  --node 33,38,43,44,46,59 --type=sata --long

   Drive Type OpsIn BytesIn SizeIn OpsOut BytesOut SizeOut TimeAvg Slow TimeInQ Queued Busy Used Inodes
LNN:bay        N/s     B/s      B    N/s      B/s       B      ms  N/s      ms           %    %
    33:2 SATA   0.0     0.0    0.0   14.4     118K    8.2K     2.0  0.0     0.0    0.0  7.0 66.3   5.2M
    33:3 SATA   0.0     0.0    0.0   16.8     138K    8.2K     2.2  0.0     0.0    0.0  8.2 66.3   5.2M
    33:4 SATA   0.0     0.0    0.0   13.0     106K    8.2K     2.1  0.0     0.0    0.0  7.4 66.3   5.2M
    33:5 SATA   0.0     0.0    0.0   10.6      87K    8.2K     2.3  0.0     0.0    0.0  7.0 66.3   5.2M
    33:6 SATA   0.0     0.0    0.0   10.2      84K    8.2K     2.0  0.0     0.0    0.0  5.1 66.3   5.2M
    33:7 SATA   0.0     0.0    0.0   17.2     505K     29K     2.0  0.0     0.0    0.0  8.2 66.7   5.2M

4 Operator

 • 

1.2K Posts

November 19th, 2015 05:00

This is clearly unexpected... do you plan to have this checked by EMC?

2 Intern

 • 

205 Posts

November 19th, 2015 11:00

At this point, I don't even know where to point my finger. All I know is that it's slow as molasses when it's configured wrong, and as usual with EMC support, they want to see it while it's broken. I don't have the good will left in my organization to re-break it just so they can look at it.

November 20th, 2015 05:00

SSDs on the NL410 are meant to speed up job operations by doing metadata acceleration, but it is completely understood that we are unable to fit all the metadata on the single SSD. It's called L3-meta vs standard L3. Notice that HD nodes and the NL400 with 6TB drives required OneFS 7.2 because it added this L3-meta mode. L3-meta will not do any caching of data blocks.

Since you are on the NL410, I assume you are on OneFS 7.2.1, which makes L3 pools completely separate from the nodes that are being GNA accelerated (things were much worse for GNA in 7.1.1 and 7.2). This means that GNA is only accelerating the metadata on the NL nodes without SSD in them. Your NL410 6TB drives and one SSD are being ignored by GNA.

I would engage your account team and ask for them to research your options, because it does seem like there was a regression in functionality here. I will say though, that when you say the word performance in regards to NL nodes, I would not get your hopes up since the NL is not a performance platform.


Historically, one of the issues with GNA (before L3) is that you end up having to buy S nodes with SSD whenever you add NL nodes, which bumps up the price and complexity, so it's much cleaner to go with the X410 platform which has the SSD built in and allows you to use the traditional metadata strategies or L3.

2 Intern

 • 

205 Posts

November 20th, 2015 07:00

Hmm. There's this here "enable L3 cache" checkbox in the Smartpools interface for the NL410s. I wonder what happens if I uncheck that...

2 Intern

 • 

205 Posts

November 20th, 2015 07:00

Yes, I'm starting to work with my sales team.

Here's the issue with X410s... we need S class nodes for small file workloads, but we also need some deep storage to drop older stuff to, so this is a total regression.

I don't expect read or write performance from the NLs, of course, but with GNA I at least got namespace performance.

2 Intern

 • 

205 Posts

November 20th, 2015 13:00

I have spoken to an engineer, and we have determined the following:

If a cluster is GNA enabled AND it has nodes with L3-metadata setups (ie, NL410s or HD400s w/6TB drives)

-->Files on those 6TB nodes have their metadata stored via GNA on other nodes with SSD and that metadata is cached via L3

-->Directories on those 6TB nodes do NOT have their metadata stored via GNA, and that metadata is cached via L3

Incidentally, to see where the metadata for an object is, do an isi get -D on that object (or -Dd if it's a directory) and look for this:

*  File Data (48 bytes):

*    Metatree Depth: 1

*    78,16,988674719744:8192

*    101,16,918677561344:8192

*    121,19,74253762560:8192

*    130,34,1135611068416:8192

*    131,33,1172303634432:8192

*    133,33,1052311330816:8192

Where the first number is the devid of the node it's on, and the second is the lnum of the drive in that node that it's on. For a directory on an L3-metadata node, there are no values listed under Metatree Depth: 1.
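If you want to check a lot of objects, a small helper (my own sketch, not an Isilon tool) can pull the devid and lnum out of those block-address lines; the "78,16,988674719744:8192" layout is taken from the output above (devid, lnum, block address : length).

```python
def parse_metatree_lines(lines):
    """Return (devid, lnum) pairs from isi get -D block-address lines."""
    pairs = []
    for line in lines:
        body = line.lstrip("* ").strip()       # drop the "*    " prefix
        parts = body.split(",")
        if len(parts) == 3 and parts[0].isdigit() and parts[1].isdigit():
            pairs.append((int(parts[0]), int(parts[1])))
    return pairs

sample = [
    "*    78,16,988674719744:8192",
    "*    101,16,918677561344:8192",
]
print(parse_metatree_lines(sample))  # [(78, 16), (101, 16)]
```

Feed it the whole `isi get -D` output and non-matching lines (like "Metatree Depth: 1") are simply skipped.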
