Unsolved
205 Posts
0
1232
7.2 frequent multiscans
Anyone else seeing more frequent multiscan jobs under 7.2? My primary cluster seems to be starting them whenever a drive is replaced, which is definitely a change of behavior from previous versions. I'm wondering if this is intentional or just a new piece of (minor) weirdness.
Peter_Sero
1.2K Posts
1
July 20th, 2015 19:00
Wouldn't disk stalls lead to cancellation of MultiScan jobs?
Peter_Sero
1.2K Posts
0
July 20th, 2015 19:00
These are the SSD pools... of very different sizes.... how would you like to see the balance improved here?
carlilek
205 Posts
0
July 20th, 2015 19:00
It's definitely not group changes... I know when those bastards happen on this cluster.
He says, before checking /var/log/messages... and finding a ton of stalls of one particular drive. I suspect that one will be failing soon. But there is no associated multiscan job running.
Peter_Sero
1.2K Posts
0
July 20th, 2015 22:00
Interesting... time to engage support...
In the meantime, can you check these sysctls?
efs.bam.layout.drive_block_unbalance_threshold: 5
efs.bam.layout.drive_inode_unbalance_threshold: 5 (sic!)
And, in the spectacular output of
sysctl efs.bam.disk_pool_db
are you seeing any "used_balanced" or "free_balanced" attributes set to "false"?
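If the disk_pool_db output is as long as advertised, grepping it by hand gets tedious. Here is a minimal sketch that pulls out just the offending lines; the exact output format of `sysctl efs.bam.disk_pool_db` is an assumption here, so adjust the pattern to whatever your cluster actually prints.

```python
import re

def find_unbalanced(sysctl_output: str):
    """Return lines reporting used_balanced or free_balanced as false.

    Assumes lines of the form 'used_balanced = false' or
    'free_balanced: false' somewhere in the dump.
    """
    pattern = re.compile(r"(used_balanced|free_balanced)\s*[:=]\s*false",
                         re.IGNORECASE)
    return [line.strip() for line in sysctl_output.splitlines()
            if pattern.search(line)]

# Hypothetical sample of the dump, condensed:
sample = """\
disk_pool S200s:19
  used_balanced = false
  free_balanced = true
"""
print(find_unbalanced(sample))  # -> ['used_balanced = false']
```

On the cluster itself you would feed it the real output, e.g. `sysctl efs.bam.disk_pool_db | python find_unbalanced.py`.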
carlilek
205 Posts
0
July 21st, 2015 06:00
Hmm.
The sysctls are set properly, but here's the output of the long one, condensed:
S200s:19 has used_balanced = false under inodes on nodes 38, 73, 75
S200s:20 has used_balanced = false under inodes on nodes 37, 39, 73, 74, 76
S200s:21 has used_balanced = false under inodes on nodes 37, 41, 72, 75, 76
S200s:22 has used_balanced = false under inodes on nodes 37, 39, 40, 41, 73, 74, 75
S200-bigssd:37 has used_balanced = false under inodes on nodes 80, 81, 84, 99, 100
S200-bigssd:38 has used_balanced = false under inodes on nodes 81, 98, 99, 100
S200-bigssd:39 has used_balanced = false under inodes on nodes 79, 80, 81, 82, 83, 98, 99, 100
S210s:152 has used_balanced = false under inodes on nodes 120, 121, 122, 123, 124, 126
S210s:153 has used_balanced = false under inodes on nodes 120, 121, 122, 123, 124, 125
S210s:154 has used_balanced = false under inodes on nodes 121, 122
So this is fairly unsurprising to me, given the high churn on our S-class tier, which is, admittedly, composed of 3 disparate node types. There are Linux home directories on these nodes, as well as a fair amount of (almost) transactional data. We don't run VMware or (major) databases against it, but aside from that, they probably get hit with just about everything else. SmartPools gets run once a week to tier stuff down to the NLs. Note that none of the NLs are out of balance, and they get a lot of data written directly to them.
Peter_Sero
1.2K Posts
0
July 21st, 2015 06:00
The imbalances don't appear to be across the pools within the same tier, but rather within individual disk pools.
Can you capture this sysctl once a day or so, to track changes with the MultiScan run(s)?
If there is no trend towards better balance (according to this metric), why not file the findings as a bug...
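To make the day-over-day comparison concrete, a small sketch like the following could summarize each snapshot and report the trend. The pool-header and flag formats are assumptions modeled on the condensed listing above (e.g. "S200s:19 ... used_balanced = false"), so the regexes may need tweaking against real output.

```python
import re
from collections import Counter

def unbalanced_counts(snapshot: str) -> Counter:
    """Count used_balanced/free_balanced = false entries per disk pool.

    Assumes each pool section starts with a line like 'disk_pool S200s:19'.
    """
    counts: Counter = Counter()
    current_pool = None
    for line in snapshot.splitlines():
        m = re.match(r"\s*disk_pool\s+(\S+)", line)
        if m:
            current_pool = m.group(1)
        elif re.search(r"(used_balanced|free_balanced)\s*[:=]\s*false", line):
            counts[current_pool or "unknown"] += 1
    return counts

def trend(yesterday: str, today: str) -> str:
    """Compare two snapshots and say whether balance is improving."""
    before = sum(unbalanced_counts(yesterday).values())
    after = sum(unbalanced_counts(today).values())
    if after < before:
        return f"improving ({before} -> {after})"
    if after > before:
        return f"worsening ({before} -> {after})"
    return f"flat ({before})"
```

Run against two saved captures of `sysctl efs.bam.disk_pool_db` a day apart; if the count never trends toward zero across MultiScan runs, that is a tidy data point to attach to the bug report.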