Unsolved
205 Posts
0
1232
7.2 frequent multiscans
Anyone else seeing more frequent multiscan jobs under 7.2? My primary cluster seems to be starting them whenever a drive is replaced, which is definitely a change of behavior from previous versions. I'm wondering if this is intentional or just a new piece of (minor) weirdness.
Peter_Sero
1.2K Posts
1
July 20th, 2015 19:00
Wouldn't disk stalls lead to cancellation of MultiScan jobs?
Peter_Sero
1.2K Posts
0
July 20th, 2015 19:00
These are the SSD pools... of very different sizes.... how would you like to see the balance improved here?
carlilek
205 Posts
0
July 20th, 2015 19:00
It's definitely not group changes... I know when those bastards happen on this cluster.
He says, before checking /var/log/messages... and finding a ton of stalls of one particular drive. I suspect that one will be failing soon. But there is no associated multiscan job running.
Peter_Sero
1.2K Posts
0
July 20th, 2015 22:00
Interesting... time to engage support...
In the meantime, can you check these sysctls?
efs.bam.layout.drive_block_unbalance_threshold: 5
efs.bam.layout.drive_inode_unbalance_threshold: 5 (sic!)
And, in the spectacular output of
sysctl efs.bam.disk_pool_db
are you seeing any "used_balanced" or "free_balanced" attributes set to "false"?
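If the disk_pool_db output is as long as advertised, grepping it by hand gets tedious. Here is a minimal sketch that pulls out just the offending lines; the exact output format of `sysctl efs.bam.disk_pool_db` is an assumption here, so adjust the pattern to whatever your cluster actually prints.

```python
import re

def find_unbalanced(sysctl_output: str):
    """Return lines reporting used_balanced or free_balanced as false.

    Assumes lines of the form 'used_balanced = false' or
    'free_balanced: false' somewhere in the dump.
    """
    pattern = re.compile(r"(used_balanced|free_balanced)\s*[:=]\s*false",
                         re.IGNORECASE)
    return [line.strip() for line in sysctl_output.splitlines()
            if pattern.search(line)]

# Hypothetical sample of the dump, condensed:
sample = """\
disk_pool S200s:19
  used_balanced = false
  free_balanced = true
"""
print(find_unbalanced(sample))  # -> ['used_balanced = false']
```

On the cluster itself you would feed it the real output, e.g. `sysctl efs.bam.disk_pool_db | python find_unbalanced.py`.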
carlilek
205 Posts
0
July 21st, 2015 06:00
Hmm.
The sysctls are set properly, but here's the output of the long one, condensed:
S200s:19 has used_balanced = false under inodes on nodes 38, 73, 75
S200s:20 has used_balanced = false under inodes on nodes 37, 39, 73, 74, 76
S200s:21 has used_balanced = false under inodes on nodes 37, 41, 72, 75, 76
S200s:22 has used_balanced = false under inodes on nodes 37, 39, 40, 41, 73, 74, 75
S200-bigssd:37 has used_balanced = false under inodes on nodes 80, 81, 84, 99, 100
S200-bigssd:38 has used_balanced = false under inodes on nodes 81, 98, 99, 100
S200-bigssd:39 has used_balanced = false under inodes on nodes 79, 80, 81, 82, 83, 98, 99, 100
S210s:152 has used_balanced = false under inodes on nodes 120, 121, 122, 123, 124, 126
S210s:153 has used_balanced = false under inodes on nodes 120, 121, 122, 123, 124, 125
S210s:154 has used_balanced = false under inodes on nodes 121, 122
So this is fairly unsurprising to me, given the high churn on our S-class tier, which is, admittedly, composed of 3 disparate node types. There are Linux home directories on these nodes, as well as a fair amount of (almost) transactional data. We don't run VMware or (major) databases against it, but aside from that, they probably get hit with just about everything else. SmartPools gets run once a week to tier stuff down to the NLs. Note that none of the NLs are out of balance, and they get a lot of data written directly to them.
Peter_Sero
1.2K Posts
0
July 21st, 2015 06:00
The imbalances don't appear to be across the pools within the same tier, but rather within individual disk pools.
Can you capture this sysctl once a day or so, to track changes with the MultiScan run(s)?
If there is no trend towards better balance (according to this metric), why not file the findings as a bug...
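To make the day-over-day comparison concrete, a small sketch like the following could summarize each snapshot and report the trend. The pool-header and flag formats are assumptions modeled on the condensed listing above (e.g. "S200s:19 ... used_balanced = false"), so the regexes may need tweaking against real output.

```python
import re
from collections import Counter

def unbalanced_counts(snapshot: str) -> Counter:
    """Count used_balanced/free_balanced = false entries per disk pool.

    Assumes each pool section starts with a line like 'disk_pool S200s:19'.
    """
    counts: Counter = Counter()
    current_pool = None
    for line in snapshot.splitlines():
        m = re.match(r"\s*disk_pool\s+(\S+)", line)
        if m:
            current_pool = m.group(1)
        elif re.search(r"(used_balanced|free_balanced)\s*[:=]\s*false", line):
            counts[current_pool or "unknown"] += 1
    return counts

def trend(yesterday: str, today: str) -> str:
    """Compare two snapshots and say whether balance is improving."""
    before = sum(unbalanced_counts(yesterday).values())
    after = sum(unbalanced_counts(today).values())
    if after < before:
        return f"improving ({before} -> {after})"
    if after > before:
        return f"worsening ({before} -> {after})"
    return f"flat ({before})"
```

Run against two saved captures of `sysctl efs.bam.disk_pool_db` a day apart; if the count never trends toward zero across MultiScan runs, that is a tidy data point to attach to the bug report.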