Unsolved
205 Posts
0
1300
7.2 frequent multiscans
Anyone else seeing more frequent MultiScan jobs under 7.2? My primary cluster seems to be starting them whenever a drive is replaced, which is definitely a change of behavior from previous versions. I'm wondering if this is intentional or just a new piece of (minor) weirdness.
johnsonka
130 Posts
0
July 16th, 2015 12:00
Hello carlilek,
The policy for MultiScan does not appear to have changed recently based on the last update to our Job Engine White Paper:
https://support.emc.com/docu51125_White-Paper:-Isilon-OneFS-Job-Engine.pdf?language=en_US
According to this document, the job runs on any group change that adds a device, and a drive add meets this condition. MultiScan will also be scheduled if the cluster is out of balance by more than 5%, or if there has not been a successful MultiScan in 14 days (per the AutoBalance and Collect run policies).
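For illustration, the 5% imbalance trigger amounts to comparing used-capacity percentages across the disk pools in a tier. A minimal sketch, using made-up percentages similar to the ones discussed later in this thread (the threshold logic here is a simplification, not the actual Job Engine code):

```shell
#!/bin/sh
# Illustrative only: given used-capacity percentages for the disk
# pools in one tier, compute the spread and compare it against the
# 5% threshold the white paper describes for scheduling MultiScan.
# The percentages are sample values, not live cluster data.
awk 'BEGIN {
    split("15 44 44 48", pct, " ")        # sample used-% per disk pool
    min = 100; max = 0
    for (i in pct) {
        if (pct[i] < min) min = pct[i]
        if (pct[i] > max) max = pct[i]
    }
    spread = max - min
    printf "spread: %d%%\n", spread
    printf "%s\n", (spread > 5 ? "out of balance (>5%)" : "within balance")
}'
```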
So far I have not been able to find anything changed in our tracking system; however, it is not out of the realm of possibility that someone noticed this was not working as well as it could and fixed it along with other Job Engine improvements.
carlilek
205 Posts
0
July 19th, 2015 08:00
OK.... now it's started another one out of the blue. No drive replacements, and it's not out of balance:
dm11-29# isi storagepool list
Name Nodes Requested Protection HDD Total % SSD Total %
------------------------------------------------------------------------------------------------------------------------------------
perftier 1-7,10,18,26-28,31-32,35-37,39-40,45,64-69 - 134.5374T 308.5540T 43.60% 6.6928T 43.5456T 15.37%
- S200s 10,18,31-32,35-37,64-66 +2n 51.2356T 118.8580T 43.11% 642.189G 3.5883T 17.48%
- S200-bigssd 26-28,39-40,45,67-69 +2n 37.5507T 87.5227T 42.90% 1.6810T 9.6885T 17.35%
- S210s 1-7 +2n 45.7512T 102.1733T 44.78% 4.3847T 30.2688T 14.49%
nltier 8-9,11-17,19-25,29-30,34,41-42,48-55,70-74 - 3.07385P 3.89976P 78.82% 0b 0b 0.00%
- NL400s_4tb 8-9,11-15,17,19-20,29-30,34,41-42,48,70-74 +3n 2.07641P 2.66333P 77.96% 0b 0b 0.00%
- NL400s_3tb 16,21-25,49-55 +2n 1021.3753T 1.23643P 80.67% 0b 0b 0.00%
------------------------------------------------------------------------------------------------------------------------------------
Total: 7 3.20523P 4.20108P 289.42% 6.6928T 43.5456T 49.31%
dm11-29#
Any ideas?
Peter_Sero
1.2K Posts
0
July 19th, 2015 18:00
Have you checked the balance within each pool, on node and disk level?
isi status
isi statistics drive --node all --long
Do those frequent MultiScan jobs finish successfully, fail, or get "system cancelled"?
Cheers
-- Peter
Anonymous
5 Practitioner
274.2K Posts
0
July 20th, 2015 03:00
You need to check at the pool level as well, as per Peter_Sero. That should give some insight.
Thanks,
Kiran.
carlilek
205 Posts
0
July 20th, 2015 05:00
Hi guys,
Unfortunately, running that command on a cluster with >1800 drives is probably not going to be particularly informative... but that does give me some idea where to go.
The multiscan does finish successfully. The only other clue I have is that I'm running a media scan now, which does not happen with any sort of frequency on this cluster.
carlilek
205 Posts
0
July 20th, 2015 07:00
Well, the drives match up with the disk pools for the S210s, at least:
Why it's like that, only the Shadow knows.
Ah well, I'll take it as nothing to be particularly concerned about, just one more of the weirdnesses of running a 6 year old, 60 node cluster with none of the original nodes left in it, upgraded all the way from 6.x through to 7.2.
Peter_Sero
1.2K Posts
0
July 20th, 2015 07:00
Could be it. But even the HDD pools in the S210 nodes strike me as odd: how could one pool be 4% off?
Really look at the individual drives, too.
S210s:152 152 D 1-7:bay7-12 - 15T / 34T (44% )
S210s:153 153 D 1-7:bay13-18 - 15T / 34T (44% )
S210s:154 154 D 1-7:bay19-24 - 16T / 34T (48% )
carlilek
205 Posts
0
July 20th, 2015 07:00
;-)
I used a command that I'm theoretically not supposed to know about to see that the disk pools are almost all within balance, except one of the S210 disk pools is >5% more full than some of the S200 disk pools within the same tier. Would that account for it?
Name Id Type Members VHS Used / Size
------------------------------------------------------------------------------------
perftier 26 T 3,36,151 - 142T / 352T (40% )
S200-bigssd 36 G 35,37-39 1 40T / 97T (41% )
S200-bigssd:35 35 D 26-28,39-40,45,67-69 - 1.7T / 9.7T (17% )
:bay1-6
S200-bigssd:37 37 D 26-28,39-40,45,67-69 - 13T / 29T (43% )
:bay7-12
S200-bigssd:38 38 D 26-28,39-40,45,67-69 - 13T / 29T (43% )
:bay13-18
S200-bigssd:39 39 D 26-28,39-40,45,67-69 - 13T / 29T (43% )
:bay19-24
S200s 3 G 18-22 1 52T / 122T (43% )
S200s:18 18 D 10,18,31-32,35-37,64 - 644G / 3.6T (18% )
-66:bay1-2
S200s:19 19 D 10,18,31-32,35-37,64 - 12T / 27T (44% )
-66:bay3-7
S200s:20 20 D 10,18,31-32,35-37,64 - 12T / 27T (44% )
-66:bay8-12
S200s:21 21 D 10,18,31-32,35-37,64 - 14T / 32T (43% )
-66:bay13-18
S200s:22 22 D 10,18,31-32,35-37,64 - 14T / 32T (43% )
-66:bay19-24
S210s 151 G 150,152-154 1 51T / 132T (38% )
S210s:150 150 D 1-7:bay1-6 - 4.4T / 30T (15% )
S210s:152 152 D 1-7:bay7-12 - 15T / 34T (44% )
S210s:153 153 D 1-7:bay13-18 - 15T / 34T (44% )
S210s:154 154 D 1-7:bay19-24 - 16T / 34T (48% )
nltier 47 T 41,76 - 3.1P / 3.9P (79% )
NL400s_3tb 76 G 75,77-81 1 1.0P / 1.2P (81% )
NL400s_3tb:75 75 D 16,21-25,49-55:bay1- - 170T / 211T (81% )
6
NL400s_3tb:77 77 D 16,21-25,49-55:bay7- - 171T / 211T (81% )
12
NL400s_3tb:78 78 D 16,21-25,49-55:bay13 - 170T / 211T (81% )
-18
NL400s_3tb:79 79 D 16,21-25,49-55:bay19 - 171T / 211T (81% )
-24
NL400s_3tb:80 80 D 16,21-25,49-55:bay25 - 171T / 211T (81% )
-30
NL400s_3tb:81 81 D 16,21-25,49-55:bay31 - 171T / 211T (81% )
-36
NL400s_4tb 41 G 40,42-46 1 2.1P / 2.7P (78% )
NL400s_4tb:40 40 D 8-9,11-15,17,19-20,2 - 356T / 455T (78% )
9-30,34,41-42,48,70-
74:bay1-6
NL400s_4tb:42 42 D 8-9,11-15,17,19-20,2 - 356T / 455T (78% )
9-30,34,41-42,48,70-
74:bay7-12
NL400s_4tb:43 43 D 8-9,11-15,17,19-20,2 - 355T / 455T (78% )
9-30,34,41-42,48,70-
74:bay13-18
NL400s_4tb:44 44 D 8-9,11-15,17,19-20,2 - 356T / 455T (78% )
9-30,34,41-42,48,70-
74:bay19-24
NL400s_4tb:45 45 D 8-9,11-15,17,19-20,2 - 357T / 455T (78% )
9-30,34,41-42,48,70-
74:bay25-30
NL400s_4tb:46 46 D 8-9,11-15,17,19-20,2 - 355T / 455T (78% )
9-30,34,41-42,48,70-
74:bay31-36
------------------------------------------------------------------------------------
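To eyeball that without the command I'm not supposed to know about, one could parse the "(NN% )" column out of the listing. A rough sketch, with the S210 lines pasted in as sample input (the parsing assumes the percentage is the only parenthesised token on each line):

```shell
#!/bin/sh
# Rough sketch: pull the used-capacity percentage out of lines like
# the listing above and flag disk pools sitting more than 5% below
# the fullest pool in the group. Sample lines are copied from the
# paste; a real run would feed in the full listing instead.
cat <<'EOF' > /tmp/pools.txt
S210s:150  150  D  1-7:bay1-6    -  4.4T / 30T (15% )
S210s:152  152  D  1-7:bay7-12   -  15T / 34T (44% )
S210s:153  153  D  1-7:bay13-18  -  15T / 34T (44% )
S210s:154  154  D  1-7:bay19-24  -  16T / 34T (48% )
EOF
awk '{
    p = $0
    sub(/.*\(/, "", p); sub(/%.*/, "", p)   # isolate NN from "(NN% )"
    name[NR] = $1; pct[NR] = p + 0
    if (p + 0 > max) max = p + 0
}
END {
    for (i = 1; i <= NR; i++)
        if (max - pct[i] > 5)
            printf "%s is %d%% below the fullest pool (%d%%)\n", name[i], max - pct[i], max
}' /tmp/pools.txt
```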
Peter_Sero
1.2K Posts
0
July 20th, 2015 07:00
Technically speaking, when looking at the inode counts, we see a range from 5.7K to 6.7K, which makes a relative difference of about 15%!
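Checking that arithmetic, relative to the larger count:

```shell
# 5.7K vs 6.7K inodes per drive, difference relative to the larger count
awk 'BEGIN { printf "%.1f%%\n", (6.7 - 5.7) / 6.7 * 100 }'
```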
Peter_Sero
1.2K Posts
0
July 20th, 2015 07:00
> Unfortunately, running that command on a cluster with >1800 drives is probably not going to be particularly informative.
Scared of big data?
You can query the drives in a specific nodepool by copy-pasting
complex node ranges/lists right from the isi storagepool output:
isi statistics drive --long --node 10,18,31-32,35-37,64-66
It is also possible to renumber nodes so that each
nodepool is a simple contiguous range, see the "lnnset" subcommand of isi config.
The MediaScan job should run automatically on the first
weekend of each month; if yours is starting outside that schedule, please have it checked by support.
Cheers
-- Peter
carlilek
205 Posts
0
July 20th, 2015 08:00
But does "out of balance" count inodes, LINs, or total size?
Anonymous
5 Practitioner
274.2K Posts
0
July 20th, 2015 08:00
I think the percentage across the pools can be +5% or -5% of the used capacity; that difference is acceptable. It does not count inodes or LINs.
carlilek
205 Posts
0
July 20th, 2015 09:00
OK, that's what I thought. If another multiscan comes up, I'll poke around with the disk pool sizes and see if I can figure out why.
mattashton1
93 Posts
0
July 20th, 2015 10:00
Hi Carlilek,
Might be worth checking /var/log/messages for unusual numbers of drive stalls; those cause group changes as well. If you are seeing them, open an SR.
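A minimal way to do that check; the log lines below are invented sample data standing in for /var/log/messages, since the exact stall wording varies by OneFS version:

```shell
#!/bin/sh
# Count drive-stall messages. The sample log below is made up for
# illustration; on a node you would point grep at /var/log/messages.
cat <<'EOF' > /tmp/sample_messages
Jul 20 03:12:01 node-5 kernel: drive in bay 14 reported a stall
Jul 20 03:12:05 node-5 kernel: group change after drive went down
Jul 20 04:40:44 node-5 sshd[1234]: accepted connection
EOF
grep -ci 'stall' /tmp/sample_messages
```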
Cheers,
Matt
Anonymous
5 Practitioner
274.2K Posts
0
July 20th, 2015 10:00
S210s:150 150 D 1-7:bay1-6 - 4.4T / 30T (15% )
S200s:18 18 D 10,18,31-32,35-37,64 - 644G / 3.6T (18% )
S200-bigssd:35 35 D 26-28,39-40,45,67-69 - 1.7T / 9.7T (17% )
These three disk pools, in three different node pools, have noticeably different used capacity from the rest. This is probably why the MultiScan job is running: the difference in used capacity between the pools is >5%.