1 Rookie

68 Posts

May 7th, 2015 10:00

AutoBalance job takes a long time

Hi everyone,

I have a cluster with 39 IQ5400s and 5 S200 nodes.

Free space is only around 1% to 5%.

The cluster runs OneFS 7.1.0.3.

I have an AutoBalance job running:

Running jobs:
Job                        Impact Pri Policy     Phase Run Time
-------------------------- ------ --- ---------- ----- ----------
FSAnalyze[3444]            Low    6   LOW        1/2   6:25:36
AutoBalance[3420]          Low    4   LOW        2/5   20d 20:21

If I view the job report, I see this:

Protele-27# isi job reports view --id=3420
AutoBalance[3420] phase 1 (2015-04-16T15:11:46)
-----------------------------------------------
Elapsed time:                     7149 seconds
Errors:                              0
Drives:                            588
LINs:                         81352318
Size:                         652005240885749
Eccs:                                0
CPU usage:                         max 56% (dev 18), min 0% (dev 2), avg 10%
Virtual memory size:               max 106628K (dev 6), min 105476K (dev 2), avg 105624K
Resident memory size:              max 11312K (dev 47), min 10308K (dev 13), avg 10665K
Read:                              11049621 ops, 166869344768 bytes (159139.0M)
Write:                             22044495 ops, 180588503040 bytes (172222.6M)
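
As a quick sanity check on the report above, the raw counters can be converted into friendlier units. This is only a rough Python sketch, not an Isilon tool: the numbers are copied by hand from the phase 1 report, and it assumes that Size is a byte count and that the M suffix means MiB (2^20 bytes), neither of which the report states explicitly.

```python
# Figures copied by hand from the AutoBalance[3420] phase 1 report.
elapsed_s = 7149
lins = 81_352_318
size_bytes = 652_005_240_885_749   # ASSUMPTION: "Size" is in bytes
read_bytes = 166_869_344_768
write_bytes = 180_588_503_040

MIB = 1024 ** 2
TIB = 1024 ** 4

# Scan rate of phase 1 only; later phases do different work.
lins_per_second = lins / elapsed_s

size_tib = size_bytes / TIB
read_mib = read_bytes / MIB
write_mib = write_bytes / MIB

print(f"Phase 1 scan rate: {lins_per_second:.0f} LINs/s")
print(f"Size:  {size_tib:.1f} TiB")
print(f"Read:  {read_mib:.1f} MiB")
print(f"Write: {write_mib:.1f} MiB")
```

With these inputs the read and write figures come out at 159139.0 and 172222.6, matching the values in parentheses in the report, so the M suffix does appear to mean MiB. Phase 1 scanned on the order of 11,000 LINs per second, which says nothing about how fast the later phases will run.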

If I run the same command again, I see the same number of LINs.

I paused the job and then resumed it, and I still see the same number of LINs.

Is my AutoBalance job running normally?

How many days will the job take to finish?

Do you recommend waiting up to 30 days for the job to finish?

Or do you recommend stopping the job and starting a new one afterwards?

Thanks

1 Rookie

33 Posts

May 7th, 2015 15:00

An AutoBalance job can take forever. Basically, it depends on the number of files on the Isilon. Give the job more time to run.

93 Posts

May 7th, 2015 15:00

...and up the priority if you can, especially during off hours.

Cheers,

Matt


4 Operator

1.2K Posts

May 8th, 2015 00:00

isi job list -v

gives you the current progress, unlike isi report..., which shows info only for the finished job phases.

Have you considered upgrading to 7.1.1.2 for improved overall performance?

hth

-- Peter

1 Rookie

68 Posts

May 8th, 2015 12:00

Hi Peter,

Thanks for your answer.

Let me check that command. I thought that isi job reports view --id=3420 would show the progress of the job.

1 Rookie

68 Posts

May 8th, 2015 12:00

Thanks for your answer.

Is free space important for the AutoBalance job to complete its work?

Thanks

1 Rookie

68 Posts

May 8th, 2015 12:00

Thanks for your answer.

I can't raise the priority, because we need the same performance all day.

1 Rookie

33 Posts

May 9th, 2015 23:00

Since you have only 5% free space left, free space does play a role. The job may take more time than expected. Do you have a plan to buy new nodes or delete files on the cluster?

1 Rookie

68 Posts

May 11th, 2015 10:00

Hi,

Not yet, this is temporary.

Thanks for your interest.

1 Rookie

68 Posts

May 11th, 2015 14:00

Hi,

I am monitoring the job. Here is the output from May 8 and May 11.

May 8

Protele-6# while :; do date; isi job list -v; isi job list -v; isi job list -v ; sleep 300;done

Fri May  8 18:40:38 CDT 2015
               ID: 3420
             Type: AutoBalance
            State: Running
           Impact: Low
           Policy: LOW
         Priority: 4
            Phase: 2/5
       Start Time: 2015-04-16T13:12:37
     Running Time: 22d 1h 50m
     Participants: 2, 3, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 44, 45, 46, 47, 48, 49, 50, 51, 55, 56, 57
         Progress: Processed 25015300 lins; 0 zombies and 0 errors
Waiting on job ID: -
      Description:

May 11

Mon May 11 12:00:15 CDT 2015
               ID: 3420
             Type: AutoBalance
            State: Running
           Impact: Low
           Policy: LOW
         Priority: 4
            Phase: 2/5
       Start Time: 2015-04-16T13:12:37
     Running Time: 24d 12h 2m
     Participants: 2, 3, 4, 5, 6, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 26, 27, 28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 43, 44, 45, 46, 47, 48, 49, 50, 51, 55, 56, 57
         Progress: Processed 25203039 lins; 0 zombies and 0 errors
Waiting on job ID: -
      Description:
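
To put a number on the progress between the two samples above, the processed-LIN counts can be turned into a rate. This is a rough Python sketch, not anything that ships with OneFS: the timestamps and counts are copied by hand from the output, and the remaining-time figure rests on the unconfirmed assumption that phase 2 has to process every LIN counted in the phase 1 report (81352318) at a constant rate.

```python
from datetime import datetime

# Two samples copied from the `isi job list -v` output (both CDT).
t1 = datetime(2015, 5, 8, 18, 40, 38)
t2 = datetime(2015, 5, 11, 12, 0, 15)
lins1 = 25_015_300
lins2 = 25_203_039

# LIN count from the phase 1 report earlier in the thread.
# ASSUMPTION: phase 2 must process about this many LINs in total.
total_lins = 81_352_318

hours = (t2 - t1).total_seconds() / 3600
rate_per_hour = (lins2 - lins1) / hours

# Naive linear extrapolation; the real rate can change at any time.
remaining_lins = total_lins - lins2
eta_days = remaining_lins / rate_per_hour / 24

print(f"Interval:  {hours:.1f} h")
print(f"Rate:      {rate_per_hour:.0f} LINs/h")
print(f"Remaining: {remaining_lins} LINs")
print(f"Naive ETA: {eta_days:.0f} days")
```

With these two samples the rate works out to roughly 2,900 LINs per hour, which is less than one LIN per second, and the naive linear estimate comes to more than two years. Since the job is reported as succeeded on 06/08 at the end of this thread, that extrapolation clearly overestimates: either the rate picked up later or phase 2 did not have to touch every LIN. Treat it as an indicator of how slow this interval was, not as a forecast.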

4 Operator

1.2K Posts

May 14th, 2015 21:00

That's awfully slow progress...

We don't know the overall load and health condition of your cluster. I have found that the new 7.1 job engine does a very good job of not slowing down front-end NAS traffic, so giving a job MEDIUM or even HIGH impact is mostly a safe thing to do (for reference, we have NL nodes with 3TB drives).

HOWEVER, you should really be concerned about the free space on your cluster, as you have already suspected. According to the Isilon performance talk at the recent EMC World, overall performance degradation can start when the cluster is 90% full and can become severe from 95%!

Cheers

-- Peter

1 Rookie

68 Posts

June 22nd, 2015 16:00

Hi, I am posting the result now that the job has succeeded.

Thanks for your answers

Recent job results:
Time            Job                        Event
--------------- -------------------------- ------------------------------
06/09 01:21:55  FSAnalyze[3484]            Succeeded (LOW)
06/08 05:07:08  AutoBalance[3420]          Succeeded (LOW)
