Start a Conversation


May 14th, 2015 15:00

Ask the Expert: The What, Why and How of Isilon Job Engine

YOU MAY ALSO BE INTERESTED IN THESE ATE EVENTS...

Ask The Expert – Isilon’s New Releases: IsilonSD Edge, OneFS.NEXT and CloudPools

Ask the Expert: Are you ready to manage deep archiving workloads with Isilon’s HD400 node and OneFS 7.2.0? Find out more…

https://community.emc.com/thread/195695

Welcome to the EMC Support Community Ask the Expert conversation. On this occasion we will be covering the EMC Isilon Job Engine. Among the many areas we'll be discussing, our experts will answer your questions about what the Isilon Job Engine is, why it's important to know, how to tweak its jobs, and how to handle job engine issues.


Meet Your Experts:


Rashmikant Vyas

Technical Account Manager - EMC

Rash works on providing solutions and support to Isilon customers via proactive account management. His customers are breaking storage boundaries every day in both capacity and performance, and that comes with challenges. Prior to Isilon, Rash worked on many different products in different roles and capacities -- team lead designing new services; solution architect and installation for EMC block and file storage (VNX, CLARiiON), virtualization platforms (VMware, hypervisors), and Fibre Channel and network protocols; and system admin for Sun Microsystems, VERITAS products (Cluster, Volume Manager), Linux, and UNIX flavors, including writing shell scripts.


This discussion takes place from May 18th - June 5th. Get ready by bookmarking this page or signing up for e-mail notifications.


Share this event on Twitter or LinkedIn:

>> Join the Ask the Expert: The What, Why and How of Isilon Job Engine http://bit.ly/1EIHONl #EMCATE<<

May 18th, 2015 07:00

Good morning! Thank you for joining our Ask the Experts session. Our SMEs are now standing by to take all your questions. Let's make this thread an informative and productive resource that can help ECN members today and in the weeks to come.


At this point I would like to get this discussion kicked off. Your questions are now welcome!

4 Posts

May 18th, 2015 08:00

With the introduction of 7.x, the job engine was redesigned to introduce a degree of parallelism. However, when a drive fails, all jobs are paused and FlexProtect runs until it completes. In large clusters, with or without GNA, this can take a long time to finish. In aging clusters with 1000+ drives, the MTBF will basically produce a state in which a cluster of this nature does nothing but run FlexProtect.

There is a workaround: we have worked with support to disable this behavior by putting the cluster in a 'degraded' mode to circumvent this shortcoming. This forces FlexProtect to run in parallel like any other OneFS job. Of course, this is not 'normal' and can only be done with the approval/assistance of support.

The basic question is: why was this ever the case? Drive failures are routine and should not take a higher repair priority than any other job. True, the top priority should be protecting your data. But with N+x protection and other safeguards, it is well within reason to treat this, as so many other storage vendors do, as simply a background task.

As a job engine expert, I'm interested in your opinion and in whether you see this being remedied in near-future releases. Thank you for your time.

17 Posts

May 18th, 2015 10:00

Let me start with some basic information about the Isilon Job Engine.

The job engine runs many different jobs to perform different tasks. Some jobs are triggered by an event (e.g. a drive failure), some are feature jobs (e.g. deleting a snapshot), and some are user-action jobs (e.g. deleting bulk data).

The job engine leverages all nodes of the cluster by dividing tasks into smaller work items and allocating them to different nodes. Each node runs one or more workers to complete its work items, which makes jobs finish faster. The job engine executes these jobs in the background, using some resources on each node. OneFS 7.1.1.x and above monitors resources (CPU and disk I/O) and throttles jobs by reducing the number of workers.
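To illustrate the throttling idea, here is a toy sketch in Python (illustrative only, not OneFS code; the 90% thresholds and the halving step are invented for the example):

```python
# Toy model of resource-based worker throttling: if CPU or disk I/O
# utilization crosses a threshold, back off the job's worker count.
# Thresholds and the halving policy are made up for illustration.
def throttle_workers(current_workers, cpu_pct, disk_io_pct,
                     cpu_limit=90, io_limit=90, min_workers=1):
    """Reduce workers while either resource exceeds its limit."""
    if cpu_pct > cpu_limit or disk_io_pct > io_limit:
        return max(min_workers, current_workers // 2)  # back off
    return current_workers

print(throttle_workers(8, cpu_pct=95, disk_io_pct=40))  # CPU busy -> 4
print(throttle_workers(8, cpu_pct=50, disk_io_pct=40))  # healthy  -> 8
```

The real engine makes this decision continuously per node; the sketch just shows the shape of the feedback loop.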

Every Isilon job in the job engine has an impact level and a priority.

Impact determines the resources a job will consume to complete its work. The higher the impact level, the faster the job completes (in most cases, except a few workflows), at the cost of user performance. You can also configure the impact level to define when a job runs -- e.g. OFF_HOURS runs a particular job outside business hours.

Priority determines which Isilon job takes precedence when multiple jobs are queued. Prior to OneFS 7.1.1.x, only one Isilon job was allowed to run at a time: the highest-priority job would run and the others would wait in the queue. On OneFS 7.1.1.x and above, Isilon can run up to 3 jobs at the same time as long as they are from different exclusion sets. Priority takes effect when two or more queued jobs belong to the same exclusion set, or when, if exclusion sets are not a factor, four or more jobs are queued.

You can change both the priority and impact level for each Isilon job, as long as you understand what could be impacted. Always get support's advice before changing an impact level or priority.
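The scheduling rules above can be sketched as a toy model (illustrative Python, not OneFS code; the job names and priority numbers are examples, and "mark" and "restripe" are used here as example exclusion-set labels):

```python
# Toy model of OneFS 7.1.1+ job scheduling as described above:
# up to three jobs run concurrently, at most one per exclusion set,
# and a lower priority number wins when jobs compete for a slot.
def pick_running_jobs(queued, max_concurrent=3):
    """queued: list of (name, priority, exclusion_set or None)."""
    running, used_sets = [], set()
    for name, prio, excl in sorted(queued, key=lambda j: j[1]):
        if excl is not None and excl in used_sets:
            continue  # at most one running job per exclusion set
        running.append(name)
        if excl is not None:
            used_sets.add(excl)
        if len(running) == max_concurrent:
            break
    return running

queue = [("MediaScan", 8, "mark"), ("SnapshotDelete", 2, "mark"),
         ("SmartPools", 6, "restripe"), ("TreeDelete", 4, None)]
# MediaScan waits: it shares the "mark" set with higher-priority SnapshotDelete.
print(pick_running_jobs(queue))
```

In this example SnapshotDelete, TreeDelete, and SmartPools run while MediaScan queues behind its exclusion-set sibling, which matches the behavior described above.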

Common jobs you will see running in your Isilon cluster when you run the command "isi job status -v" (there are many more; these are just the most common):

FlexProtect or FlexProtectLin -- When a drive fails, this job runs automatically to reprotect data.

SnapshotDelete -- When snapshots expire, SnapshotDelete runs to delete them.

SmartPools -- Runs when you have a policy to move data from one tier to another.

MultiScan -- Runs two jobs, AutoBalance and Collect, when a device is added to the cluster.

MediaScan -- Checks for ECC errors and corrects them. It is scheduled to run the first Saturday of every month.

The full list of jobs and schedules can be viewed via CLI command "isi job types list --verbose", or via the WebUI, by navigating to Cluster Management > Job Operations > Job Types.

1 Rookie • 20.4K Posts

May 18th, 2015 11:00

Rash,

Can you talk about the FSAnalyze job a little bit? I was under the impression that the only time it needs to complete a full file system scan is for the initial scan, and that it takes advantage of snapshots for future scans afterwards. When I look at my FSAnalyze job timings, it always takes the same amount of time (a couple of days in my case, on a 1PB cluster 70% full).

Thank you

9 Posts

May 18th, 2015 12:00

You may be thinking of the SyncIQ job. SyncIQ has to scan the whole file system tree for the initial sync, but on subsequent runs it relies on snapshots and is much quicker. Using SSDs for metadata acceleration will radically increase the performance of the tree walk if you have a lot of files (some customers have billions!).

In case you were wondering, SyncIQ does not require a SnapshotIQ license to do this.

1 Rookie • 20.4K Posts

May 18th, 2015 13:00

Nope, I am talking about FSAnalyze.

9 Posts

May 18th, 2015 13:00

Sorry, I am not aware of other jobs using snapshots.

17 Posts

May 18th, 2015 13:00

Hi Dynamox,

To the best of my understanding of the job engine, FSAnalyze doesn't use snapshots as a reference to make the next FSAnalyze run faster. You may be referring to FSAnalyze snapshots used to calculate a map of which files are accessed most often, as per this KB: https://emc--c.na5.visual.force.com/apex/KB_HowTo?id=kA0700000004JqY

I'm checking with the Product Management Group to make sure nothing has changed internally for future releases (OneFS 8.0 or 7.2.1).

If you have any article saying that FSAnalyze uses snapshots to keep track of what's been scanned and reuses them for the next FSA run, please share it with me.

Rash

17 Posts

May 18th, 2015 14:00

Hi Astack,

You brought up a very good point. A lot of people are unaware that 3 jobs can run in parallel as long as they are from different exclusion sets, with the exception of the FlexProtect or FlexProtectLin job. When there is a drive failure, the cluster is in "degraded" mode and only one job, FlexProtect or FlexProtectLin, is allowed to run, unless you override that parameter, as you have done in your environment.

My opinion on this behavior:

In the ideal scenario -- everyone follows the recommendations to keep a proper protection level based on cluster size and node types, and everyone keeps cluster utilization below 90% -- yes, you could run other jobs while FlexProtect is running. But many clusters run in non-ideal scenarios. For example: a cluster initially stood up with 4 nodes at +2:1 protection eventually grows to 18 nodes, is still protected at +2:1, and on top of that is running at 92% utilization. If jobs run in parallel, each job takes some amount of CPU and disk I/O, and if CPU or disk I/O utilization goes above threshold, the engine throttles the workers of all Isilon jobs (including FlexProtect). In that example, the probability of 2 drives failing in an 18-node cluster is much higher than in a 4-node cluster. And when a cluster runs at 90%+ utilization, it is harder to find space to reprotect data; if other jobs are running as well, everything may get throttled.

For Isilon, reprotecting data is the most important thing, and Isilon understands that not everyone runs their cluster in an ideal situation. There are customers pushing the limits on both performance and capacity utilization, and in those situations we want to give all priority to the FlexProtect job when there is a drive failure; hence the default is to run FlexProtect alone.

It would be ideal to make this dynamic, where OneFS looks at different parameters -- e.g. utilization, CPU, disk I/O, protection level -- and automatically adjusts the "degraded" mode option that you changed manually.

Let me know if that answers your question, and whether this is something you would like to see as a feature.

287 Posts

May 18th, 2015 22:00

Rash,

How many workers does each job use per node?

Does it depend on the impact policy and which hardware you are using?

Could we change it like SyncIQ settings?

17 Posts

May 19th, 2015 04:00

Hi go.y,

Answers to your questions in-line.

How many workers does each job use per node?

It depends on the impact level of the job and the number of nodes in the cluster:

    Low impact - 1 to 2 workers / node

    Medium impact - 4 to 6 workers / node

    High impact - 12 or more workers / node

Total number of workers for a job = the per-node worker count for that job's impact level multiplied by the number of nodes in the cluster.
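As a quick sketch of that rule of thumb (illustrative Python; the per-node numbers are the lower bounds of the ranges quoted above, and real counts vary with throttling):

```python
# Rough worker-count estimate: per-node workers for the impact level
# times the number of nodes. Values are the low end of the ranges
# quoted in this thread, purely for illustration.
WORKERS_PER_NODE = {"low": 1, "medium": 4, "high": 12}

def total_workers(impact, nodes):
    """Estimate total workers a job gets across the cluster."""
    return WORKERS_PER_NODE[impact] * nodes

# A medium-impact job on an 18-node cluster:
print(total_workers("medium", 18))  # 4 workers/node x 18 nodes = 72
```

Remember that OneFS 7.1.1.x+ throttles these counts downward under CPU or disk I/O pressure, so the estimate is an upper bound for the low end of each range.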

Does it depend on the impact policy and which hardware you are using?

It does depend on the impact level, as explained above.

The hardware you're using -- it may. OneFS 7.1.1.x and above can throttle the number of workers based on CPU and disk I/O. If you have old hardware with SATA drives only, a given job combined with end-client activity could easily exhaust CPU and disk I/O, and OneFS will adjust the worker count to keep performance optimal for end users.

It also depends on end-user activity. If you have an HPC workflow spinning all your disks at the highest I/O, the number of workers allocated to each job will decrease.

Could we change it like SyncIQ settings?

Yes, it can be changed. SyncIQ lets you change it easily in the WebUI; the Isilon job worker count can be changed from the command line. An increased number of workers can take a lot of CPU resources on the cluster, impacting performance for end-user clients, so always engage support, or look at InsightIQ historical daily/hourly performance data, before tweaking the number of workers.

The best way to increase or decrease the number of workers is by setting a different impact level for each job. Again, changing any job's impact to "high" to speed it up could severely impact performance.

117 Posts

May 19th, 2015 08:00

Rash / Dynamox,

(Credits to Bernie Case for the paragraph below)

We *used* to use snapshots for FSAnalyze scans, because it seemed like a good idea to have a consistent view of the filesystem throughout the entire scan phase. But there is a supportability problem with that approach: if your FSAnalyze job is interrupted (by something like FlexProtect) in the middle of a run, that snapshot just hangs around and continues to grow unimpeded. We have seen cluster-full scenarios due to this behavior. So that's why we stopped taking the /ifs snapshot for FSAnalyze.

So, FSAnalyze does not use snapshots to speed up subsequent runs; it does a full file system scan every time. This is Phase 1 of the FSAnalyze job, but it is not always the part that takes the longest, since this phase is multithreaded and the work is split between the nodes in the cluster. Phase 2 merges the results.db file from each individual node into one big results.db that InsightIQ can load.

Having said that, there are some changes coming in a future release later this year to speed up FSAnalyze. I don't have the details yet, but I know there's effort being put into this area.

Tip: You can use 'isi_fsa_tool list -v' to see how long Phase 1 vs Phase 2 took for historical FSAnalyze runs.  Phase 1 is the delta between 'start_time' and 'merge_start'.  Phase 2 is the delta between 'merge_start' and 'end_time'.
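As a worked example of that tip, here is how the phase split could be computed from those timestamps (illustrative Python; the field names follow the tip above, and the epoch values are invented for the example):

```python
# Compute the Phase 1 / Phase 2 durations of an FSAnalyze run from the
# 'start_time', 'merge_start', and 'end_time' fields, as described in
# the tip above. The epoch timestamps below are made up for illustration.
run = {"start_time": 1431950400,   # job start
       "merge_start": 1432036800,  # tree walk done, merge begins
       "end_time": 1432051200}     # merge finished

def phase_durations(run):
    """Return (phase1_hours, phase2_hours) for one FSAnalyze run."""
    phase1 = run["merge_start"] - run["start_time"]  # Phase 1: tree walk
    phase2 = run["end_time"] - run["merge_start"]    # Phase 2: results.db merge
    return phase1 / 3600, phase2 / 3600

p1, p2 = phase_durations(run)
print(f"Phase 1: {p1:.1f} h, Phase 2: {p2:.1f} h")  # Phase 1: 24.0 h, Phase 2: 4.0 h
```

Comparing the two numbers across historical runs tells you whether the tree walk or the merge is your bottleneck.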

4 Posts

May 19th, 2015 10:00

Well to add a bit more context to the FSAnalyze discussion...

1.  The purpose of the snapshot was to supply a look at the cluster 'as is' at the moment the snapshot was taken. With a snapshot as a reference point, your report reflects the day the snapshot occurred. Without the snapshot method (which has its issues and should not be used, as previously discussed), you are at the mercy of the duration of the job. If your cluster is very large and does not have GNA, your FSAnalyze job could take a week or longer to walk the tree and converge the results. Each file's reported state comes from somewhere between the start and the end of the job, depending on when the directory/file was analyzed, so your data is basically skewed.

2.  The second phase is probably what the future changes allude to. If Phase 2 fails at any point, by default it needs to start all over. This is an obvious issue, and something Isilon Tech Support can address if you ask nicely ; )

Regards,

Andrew Stack

1 Rookie • 20.4K Posts

May 19th, 2015 10:00

Thanks Yan, I am surprised that instead of implementing a clean-up procedure for the event of a failure, Isilon decided to forgo snapshots altogether. FSAnalyze scans are so painfully long that I have long gaps in my statistics (InsightIQ).

4 Posts

May 19th, 2015 10:00

Hi Rash,

Yes, you fundamentally answered my question. However, I'd put cluster health above all else. At the end of the day we're talking about erasure coding, and as such any 'healing' is, by any other storage standard, a background task. Halting all other operations for a failed drive and running FlexProtect at medium impact is a hindrance to the overall Isilon objective (no disruption of service). If you need to run AutoBalance or AutoBalanceLin due to a full cluster but cannot because 'oops', another drive failed, that's a big issue.

I like your dynamic idea, and most of your engineers agree that this issue needs to be addressed. Fundamentally, for large clusters, the current approach of parallel-with-an-asterisk is a bit problematic. Hence the question.

Regards,

Andrew Stack
