PaulDanis
1 Copper

What does MetaData, Treewalk and Workflow mean?


I hear these terms over and over when talking about Isilon. What do they mean?


Accepted Solutions
PaulDanis
1 Copper

Re: What does MetaData, Treewalk and Workflow mean?

These terms have specific meanings when you are trying to determine a proper configuration for Isilon.

Workflow means the type of application the customer will use to store data on, or retrieve data from, Isilon. Examples include typical NAS file shares, streaming backups, etc.

Metadata is the 'data about the data'. For example, the actual file contains the content the customer uses, and the metadata is the data about this file: who owns it, when it was backed up, who has access to it, where it is on the file system, etc.

A treewalk is defined as a client looking up the metadata on a range of files. A simple Windows Explorer browse or an ls -la command will invoke a treewalk. Backup applications are another common source of treewalks.
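
To make the treewalk definition concrete, here is a minimal sketch of what a client does during one: it visits every entry under a directory and reads only the metadata, never the file contents. This is ordinary Python against a local or mounted file system, not anything Isilon-specific; the function name treewalk and the fields collected are chosen for illustration.

```python
import os
import stat
import tempfile


def treewalk(root):
    """Visit every entry under root and read its metadata only."""
    entries = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            # lstat is a pure metadata lookup; the file data is never opened.
            st = os.lstat(path)
            entries.append({
                "path": path,
                "is_dir": stat.S_ISDIR(st.st_mode),
                "owner_uid": st.st_uid,
                "size": st.st_size,
                "mtime": st.st_mtime,
            })
    return entries


if __name__ == "__main__":
    # Build a tiny throwaway tree so the sketch runs anywhere.
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        with open(os.path.join(root, "sub", "a.txt"), "w") as f:
            f.write("hello")
        for entry in treewalk(root):
            print(entry)
```

Every entry costs at least one metadata lookup, which is why a walk over millions of files is dominated by metadata latency rather than by data throughput.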

5 Replies
PhilLam
3 Argentium

Re: What does MetaData, Treewalk and Workflow mean?

Putting metadata on SSD lowers treewalk latency, so commands such as du, df, and ls respond faster. Without metadata on SSD, those commands may take longer than expected to finish.

Peter_Sero
4 Tellurium

Re: What does MetaData, Treewalk and Workflow mean?

The sad thing is that so much of the SSD space is eaten up by organizing the layout of the file contents, which doesn't benefit treewalks. For the treewalk, typically 512 bytes per file are used, holding permissions, file type, timestamps, etc. But the data layout info easily consumes several kilobytes of SSD per file once file sizes grow to megabytes and beyond. It would be great to separate out these two types of SSD usage.
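
A rough back-of-the-envelope sketch of the point above. The 512 bytes per file comes from the post; the 4 KiB of layout info per file and the 10 million file count are assumptions picked only to stand in for "several kilobytes" and a mid-sized cluster. Protection overhead (mirroring of metadata) is ignored.

```python
INODE_BYTES = 512                  # per-file figure quoted in the post
LAYOUT_BYTES_PER_FILE = 4 * 1024   # ASSUMPTION: stands in for "several kilobytes"


def ssd_estimate(file_count, layout_bytes=LAYOUT_BYTES_PER_FILE):
    """Return (inode_total, layout_total) in bytes for file_count files."""
    inode_total = file_count * INODE_BYTES
    layout_total = file_count * layout_bytes
    return inode_total, layout_total


# ASSUMPTION: 10 million files, chosen for illustration.
inode_total, layout_total = ssd_estimate(10_000_000)
share = inode_total / (inode_total + layout_total)

print(f"inode (treewalk) data: {inode_total / 1e9:.2f} GB")   # 5.12 GB
print(f"data layout info:      {layout_total / 1e9:.2f} GB")  # 40.96 GB
print(f"share useful for treewalks: {share:.1%}")             # 11.1%
```

Under these assumed numbers, only about one ninth of the SSD space used for metadata actually speeds up treewalks.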

-- Peter

shaisilon
1 Copper

Re: What does MetaData, Treewalk and Workflow mean?

There are actually ways to control the SSD usage of the different types of metadata structures:

Inode: Stores static and dynamic attributes

Directory Tree: Directory Inode

Inode Btree: A btree which indexes the file data blocks associated with a file. Sometimes called the metatree

Inode Extension: Stores dynamic attributes once we run out of space in the Inode

Quota Accounting Blocks: Used to track quota domain usage

Snapshot Tracking Files: Used to track snapshot file changes (not the actual blocks)

System Btrees: Generic btrees used by applications to store data

Specifically, the Inode structure is the one used when doing treewalks, and the Inode Btree is the one that contains the list of data blocks associated with a file.


These all have associated sysctls that can be set and unset to control the SSD usage, but it is highly recommended to consult with Isilon professional services first.


For example, having the Inode Btree on SSD can be very useful when accessing large files in a random manner, because every protocol IO first has to locate the data block for a random offset in the file before accessing it (think VMDK files).
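
To illustrate why that lookup sits on the critical path of every random IO, here is a small sketch of an offset-to-block index. It is not the OneFS on-disk format: the class ExtentIndex, the extent tuples, and the block labels are all hypothetical, and a sorted list with binary search stands in for the btree.

```python
from bisect import bisect_right


class ExtentIndex:
    """Hypothetical per-file index of (start_offset, length, block_address)."""

    def __init__(self, extents):
        self.extents = sorted(extents)
        self.starts = [start for start, _, _ in self.extents]

    def locate(self, offset):
        """Return (block_address, offset_within_block), or None for a hole."""
        # Find the last extent whose start is <= offset.
        i = bisect_right(self.starts, offset) - 1
        if i < 0:
            return None
        start, length, address = self.extents[i]
        if offset >= start + length:
            return None  # offset falls in a gap or past the last extent
        return address, offset - start


# Illustrative extents only; the block labels are made up.
index = ExtentIndex([
    (0, 8192, "blk-A"),
    (8192, 8192, "blk-B"),
    (32768, 8192, "blk-C"),
])

print(index.locate(10000))   # ('blk-B', 1808)
print(index.locate(20000))   # None
print(index.locate(32768))   # ('blk-C', 0)
```

Each random read or write has to perform a lookup like locate() before it can touch the data, so the latency of the media holding the index is added to every IO.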

Shai

Peter_Sero
4 Tellurium

Re: Re: What does MetaData, Treewalk and Workflow mean?

Oh, simply terrific!

The VMDK example makes it absolutely clear that things should be thoroughly understood and planned.

Even more so as the settings are global on the cluster, rather than per file pool (SmartPools).

Are there any plans to build pre-defined combined settings for use with SmartPools?

(In a way similar to "access patterns" like concurrency, random, etc.)

Seems that would be really cool, in particular for GNA.

-- Peter
