December 9th, 2015 11:00

Snapshot burden on OneFS 7.2.1 cluster

Hi all,

Someone at my office thinks that taking snapshots on a 500TB cluster with billions of files will require a lot of resources on the cluster. Currently we keep daily snapshots that expire after 7 days. They use 3TB of space in total for approximately 340 live snapshots in the ".snapshot" directory.

In my mind, that looks like a pretty small number of snapshots and should not make a big difference to cluster resources.

Any comment?

I know there is a limit of 1K snapshots per directory, but what is the maximum recommended number of snapshots?

1.2K Posts

December 9th, 2015 20:00

Those 340 snapshots are on non-overlapping directories? And per directory there are just 7 daily snapshots? That shouldn't be a big problem for regular workflows.

But keep in mind that /deleting/ snapshots after expiration can take considerable amounts of CPU and disk IO. Monitor the reports for SnapshotDelete jobs over time (elapsed time <=> LINs deleted and blocks processed) to establish a baseline for your cluster. And as usual, SSD metadata acceleration speeds up jobs, in particular in scenarios with many small files.

hth

-- Peter

21 Posts

December 10th, 2015 12:00

Yes Peter, the snapshots are all on different directory trees! No overlapping!

          snap 1 = /ifs/data/dir1, snap 2 = /ifs/data/dir2, ...  for only 35 directory trees. As usual, some clients create a lot of long subdirectory trees.

Question: we are taking the snapshot at the head of the tree (e.g. /ifs/data/dir35). Is there a big burden on the cluster from the snapshot if the customer creates a large number of subdirectories with long names and a lot of files under them?

In my mind, every change under /ifs/data/dir5 is just a block change and has no effect on the snapshot, even when long directory trees sit below it. Is that right?

And yes, we are using SSDs for GNA! Deleting looks fast. (Progress: Deleted all of 132124 LINs and 34/34 snapshots; last deleted: 12837; 0 errors. Running Time: 44s)
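For reference, a rough deletion rate can be pulled out of that progress line as a baseline for the cluster. A quick sketch (the line text is hard-coded from the report above, so the format may differ on other OneFS versions):

```shell
# Parse the SnapshotDelete progress line quoted above and derive a
# rough deletion rate for this run.
line="Deleted all of 132124 LINs and 34/34 snapshots; last deleted: 12837; 0 errors. Running Time: 44s"
lins=$(printf '%s\n' "$line" | sed -n 's/.*all of \([0-9]*\) LINs.*/\1/p')
secs=$(printf '%s\n' "$line" | sed -n 's/.*Running Time: \([0-9]*\)s.*/\1/p')
echo "deletion rate: $((lins / secs)) LINs/sec"
```

Tracking that number across SnapshotDelete job reports over time is what establishes the baseline Peter mentioned.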

205 Posts

December 10th, 2015 16:00

Yeah, I wouldn't stress on that. I take snapshots at a very high level of the cluster, and I don't believe they affect performance in the slightest.

2 Intern

 • 

293 Posts

December 10th, 2015 17:00

Starting from OneFS 7.1.1, the maximum total number of snapshots per cluster was expanded to 20,000 by default.

The per-directory snapshot limit stays the same at 1,024.

Daily snapshots for 35 directories, kept for 7 days, is far below either maximum.
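As a quick sanity check of those numbers (35 trees from the earlier post, 7 retained daily snapshots each; the poster's ~340 live count is a bit higher, since expired snapshots linger until SnapshotDelete runs):

```shell
# Steady-state snapshot count for this schedule:
# 35 directory trees, each keeping 7 daily snapshots before expiry.
total=$((35 * 7))
echo "about $total live snapshots (per-directory limit: 1024, cluster default: 20000)"
```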

1.2K Posts

December 11th, 2015 06:00

I second that --  just don't expect that the forum will give you a free ticket for your customer. A couple of additional thoughts:

- SSD metadata speeds things up as mentioned, but watch the fill level of the SSD drives carefully. There are steep and high ramps *up* when millions of small files are deleted.  (A deleted but snapshot-retained small file can take up to 17 times more SSD space than the original file until the snapshot finally expires and gets freed.)

- Small files tend to stay unchanged from creation until deletion, unlike database files or log files. That's nice for snapshots.

- Reading data from a retained snapshot file while the current file is being modified can be very slow due to heavy locking. Kind of sad, but less likely with small files after all.

Good luck!

-- Peter

21 Posts

December 15th, 2015 12:00

Talking about SSDs: we use SSDs configured for GNA. We have eight 1.4TB SSD drives across our 8 S200 nodes and no SSDs in our 4 NL400 nodes. We have millions of files ("/data" = 8.8M dirs, 125M files) for 370TB logical used.

Our SSDs are only 10% used. I find that very low, and I am unsure whether it works at all for the NL400 nodes, because our metadata scan is still very slow for our backup through an SMB share. The command

21 Posts

December 15th, 2015 13:00

Oops! (continued)

The command "sysctl efs.bam.layout.ssd.gna_enable return "1"  as active but ??? proof?? I am not sure to interpret correctly the output of the isi get -DD "files" (see atacht image)  "metadata target n400_144_24gb:56(56)"OUTPUT.jpg

205 Posts

December 16th, 2015 07:00

Look for the meta width device list. It doesn't appear to have anything listed on yours, though. What do your file pool policies look like?

21 Posts

December 16th, 2015 13:00

This is the file pool policy that applies to the isi get -DD output shown previously:

Description

NL400 : 100%

All selected files to NL400

Select Files to Manage

IF
    Path matches "data/Zone-10/SEABACK"
OR
    Path matches "data/Zone-10/BALZAC"

Apply SmartPools Actions to Selected Files

Storage Settings

Storage Target: n400_144tb_24gb (node pool)
Use SSDs for metadata read acceleration (Recommended)

Using default policy action of:
      Snapshot Storage Target: anywhere
    Use SSDs for metadata read acceleration (Recommended)

I/O Optimization Settings

Using default policy action of:
SmartCache is enabled

Using default policy action of:
Optimize for concurrent access

1.2K Posts

December 17th, 2015 08:00

sysctl efs.bam.layout.ssd.gna_active

tells you whether GNA is actually in effect.

(gna_enable is a settable control; GNA could still be inhibited if the requirements are not met.)

In your isi get -DD screenshot, the two lines

*  SSD Strategy:       metadata

*  SSD Status:         complete

indicate that this file has one copy of metadata on SSD, i.e. GNA works for the file which has both data and regular metadata copies on NL nodes.

10% SSD capacity usage leaves you with lots of headroom. Our SSDs are about 50-60% full, and that is NOT comfortable headroom for larger housecleaning activities under snapshots (with that 17x increase I mentioned earlier).

-- Peter

21 Posts

December 17th, 2015 10:00

Thanks for your reply,

Is there a way or a command to see the speed difference between a group of files with metadata on SSD and another group of files not on SSD?

Because my SMB backup scan through a share is very slow and it should scan faster with SSD!

Any idea?
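Not an Isilon-specific tool, but one crude way is to time a metadata-only treewalk over two comparable trees, one whose file pool policy puts metadata on SSD and one without, and compare the elapsed times. A sketch with placeholder paths:

```shell
# Time a metadata-only walk: stat every file and discard the output,
# so only metadata reads are exercised. Argument: directory to walk.
meta_walk_time() {
    start=$(date +%s)
    find "$1" -type f -exec ls -ld {} + > /dev/null 2>&1
    end=$(date +%s)
    echo "$1: $((end - start))s"
}

# Placeholder paths -- substitute a tree with SSD metadata and one without:
# meta_walk_time /ifs/data/dir_with_ssd_meta
# meta_walk_time /ifs/data/dir_without_ssd_meta
```

Run each walk twice and compare the second (warm-cache) runs too, so you can separate SSD effect from cache effect.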

205 Posts

December 17th, 2015 10:00

Our SSDs are generally 20-25% full at this point, but we do have a large number of snapshots. Peter, how do you arrive at that 17x increase for snaps?

1.2K Posts

December 17th, 2015 11:00

That's the worst case. Metadata for a small file usually fits in a 512-byte block, which is the smallest unit allocated on SSD. However, when the metadata grows, the next allocation unit is a full 8KB block, so usage jumps from 0.5KB to 8.5KB; that's 17-fold.

Usually that happens when deleting small files held under ten or so snapshots. Imagine a file set whose metadata consumes a tiny 3% of SSD capacity; that can jump to a whopping 51%...
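The arithmetic behind that worst case, as a quick sketch:

```shell
# Worst-case SSD metadata growth for one small file: a 512-byte
# metadata block (smallest SSD allocation unit) overflows and the
# next allocation is a full 8 KiB block on top of it.
before=512
after=$((512 + 8192))       # 8704 bytes
factor=$((after / before))  # 8704 / 512 = 17
echo "growth factor: ${factor}x"
# A file set whose metadata held 3% of SSD capacity could thus peak
# near 3 * 17 = 51% until the retaining snapshots expire.
```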

-- Peter

205 Posts

December 17th, 2015 11:00

Gotcha, thanks.
