December 9th, 2015 11:00

Snapshot burden on OneFS 7.2.1 cluster

Hi all,

Someone at my office thinks that taking snapshots on a 500TB cluster with billions of files will require a lot of resources on the cluster. Currently we keep daily snapshots that expire after 7 days. They use 3TB of space in total for approximately 340 live snapshots in the ".snapshot" directory.

In my mind, that looks like a pretty small number of snapshots and should not make a big difference to cluster resources.

Any comment?

I know there is a limit of 1K snapshots per directory, but what is the maximum recommended number of snapshots?

1.2K Posts

December 9th, 2015 20:00

Those 340 snapshots are on non-overlapping directories? And per directory there are just 7 daily snapshots? That shouldn't be a big problem for regular workflows.

But keep in mind that /deleting/ snapshots after expiration can take considerable amounts of CPU and disk IO. Monitor the reports for SnapshotDelete jobs over time (elapsed time <=> LINs deleted and blocks processed) to establish a baseline for your cluster. And as usual, SSD metadata acceleration speeds up jobs, in particular in scenarios with many small files.

hth

-- Peter

21 Posts

December 10th, 2015 12:00

Yes Peter, the snapshots are all on different directory trees! No overlapping!

          snap 1 = /ifs/data/dir1, snap 2 = /ifs/data/dir2, ...  for only 35 directory trees. As usual, some clients create a lot of long subdirectory trees.

Question: we are taking the snapshot at the head of the tree (e.g. /ifs/data/dir35). Is there a big burden on the cluster from the snapshot if the customer creates a large number of subdirectories with long names and a lot of files under them?

In my mind, every change under /ifs/data/dir5 is just a block change and has no effect on the snapshot, even when long directory trees sit below it. Is that right?

And yes, we are using SSDs for GNA! Deleting looks fast. (Progress: Deleted all of 132124 LINs and 34/34 snapshots; last deleted: 12837; 0 errors. Running Time: 44s)
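For reference, a rough deletion rate can be pulled out of that progress line as a baseline for the cluster. A quick sketch (the line text is hard-coded from the report above, so the format may differ on other OneFS versions):

```shell
# Parse the SnapshotDelete progress line quoted above and derive a
# rough deletion rate for this run.
line="Deleted all of 132124 LINs and 34/34 snapshots; last deleted: 12837; 0 errors. Running Time: 44s"
lins=$(printf '%s\n' "$line" | sed -n 's/.*all of \([0-9]*\) LINs.*/\1/p')
secs=$(printf '%s\n' "$line" | sed -n 's/.*Running Time: \([0-9]*\)s.*/\1/p')
echo "deletion rate: $((lins / secs)) LINs/sec"
```

Tracking that number across SnapshotDelete job reports over time is what establishes the baseline Peter mentioned.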

205 Posts

December 10th, 2015 16:00

Yeah, I wouldn't stress on that. I take snapshots at a very high level of the cluster, and I don't believe they affect performance in the slightest.

2 Intern

 • 

293 Posts

December 10th, 2015 17:00

Starting from OneFS 7.1.1, the maximum total number of snapshots per cluster was expanded to 20,000 by default.

The per-directory snapshot limit stays the same at 1,024.

Daily snapshots for 35 directories, kept for 7 days, is far below either maximum.
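As a quick sanity check of those numbers (35 trees from the earlier post, 7 retained daily snapshots each; the poster's ~340 live count is a bit higher, since expired snapshots linger until SnapshotDelete runs):

```shell
# Steady-state snapshot count for this schedule:
# 35 directory trees, each keeping 7 daily snapshots before expiry.
total=$((35 * 7))
echo "about $total live snapshots (per-directory limit: 1024, cluster default: 20000)"
```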

1.2K Posts

December 11th, 2015 06:00

I second that --  just don't expect that the forum will give you a free ticket for your customer. A couple of additional thoughts:

- SSD metadata speeds things up as mentioned, but watch the fill level of the SSD drives carefully. There are steep and high ramps *up* when millions of small files are deleted.  (A deleted but snapshot-retained small file can take up to 17 times more SSD space than the original file until the snapshot finally expires and gets freed.)

- Small files tend to stay unchanged from creation until deletion, unlike database files or log files. That's nice for snapshots.

- Reading data from a retained snapshot file while the current file is being modified can be very slow due to heavy locking. Kind of sad, but less likely with small files after all.

Good luck!

-- Peter

21 Posts

December 15th, 2015 12:00

Talking about SSDs: we use SSDs configured for GNA. We have eight 1.4TB SSD drives across our 8 S200 nodes and no SSDs in our 4 NL400 nodes. We have millions of files ("/data" = 8.8M dirs, 125M files) for 370TB logical used.

Our SSDs are only 10% used. I find that very low, and I am unsure whether it works at all for the NL400 nodes, because our metadata scan is still very slow for our backup through an SMB share. The command

21 Posts

December 15th, 2015 13:00

Oops! (continued)

The command "sysctl efs.bam.layout.ssd.gna_enable return "1"  as active but ??? proof?? I am not sure to interpret correctly the output of the isi get -DD "files" (see atacht image)  "metadata target n400_144_24gb:56(56)"OUTPUT.jpg

205 Posts

December 16th, 2015 07:00

Look for the meta width device list. It doesn't appear to have anything listed on yours, though. What do your file pool policies look like?

21 Posts

December 16th, 2015 13:00

This is the file pool policy that applies to the isi get -DD output shown previously:

Description

NL400 : 100%

All selected files to NL400

Select Files to Manage

IF
    Path matches "data/Zone-10/SEABACK"
OR
    Path matches "data/Zone-10/BALZAC"

Apply SmartPools Actions to Selected Files

Storage Settings

Storage Target: n400_144tb_24gb (node pool)
Use SSDs for metadata read acceleration (Recommended)

Using default policy action of:
      Snapshot Storage Target: anywhere
    Use SSDs for metadata read acceleration (Recommended)

I/O Optimization Settings

Using default policy action of:
SmartCache is enabled

Using default policy action of:
Optimize for concurrent access

1.2K Posts

December 17th, 2015 08:00

sysctl efs.bam.layout.ssd.gna_active

tells you whether GNA is actually in effect.

(gna_enable is a settable control; GNA could still be inhibited if the requirements are not met.)

In your isi get -DD screenshot, the two lines

*  SSD Strategy:       metadata

*  SSD Status:         complete

indicate that this file has one copy of metadata on SSD, i.e. GNA works for the file which has both data and regular metadata copies on NL nodes.

10% SSD capacity usage leaves you with lots of headroom. Our SSDs are about 50-60% full, and that is NOT comfortable headroom for larger housecleaning activities under snapshots (with that 17x increase I mentioned earlier).

-- Peter

21 Posts

December 17th, 2015 10:00

Thanks for your reply,

Is there a way or a command to see the speed difference between a group of files with metadata on SSD and another group of files not on SSD?

Because my SMB backup scan through a share is very slow and it should scan faster with SSD!

Any idea?
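Not an Isilon-specific tool, but one crude way is to time a metadata-only treewalk over two comparable trees, one whose file pool policy puts metadata on SSD and one without, and compare the elapsed times. A sketch with placeholder paths:

```shell
# Time a metadata-only walk: stat every file and discard the output,
# so only metadata reads are exercised. Argument: directory to walk.
meta_walk_time() {
    start=$(date +%s)
    find "$1" -type f -exec ls -ld {} + > /dev/null 2>&1
    end=$(date +%s)
    echo "$1: $((end - start))s"
}

# Placeholder paths -- substitute a tree with SSD metadata and one without:
# meta_walk_time /ifs/data/dir_with_ssd_meta
# meta_walk_time /ifs/data/dir_without_ssd_meta
```

Run each walk twice and compare the second (warm-cache) runs too, so you can separate SSD effect from cache effect.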

205 Posts

December 17th, 2015 10:00

Our SSDs are generally 20-25% full at this point, but we do have a large number of snapshots. Peter, how do you arrive at that 17x increase for snaps?

1.2K Posts

December 17th, 2015 11:00

That's the worst case. Metadata for a small file usually fits in a 512-byte block, which is the smallest unit allocated on SSD. However, when the metadata grows, the next allocation unit is a full 8KB block, so usage jumps from 0.5KB to 8.5KB; that's 17-fold.

Usually that happens when deleting small files held under ten or so snapshots. Imagine a file set whose metadata consumes a tiny 3% of SSD capacity; that can jump to a whopping 51%...
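The arithmetic behind that worst case, as a quick sketch:

```shell
# Worst-case SSD metadata growth for one small file: a 512-byte
# metadata block (smallest SSD allocation unit) overflows and the
# next allocation is a full 8 KiB block on top of it.
before=512
after=$((512 + 8192))       # 8704 bytes
factor=$((after / before))  # 8704 / 512 = 17
echo "growth factor: ${factor}x"
# A file set whose metadata held 3% of SSD capacity could thus peak
# near 3 * 17 = 51% until the retaining snapshots expire.
```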

-- Peter

205 Posts

December 17th, 2015 11:00

Gotcha, thanks.
