
dd local compression (gzfast / gz)

March 18th, 2015 01:00

Hello!

Is there anyone who is using gzfast or gz as local compression on active tier?

If yes, did you notice any drawbacks or performance issues?

(Is it true that the "local compression" calculation is done on the clients when we use DDBoost?)

TIA,

Istvan

59 Posts

March 18th, 2015 06:00

Hi,

I have been using gzfast for quite some time.

Local compression refers to compression (e.g. zip-style shrinking of data) done on the Data Domain.

This process follows deduplication, after "duplicate" data has been discarded.

DDBoost comes into play during the deduplication phase, whereby the comparison of blocks happens between the client and the Data Domain, so that only "new" data is sent to the Data Domain.

This data is then a candidate for compression.
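
To illustrate the order of operations, here is a rough Python sketch (illustrative only, not DD's actual implementation): segments are fingerprinted and deduplicated first, and only new segments go through local compression before being stored. With DDBoost the fingerprint comparison moves to the client, while the compression step stays on the Data Domain.

# Rough illustration of "dedupe first, then local compression" -- not DD's
# actual code, just the order of operations described above.
import hashlib
import zlib

seen_fingerprints = set()   # stands in for the Data Domain's segment index

def ingest(segment: bytes) -> int:
    """Return the number of bytes actually stored for this segment."""
    fp = hashlib.sha1(segment).digest()      # fingerprint the segment
    if fp in seen_fingerprints:              # duplicate -> nothing new to store
        return 0
    seen_fingerprints.add(fp)                # new segment -> store it...
    return len(zlib.compress(segment, 1))    # ...after local compression
                                             # (level 1 ~ fast, 9 ~ strongest)

data = [b"A" * 4096, b"A" * 4096, b"B" * 4096]   # second segment is a duplicate
stored = sum(ingest(s) for s in data)
print(f"pre-comp: {sum(len(s) for s in data)} B, post-comp: {stored} B")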

Regards,

Ignes

116 Posts

March 18th, 2015 07:00

What is your local compression factor? Ours is 2.1x with lz.

Did you see high CPU load on your DD?

Did you change the compression algorithm while you already had data on your DD, or did you start out with gzfast?

We currently have 250 TiB of used capacity. I read somewhere that the compression change applies only to new backups, and the already stored data is converted later over several cleaning cycles. I would like to know what cleaning runtime to expect for the first cleaning (we have to run at least one cleaning per week to free up some space).

2 Intern

 • 

14.3K Posts

March 19th, 2015 13:00

I'm using it, for example:

foo@bar# filesys option show

Option                           Value
-------------------------------  --------
Local compression type           gzfast
Marker-type                      auto
app-optimized-compression        none
Report-replica-as-writable       disabled
Current global compression type  9
Staging reserve                  disabled
-------------------------------  --------

I use it on all my boxes and I have never had any issues.  Bear in mind that the boxes I use are not "state of the art" models running the head on flash as the recent ones do (they go back almost 2 years, so I expect a refresh at EMC World). It is true that a compression change applies only to new backups.  Since you are talking about compression change and cleanup, I suspect you are looking at what options you really have to create as much space as possible without dancing on the edge.  In my view, a compression change won't give you enough to justify it, but that is just my gut feeling - no scientific test here.

Your compression factor will depend on the data and all the characteristics of the data itself, so comparing only the compression type will get you nowhere.  In my case I have what you see below, and it still doesn't say much about compression as such (not even about de-dupe, as it is all one big mix of several major data types):

                 Pre-Comp  Post-Comp  Global-Comp  Local-Comp      Total-Comp
                    (GiB)      (GiB)       Factor      Factor          Factor
                                                                 (Reduction %)
---------------  --------  ---------  -----------  ----------  -------------
Currently Used:  423708.1    42980.5            -           -     9.9x (89.9)
Written:*
  Last 7 days    265949.0     8354.1         9.5x        3.4x    31.8x (96.9)
  Last 24 hrs     47847.7     1368.3        10.3x        3.4x    35.0x (97.1)
---------------  --------  ---------  -----------  ----------  -------------
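
For reference, the factors in that output are plain ratios; a quick sanity check in Python against the "Last 7 days" row above (numbers taken straight from the table):

# Checking the figures above: each factor is a simple ratio.
pre_comp  = 265949.0   # GiB written, last 7 days
post_comp = 8354.1     # GiB stored after global + local compression

total_comp = pre_comp / post_comp               # ~31.8x
reduction  = (1 - post_comp / pre_comp) * 100   # ~96.9 %

# Total-Comp is approximately Global-Comp x Local-Comp:
print(9.5 * 3.4)                  # ~32.3, close to the reported 31.8x (rounding)
print(total_comp, reduction)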

116 Posts

March 20th, 2015 03:00

Our machine is a DD990. We are using it heavily (up to 100-120 TiB of pre-comp backup every day) and we have had no (backup) performance issues yet. This is also a very mixed environment. The overall dedup ratio is around 10x, but daily we reach about 15-30x.

We are not dancing on the edge right now, but it's not far away... I believe that lz is a 'lazy' compression method, and I expect at least a 3x local compression ratio with gzfast. That could save us plenty of disk space (250 TiB used with 2.1x compression, so we could save up to 90 TiB, but I would be happy with 50-60, which is two expansion shelves).
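
The back-of-the-envelope math behind that estimate, with assumed gzfast ratios (actual results will depend entirely on the data):

# Projection for the numbers above; the gzfast ratios are assumptions only.
used_tib      = 250.0   # post-comp capacity currently used
current_local = 2.1     # current local compression factor (lz)

after_dedupe = used_tib * current_local   # ~525 TiB before local compression

for assumed_gzfast in (3.0, 3.3):
    new_used = after_dedupe / assumed_gzfast
    print(f"gzfast at {assumed_gzfast}x -> ~{new_used:.0f} TiB used, "
          f"saving ~{used_tib - new_used:.0f} TiB")
# gzfast at 3.0x -> ~175 TiB used, saving ~75 TiB
# gzfast at 3.3x -> ~159 TiB used, saving ~91 TiB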

The big questions are:

- Are the deduped blocks sent to the DD and compressed there, or do the clients have to do this task as well (in which case the higher CPU usage would happen on every client)?

- Does it (heavily) affect restore / replication speed?

- We currently use collection replication, but we plan to switch to CCR. Do we have to set the same algorithm on both DDRs? What will happen with collection replication if we change it on the primary machine?

- Should I just set it to gzfast, see how it works, and switch it back if it has a big impact on performance? Or what speaks against this?

2 Intern

 • 

14.3K Posts

March 23rd, 2015 09:00

- Clients may (depending on the setup) do partial de-dupe; the compression, AFAIK, always happens on the DD.

- There may certainly be an impact, but I do not think it is that heavy.

- I do not have much experience with collection replication (I believe I used it during brief VTL testing), but with CCR I believe this is not a must, though I can see it being recommended.  I assume you wish to change this on the replication target.  I do remember that when you replicate, you will always see less usage on the replication target than on the original landing box for various reasons, so the results you get may be a bit misleading in that case.

- As far as suggestions go, I think you should also get input from experienced support, as they may have more info based on what they have seen in the field.

1 Message

March 24th, 2015 07:00

Global compression relates to the deduplication process. Data is reduced (not really compressed) by removing duplicate data.

Local compression is the real compression of the remaining data in the Data Domain after deduplication.

Compression algorithms have a trade-off between the amount of compression that can be achieved and performance. In general, stronger compression puts a higher load on the processor.

Whether and how much you really profit from choosing another algorithm depends on the nature of your data. If, for instance, your data doesn't compress very well (like already compressed Oracle data), you will hardly gain any space by using a stronger algorithm, at a relatively high performance penalty.

In most cases, the performance of data intake on the active tier (compression is done in flight) is more important than achieving maximum compression. That's why the default local compression type is lz (best performance, not the best compression). On the archive tier, if you have one, the argument usually tips more in favor of stronger compression (gzfast or even gz).
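
The trade-off is easy to demonstrate with an ordinary compressor. A small Python/zlib sketch - the levels here are only analogous to DD's lz / gzfast / gz, not the same codecs:

# Strength-vs-CPU trade-off illustrated with zlib levels (an analogy only).
import time
import zlib

payload = b"some moderately repetitive backup-ish data " * 50000

for level in (1, 6, 9):                       # fast ... default ... strongest
    start = time.perf_counter()
    out = zlib.compress(payload, level)
    elapsed = time.perf_counter() - start
    print(f"level {level}: {len(payload) / len(out):.1f}x "
          f"in {elapsed * 1000:.1f} ms")
# Stronger levels typically squeeze a bit more out of the data at a noticeably
# higher CPU cost -- and on already-compressed data the gain is close to zero
# at any level.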

Unless you have a specific situation, I think there is no reason to choose anything other than lz for the active tier.

116 Posts

March 24th, 2015 07:00

We are not sending compressed data to the DD (at least there is a policy not to use compression for DB backups).

I have contacted support about replication vs. filesystem compression algorithms; the action plan depends on their answers:

We plan to deploy Data Domain Management Center; I hope it will give me a better overview of DD performance and whole-day CPU usage.

If that works out, we will change the compression to gzfast only on the collection replication target machine to see the compression results. If the CPU usage doesn't change significantly there and we can save a considerable amount of disk space, then - maybe - we will switch to gzfast.

(It seems that we will have to extend the capacity of our DD anyway - I think we will try the switch then, to have enough disk space to revert the change if we hit a performance problem).
