ble1 · 4 Operator · 14.4K Posts · April 7th, 2011 00:00
I believe EMC uses a 1024x1000x1000xetc formula, but in short... you have around 170 TB, and native LTO5 takes 1.5 TB, which gives you around the number you already found. In most cases the raw math does not apply, because almost everyone uses compression, so depending on the compression ratio you would need far fewer tapes.
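The basic estimate above can be sketched as a few lines of arithmetic. This is a minimal illustration, assuming the 170 TB figure and the 1.5 TB native LTO5 capacity from the thread; the 2:1 compression ratio is only the usual LTO marketing figure, not a number from the thread.

```python
import math

# Assumed inputs: 170 TB of data, LTO5 native capacity of 1.5 TB.
data_tb = 170.0
lto5_native_tb = 1.5

# Strict 1:1 (no compression): round up, since a partial tape is still a tape.
tapes_native = math.ceil(data_tb / lto5_native_tb)
print(tapes_native)  # 114

# With compression the effective capacity grows by the compression ratio,
# so fewer tapes are needed; 2:1 is the typical advertised LTO figure.
compression_ratio = 2.0
tapes_compressed = math.ceil(data_tb / (lto5_native_tb * compression_ratio))
print(tapes_compressed)  # 57
```

Note that rounding up gives 114 rather than the 113 mentioned later in the thread; the fractional tape is the difference.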
ble1 · 4 Operator · 14.4K Posts · April 7th, 2011 13:00
Because compression differs for certain types of data, the best thing is to calculate at a 1:1 ratio (data vs. native tape capacity). You know for sure that you will squeeze some space out with compression (with dedupe solutions it gets more complicated, and sometimes even interesting if you use dedupe on both the primary storage side and your backup storage side).
So, let's say you have 170 TB to back up. Will those 113 tapes be enough? Maybe, but we should assume not. Why? Because I assume this is not a one-off backup with a retention of forever, nor a backup which expires after 24 hours. So you need to take into account the delta change of the data you back up and include that in your retention. If the storage you back up is using compression or dedupe, 1:1 might not be enough (as dedupe might surprise you).
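Folding the delta change and retention into the estimate might look like the sketch below. Everything beyond the 170 TB and 1.5 TB figures is an assumption for illustration: a 2% daily change rate, weekly fulls with six daily incrementals, and a four-week retention window are not numbers from this thread.

```python
import math

# Assumed inputs (only full_tb and lto5_native_tb come from the thread).
full_tb = 170.0
daily_delta = 0.02        # assumed fraction of data changing per day
retention_weeks = 4       # assumed retention window
lto5_native_tb = 1.5

# Each retained weekly cycle = one full + six daily incrementals.
incrementals_tb = full_tb * daily_delta * 6
cycle_tb = full_tb + incrementals_tb
total_tb = cycle_tb * retention_weeks

# Tapes needed at a strict 1:1 (native) ratio, rounded up.
tapes = math.ceil(total_tb / lto5_native_tb)
print(total_tb, tapes)  # 761.6 508
```

The point of the sketch is only that retention multiplies the raw 113-tape figure considerably, which is why a single full-backup count is rarely enough.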
dugans1 · 2 Intern · 186 Posts · April 7th, 2011 13:00
Thanks for the reply.
I find that companies are not that flexible in continually purchasing tapes. I try to overestimate in the first instance; that way I do not appear to be biting at their ankles by requesting media purchases continually.
Could you also post the exact formula that you guys at EMC use to calculate tape capacity? I would be interested in knowing just how far off I am.
Thanks heaps.
dugans1 · 2 Intern · 186 Posts · April 7th, 2011 14:00
Agreed,
The base number that I got was from a timed period.
In my original post I constructed a report from 1.3.2011 to 1.4.2011, so 1st March to 1st April. The data size in bytes was the figure I used originally.
LTO5 is 1.5 TB native, 3 TB compressed.
The outcome that I came to was 113 tapes per month x 12, minus 2 months for my bi-yearly backups. This keeps my tapes rotating into the 6-monthly pool and the end-of-year pool. These are offsited and sit for the traditional 7-year retention period. I use my barcode labelling to identify the older tapes that I make available for these pools when the time is near.
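The "x 12, minus 2 months" rotation above can be made concrete with a small sketch. The interpretation below (two of the twelve monthly cycles are promoted into the 6-monthly and end-of-year pools rather than rotating) is my reading of the post, not something stated explicitly.

```python
# Assumed reading of the scheme: 113 tapes per monthly cycle, 12 cycles
# a year, with 2 cycles promoted into long-term pools instead of rotating.
tapes_per_month = 113
monthly_cycles = 12
promoted_cycles = 2   # 6-monthly pool + end-of-year pool

rotating = tapes_per_month * (monthly_cycles - promoted_cycles)
offsite = tapes_per_month * promoted_cycles  # held for the 7-year retention
print(rotating, offsite, rotating + offsite)  # 1130 226 1356
```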
The equation that you demonstrated, 1024x1000x1000 etc., I would like to know more about, if you wouldn't mind posting it.
I estimate I will need several hundred tapes, as this is only 2 groups that I have decided to initially cut over.
I also run 4 Avamar grids, which is great for dedupe, and use ATO monthly to pump the data to tape. I would prefer NDMP, but it is not available to me.
Whilst Avamar has greatly reduced the amount of data requiring regular tape-out, the need to put some systems on tape will never change (a business requirement, not mine).
Thanks for all your help on this post.
ble1 · 4 Operator · 14.4K Posts · April 8th, 2011 02:00
1024x1000x1000 is more the model that Legato used in the past. For example, when you check totalsize with NetWorker (and unless you trim it), you get values in bytes. Back in the old days I noticed that the way they calculated towards KB, MB, GB and beyond was to use x1024 to reach KB, but from that point on they used x1000 for further increments.
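That mixed conversion (x1024 for the first step, x1000 thereafter) can be sketched alongside the two pure models for comparison. This is just an illustration of the arithmetic described above, not NetWorker's actual code.

```python
def mixed_model_bytes(tb):
    """TB -> bytes using the mixed 1024 x 1000 x 1000 x 1000 model."""
    return int(tb * 1024 * 1000 * 1000 * 1000)

def decimal_bytes(tb):
    """TB -> bytes using the pure 1000^4 disk-vendor model."""
    return int(tb * 1000 ** 4)

def binary_bytes(tb):
    """TB -> bytes using the pure 1024^4 (TiB) model."""
    return int(tb * 1024 ** 4)

print(decimal_bytes(1), mixed_model_bytes(1), binary_bytes(1))
# 1 "TB" is 1,000,000,000,000 vs 1,024,000,000,000 vs 1,099,511,627,776 bytes
```

The mixed model sits between the two pure models, which is why figures taken from different tools never quite line up.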
There is a whole mess with how this is calculated, and it started a long time ago with disk storage vendors. They obviously use the 1000x1000x1000xetc model, as:
XXXXXXXXXXXXXX B / 1000 / 1000 / 1000 = X1 GB
XXXXXXXXXXXXXX B / 1024 / 1024 / 1024 = X2 GB
In the above example, X1 > X2. So, when checking disk sizes on shop shelves, you will always buy a disk showing X1 and end up with the X2 value.
I believe in recent years the EU has raised concern over this practice and made a law which should stop it, but I didn't follow it, so I have no idea what the current state is. I know NetWorker in the past used a kind of mixed model. If I remember correctly, a few years ago I implemented IBM LTO3 on our HPUX box. Nevertheless, IBM drives enable compression even if you use the OS compression device, so you just have to install the IBM drivers (sigh). During that period of uncompressed backups, I could see the tape filling up at the advertised value for native capacity. What I failed to check then was how much data was on that tape based on the totalsize field from the mdb. But if we assume that the sum of the save sets on a tape would "match" the native tape capacity, then I guess tape vendors use the same math (1000x1000x1000x1000 for TB values on tape and 1024x1000x1000x1000 for TB in the mdb). I never really paid extra attention to this, and I know some people went further by checking fragsizes (which you should, if calculating a total tape that you know for sure contains a save set which spans over).
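The discrepancy hinted at above is easy to quantify: the same physical bytes look like a smaller number when re-expressed in the mixed mdb-style units. A minimal sketch, assuming the 1.5 TB advertised LTO5 capacity and the two models just described:

```python
# A "1.5 TB" tape, where the vendor TB is the pure 1000^4 model.
vendor_bytes = 1.5 * 1000 ** 4

# The same bytes expressed in the mixed 1024 x 1000 x 1000 x 1000 "TB".
mdb_tb = vendor_bytes / (1024 * 1000 ** 3)
print(mdb_tb)  # 1.46484375 - the full tape looks like ~1.465 "TB" in the mdb
```

So a tape that fills at its advertised native capacity would still report a few percent less in mdb-style units, purely from the unit arithmetic.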
So, when talking in the TB world and beyond, these differences might have an impact, and it all really depends on where you get the data from and how you calculate it. Not sure how Avamar does this in their world.