6 Posts


January 11th, 2022 07:00

View/report data domain usage for individual save sets/servers

Is it possible to get an idea of how much space a particular server is using on the Data Domain from NetWorker? With the high levels of deduplication I'm not sure if this is something that would be possible, but we are currently running low on space and it would be good to have an understanding of where we may be able to save a bit.

We have some very large file servers, but as a lot of them have very little change for their size, I would expect they dedupe very well and each day's backup won't amount to a huge amount of extra space used.

For things like backups of SQL backup files, or the backup of our voice recorder VM, it would be useful to see how much space they are using on the DD each day.

2.4K Posts

January 11th, 2022 22:00

With

mminfo -q "client=server_name,savetime>=start_date,savetime<=end_date" -S > file

you could get such a report for the specified time range. Unfortunately, the -S report does not deliver the information in an easy-to-read way. To generate such a report, I have written a PowerShell script with an appropriate document which you may download from here: https://www.avus-cr.de/gener800_eng.pdf 
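For a quick per-client total you can also skip -S and ask mminfo for a plain tabular report, then sum it yourself. A minimal Python sketch of the summing step, assuming comma-separated output (e.g. via mminfo's -xc, option) with a totalsize column in bytes; column names and units may differ on your NetWorker version:

```python
import csv
import io
from collections import defaultdict

def sum_by_client(report_text):
    """Sum the totalsize column of a CSV-style mminfo report per client."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(report_text)):
        totals[row["client"]] += int(row["totalsize"])
    return dict(totals)

# Example input, as might be produced by something like:
#   mminfo -xc, -r "client,totalsize" -q "savetime>=start_date,savetime<=end_date"
sample = """client,totalsize
fileserver1,1073741824
fileserver1,536870912
sqlserver1,2147483648
"""
print(sum_by_client(sample))
# {'fileserver1': 1610612736, 'sqlserver1': 2147483648}
```

Note this sums the retained (pre-dedupe) save set sizes, not actual DD consumption.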

 

January 16th, 2022 16:00

Even though that is very nice and also explains the various dedupe-statistics fields of -S, I believe the OP is looking for how much data a client actually occupies on a DD, so they can see it and possibly act on it, for example with clients that don't dedupe well or that actually occupy a lot of disk space on the DD. Dedupe is a complicated beast altogether.

The -S option gives some insight, but will not tell you how much data the whole client occupies, only how much data that specific backup occupies at that time.

A theoretical client that has no changed files whatsoever would, after the initial backup, show for all follow-up backups that no data is actually transferred. Once the initial backup has expired, all still-existing backups would show 0 for the actual space they consume. However, the DD would still have plenty of pointers left pointing to the data blocks necessary to recover the system. Theoretically even the initial backup could send 0 bytes, if it is a copy of a system already in backup, as then all blocks would already exist.

For example, if one wanted to bill a customer based solely on the -S statistics, that client would be free. But on the DD end, data is still referenced by at least this client.

In the past there was a DPA method devised that required a VM to run a specific DD extraction of backup data/file information for each client. When we wanted to be able to find out how much data each client occupies on a DD, that was already being changed by fully incorporating it into DPA, no longer needing a dedicated Windows system, if memory serves me right.

However, that new DPA approach was also not fully working, as it didn't seem to report all clients.

I believe it is supposed to be fixed in DPA 19.5.

Anyway, I believe there were more improvements to come on that end. I have been asking for this for years already.

DD PCM (physical capacity measurement) is way too limited, as it reports on a whole mtree, not on individual clients. Also, by the nature of dedupe, it reports disk space as if that mtree were the only mtree on the DD. If you added up the disk space calculations for all mtrees on one DD, it would show more than is actually occupied, as many referenced data blocks are also expected to be in use in other mtrees.

Of course I expect something similar with the newer DPA approach, as I expect it will treat each client as if it were the only one, only estimating how much this one client would require if it were alone, not taking into account that possibly thousands of Windows clients also reference that one DLL... all of them need it, so for each of them you'd need the space.
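The over-counting described above is easy to see with a toy model: when two mtrees (or clients) reference overlapping sets of deduplicated blocks, each per-mtree figure counts the shared blocks, so the sum of the per-mtree figures exceeds the physical space actually used. A sketch with made-up block IDs (purely illustrative, not real DD internals):

```python
BLOCK_SIZE = 4096  # hypothetical fixed block size

# Two mtrees referencing partly overlapping deduplicated blocks
mtree_blocks = {
    "mtree_a": {"b1", "b2", "b3"},
    "mtree_b": {"b2", "b3", "b4"},  # b2 and b3 are shared with mtree_a
}

# Per-mtree view: each mtree measured as if it were alone on the box
per_mtree = {name: len(blocks) * BLOCK_SIZE
             for name, blocks in mtree_blocks.items()}

# Physical view: every unique block is stored exactly once
physical = len(set().union(*mtree_blocks.values())) * BLOCK_SIZE

print(sum(per_mtree.values()))  # 24576 -- the naive per-mtree sum
print(physical)                 # 16384 -- what the DD actually stores
```

Any scheme that attributes shared blocks fully to every referencing mtree or client will over-count in exactly this way.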

That is what makes the whole dedupe picture more difficult to grasp, but if the DPA approach proves to be working, it would give a very honest view of the costs needed to store data, distributing them more evenly over all clients.

For example, clients that use compression, or worse, encryption, would show less retained data in backup compared to the exact same client that doesn't compress/encrypt.

With billing based on disk consumption, the compressed data would end up consuming more space on a DD (it dedupes poorly), hence the bill would be higher, whereas with billing based on retained data it might be the other way around.

Also, a big consequence of the new way of reporting/billing might be that reducing backup retention reduces the bill far less than expected. Normally, going from 4 weeks to 2 might cut the bill by a factor of 2, but the amount of disk space used on the DD might not drop that much. Who knows, retention might even be set longer if it turns out it doesn't occupy that much more space.

If you have one customer on a DD, the total bill might not change, but the distribution of costs for each client might shift somewhat.

I'll have to look in DPA again for what the report is called. I also can't recall whether it is completely clear what is required on the DPA end to collect the data from the DD and correlate it to NW, or whether it still also requires DD PCM to be set up/enabled/scheduled or not.

It is way overdue, really, to have such reporting out of the box. Ideally NW itself would be able to report on it, similar to the DPA report "frontend protected capacity" (the largest full backup of a client in the last 60 days), used nowadays for DPS suite licensing, which later also got the NW command equivalent nsrcapinfo, to be provided to Dell.

2.4K Posts

January 17th, 2022 16:00

This is what pure NW (mminfo -S) can deliver: information about each save set. Once you have all this information, what is stopping you from summarizing the column values to get the totals? With PowerShell this is very easy to do... well, as long as the column names are correct. Otherwise you must fix them first.

I remember that NW reports some column names with dashes (AFAIR 'savetime' will be named 'date-time'), while other columns have double names with a space in between. These are the major stumbling blocks you have to deal with.

Years ago I wrote a script which created a monthly general mminfo save set report. I then used this to generate various statistical overviews. With other methods, like sorting, you could easily find the largest save sets and answer other questions.
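The header clean-up and the "largest save sets" sorting described above can be sketched in a few lines (PowerShell works just as well; the normalization rule and the column names here are assumptions about your report layout, not the actual script):

```python
import csv
import io

def normalize(header):
    """Turn mminfo-style column names like 'date-time' or 'save set name'
    into simple identifiers: dashes and spaces become underscores."""
    return header.strip().replace("-", "_").replace(" ", "_")

def largest_savesets(report_text, n=3):
    """Return the n largest save sets from a CSV-style report."""
    reader = csv.reader(io.StringIO(report_text))
    headers = [normalize(h) for h in next(reader)]
    rows = [dict(zip(headers, r)) for r in reader]
    return sorted(rows, key=lambda r: int(r["size"]), reverse=True)[:n]

# Hypothetical report with the awkward column names fixed on the fly
sample = """client,save set name,date-time,size
fs1,/data,01/11/2022,5000000
fs1,/home,01/11/2022,120000
sql1,MSSQL:,01/11/2022,9000000
"""
top = largest_savesets(sample, n=2)
print([r["save_set_name"] for r in top])  # ['MSSQL:', '/data']
```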

Honestly, rather than using other tools where one does not know how they work, I prefer relying on the basic capabilities and only trust the statistics which I assembled (faked) myself. And there is no need to install additional tools which must be maintained, which are good enough to cause other potential problems, and which will probably not help you answer a specific question because their powerful algorithms just do not offer that capability.

Back to the basics! - In (data) war only the simple methods will survive and succeed.

 

January 18th, 2022 00:00

The thing is that, if I understood the initial question from the OP correctly, one cannot even get this information from NW, as it does not take information from the DD into account (yet; as I understand it, improvements might come on that end, though I haven't looked through all the new features of NW 19.6 or possible future releases).

So if NW cannot (yet) report on the actual storage a client occupies and DPA can (or is supposed to), and DPA is made available for free by Dell when you have a DPS license (which we have) as the go-between application to collect various statistics, then we will use the tool Dell provides to do so. Ideally the NW server itself should also be able to show the consumption, which, just like being able to use nsrcapinfo, makes things easier. It also allows comparing output collected by DPA with that of nsrcapinfo; there have been issues in the past where nsrcapinfo run on earlier NW 19 systems could show large differences, caused by a bug, compared to running the same on an NW 18 system even when reporting about the same remote NW system.

More and more we'll use DPA for various reports (even though I barely use it myself; I'm looking more into DPC for the operations team). Accounting not yet, as that is still based on the mminfo amount of retained data, but in the future, if DPA is able to collect the required information from NW and the DDs involved, it would all go through DPA, combining the amount of disk space a client occupies with the front-end protected amount of data to determine the costs for backup.

6 Posts

January 18th, 2022 09:00

Thanks for both responses.

@bingo.1 Thanks for the script, I've had a quick look but haven't had a chance to figure it out and get it working yet. I'll let you know how I get on.

@barry_beckers I get what you're saying, that you can't get a complete picture of DD use from NetWorker's information alone, but I think it would probably provide adequate insight.

I really just need it to get an understanding of where our space is going, to see if there is anywhere we could save some, as we were very short. After a ticket with Dell I did manage to get a lot of space back, though, as I'd found a number of expired, orphaned save sets that weren't being deleted. It would still be good to get a picture of the data, as usage is still pretty high and we're not spending any more on disks, with servers due to be migrated to Azure throughout the year.
