4 Operator

 • 

14.4K Posts

October 11th, 2011 15:00

This is critical NetWorker /nsr/index data and needs to remain on local disk unless SAN storage provides something better. You are right, a huge index size alone does not make a business case for re-engineering NetWorker.

In my personal case, SAN storage is the better way to go.  It gives me better performance, I can mirror it, and I can snapshot it too.  What might be more important in your case is that it gives you the scalability to extend.

Having 2 data zones provides the flexibility of spreading clients over the 2 zones. Each client will be backed up either by zone A or by zone B. The advantage is that when you want to recover, you do not have to know which zone backed it up.

If you have 2 servers, then you back up either to the first one, to the second one, or to both.  When you restore, you must know which server (datazone) has your backups - otherwise you won't find them.  Having two datazones brings other issues too - you can't share libraries/devices across different zones (at least not the same devices).  If you somehow plan to have your client(s) present in both zones, this leads to a resource issue which might soon turn out not to be cost effective (and way beyond the cost of additional disk storage).
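To make that restore point concrete: with two independent datazones you have to point the client-side tools at the right server yourself. A rough sketch - the server names nsr-zoneA/nsr-zoneB and the client fileserver01 are made up:

  # Find out which datazone actually holds the save sets:
  mminfo -s nsr-zoneA -q "client=fileserver01" -r "name,savetime,volume"
  mminfo -s nsr-zoneB -q "client=fileserver01" -r "name,savetime,volume"

  # ...and only then run the recovery against the server that has them:
  recover -s nsr-zoneA -c fileserver01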

"What about distributing database across server and storage nodes?"

I am looking at how I could do that.

That was a hypothetical question - meaning you can't do it.  Out of curiosity, how much space does your index area take, what is the biggest index, and what retention do you use?

13 Posts

October 11th, 2011 15:00

In my environment we have self-restore, where users restore their own files. If I have 2 backup servers, these users will have to know which server backed up their data, or I have to provide that info in the interface they use to perform the restore. With a dynamic environment, that is hard to maintain. I am aware that 2 datazones cannot share backup devices; they do not need to. If the 2 zones can share index and media information, they can let each other know whose restore request it is when one comes in.

13 Posts

October 11th, 2011 15:00

"How much space does your index area take, what is the biggest index and what retention do you use?"

600GB, biggest 60GB; retention: 8 weeks.

4 Operator

 • 

14.4K Posts

October 11th, 2011 16:00

I just did a small check here... I use 51GB, the biggest is 4GB (NAS) and retention is 9 days.  If I treat that as roughly 1 week and multiply my numbers by 8, I'm not that far away (though the difference in the biggest one is almost 50%)... I would probably get similar numbers if I used the same retention as you do.  In my case the majority of backups comes from databases (so I have no backup cycle dependencies, as we run daily fulls), thus I conclude the numbers you have are more or less what is expected (though it would be interesting to hear others and their numbers, as at the end of the day index size depends on two factors - retention and the number of files backed up by each client).
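As a back-of-the-envelope check of that extrapolation (assuming, as above, that index size scales roughly linearly with retention while the clients and change rate stay the same):

  # Treat the 9-day retention as ~1 week and scale to 8 weeks:
  echo "51 * 8" | bc   # ~408 GB total index area     (vs. the reported 600 GB)
  echo "4 * 8" | bc    # ~32 GB biggest client index  (vs. the reported 60 GB)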


In my environment we have self-restore, where users restore their own files. If I have 2 backup servers, these users will have to know which server backed up their data, or I have to provide that info in the interface they use to perform the restore. With a dynamic environment, that is hard to maintain. I am aware that 2 datazones cannot share backup devices; they do not need to. If the 2 zones can share index and media information, they can let each other know whose restore request it is when one comes in.

Yeah ok, but then the sole purpose of having an additional machine as a second backup server is to share the metadata database.  When you compare the price of such an investment with the price of extending your disk, it is obvious which one wins.  Not to mention the software costs (apart from the hardware ones).  Now imagine you were the backup software vendor: why would you do this if virtually no one would use it and it would just complicate the existing code and workflow?  So, I do not expect this to happen at all.  I think what is more likely to happen is some sort of new database model (eventually, that will have to happen anyway).   In the long run, the software will most likely be moved to a PBBA (actually, the backup server software might become a module within the PBBA, where you will be able to choose which module/software to run) and this will most likely be handled inside the black box with a completely different workflow (but that's another story, which I doubt we will see in the next 5 years).

The more I think about it, the more it comes down to this: just buy more disks and plug them into the machine (if size per se is the only problem).  But it would surely be interesting to hear others and their take on this too...

13 Posts

October 11th, 2011 16:00

Thank you Mr Crvelin!

18 Posts

October 17th, 2011 12:00

  AFTD true concurrency (backup, restore, clone and stage all at the same time).

  AFTD chunked staging. No single 1 TB stage jobs. Reclaim space as soon as data is on tape.

  Delayed AFTD staging. Stage now. Keep a copy on disk. Delete when space is needed.

  AFTD target deduplication.

  A threaded save that can handle dense file systems.

  Smart drive assignment. I have seen situations where a clone job that required two drives and got only one was blocking other jobs from running.

2 Intern

 • 

203 Posts

October 18th, 2011 15:00

+1 for many of the requested enhancements (especially the ones about being able to suspend backups/prevent backups from starting, or having NW completely to yourself for testing). I don't get the additional request for multiple datazones, however. Better scalability and more dynamically applied rules would be preferable instead of further complicating things communication-wise. Unless federated datazones are actually explained, I don't see the benefit. Today we only split environments into more datazones because the NW server was unable to handle the load it had to take. Yes, that's not only based on what the software could deliver, but also on the resources the hardware could handle.

- for better scalability, have NW load-balance clients dynamically over multiple storage nodes instead of manually specifying the storage node affinity. In fast-growing, large environments (1000+ clients) it would be nearly impossible to maintain that manually. I can imagine using a pool of storage nodes as the storage node to be used, instead of a list that is followed from top to bottom, or being able to define...

- visualize your NW environment by showing the correlation between server, client, storage node, tape library and dedup appliance locations, to give an easy overview of the locations of all involved items (for example by adding a new field "location") and the data streams between them.

  1. Then it would easily be visible that, for instance, a cluster is located in one location instead of being split over two locations
  2. or that a system backs up its data to the wrong datacenter (for instance the local datacenter instead of the remote one)
  3. better correlation, for clustered environments, between the physical systems and the virtual nodes running on top of them. Today NW does not show a direct relationship between virtual and physical nodes - you have to use comments to do that, even though the backup actually notices it. If NW were aware that a virtual node runs on a certain physical node whose location is defined, you would know where the virtual node is running

- as we follow the adage of always backing up a system to a remote datacenter, make NW backups location-aware. If, for instance, in a VMware vSphere environment a VM is moved from one datacenter to another, there should be dynamic rules which in the end result in the client data always being sent to the remote datacenter (in this case, back to the datacenter it came from). With features like vMotion it becomes more and more problematic, from a backup perspective, to know where a system is actually running. Replicating/duplicating the backup data to both (or even more) datacenters so that the location of a client wouldn't matter is of course fine from an operational point of view, but extremely costly. Location awareness would make NW much more dynamic.

- have NW act more like a real relational database. For instance, when trying to delete a tape drive that is still assigned to multiple pools, only one pool it belongs to is reported instead of all of them (hence we use scripts to show us all the pools a drive belongs to, so that we can first remove it from all pools before trying to delete the device - see the sketch after this list). When deleting clients, their references in pool definitions are not cleaned up or even mentioned as existing. NW is aware of some cross references which it is nevertheless not able to show (like which tape drive belongs to which pools, seen from the tape drive's perspective).

- being able to determine with mminfo how backup data streams have actually run: was a storage node used, the backup server, or maybe a dedicated storage node? (also see the sketch after this list)

- why can we still not use wildcards in directory specifications for directives? And why must mount points, including nested mount points, have their own directory specification?

- having the possibility to perform test activities on tape drives while also preventing backups/restores from using the same device. "Service mode" used to be exactly what the name says: being able to "service" the device. Now you can't load a tape into a drive in service mode anymore and then run a command to read the label, just to see that the tape drive works fine. For instance, after maintenance of a (virtual) tape library it would be nice to be able to test it without backups immediately using it again. Yes, one could change the available slot range to only one slot, but that doesn't work when using (virtual) ACSLS. A real "service mode" for the library would be nice: still being able to perform testing, but NW wouldn't use it.
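Regarding the pool cross-reference and mminfo points above, here is a rough sketch of the kind of workaround we mean. The device path, storage node and client names are made up, and the nsradmin/mminfo output details may differ per release:

  # Which pools still reference a given drive? Dump every pool resource and
  # search for the device path (e.g. rd=storagenode01:/dev/nst0):
  printf 'print type: NSR pool\n' > /tmp/poolq.adm
  nsradmin -i /tmp/poolq.adm | grep -iE '^ *name:|nst0'

  # How did a client's save streams actually run? mminfo reports the volume
  # (and thus indirectly the device), but as far as we can tell there is no
  # report field for the storage node that wrote the stream:
  mminfo -q "client=fileserver01,savetime>=yesterday" -r "client,name,level,volume,sumsize"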

29 Posts

November 24th, 2011 05:00

I vote for

- source and target AFTD deduplication (without Data Domain and Avamar)

- improved access to logs. All logs should be accessible from NMC; now I must log in to the NW server and the clients.

- improved custom backups - scripts before and after the backup. Savepnpc is not safe, because it is bound to the savegroup name, and when I move a client to another group, my backup becomes inconsistent.

- generally improve backup management - everything should be manageable from NMC. I do not want to log in to clients and configure the agents there.

Martin

2 Intern

 • 

203 Posts

November 24th, 2011 08:00

Improved access to logs? That's what we had on the command line in the past, with the nsr_render_log -R option to remotely read out client NW log files (or any files for that matter - also helpful to check whether hosts files were set correctly). Too bad it got deprecated in NW 7.4 or so. It must have been due to security concerns...

But being able to do that from NMC would still be very helpful.
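For reference, rendering the raw logs locally still works. A minimal sketch; the path is the usual UNIX default, and the old remote invocation is quoted from memory only, so treat its exact syntax as an assumption:

  # Render the client's raw daemon log into readable text, locally:
  nsr_render_log /nsr/logs/daemon.raw > /tmp/daemon.txt

  # The deprecated remote form looked roughly like this (no longer available
  # in recent releases):
  # nsr_render_log -R clienthost daemon.raw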

4 Operator

 • 

14.4K Posts

November 27th, 2011 14:00

Hi Martin,

- source and target AFTD deduplication (without Data Domain and Avamar)

- improved access to logs. All logs should be accessible from NMC; now I must log in to the NW server and the clients.

- improved custom backups - scripts before and after the backup. Savepnpc is not safe, because it is bound to the savegroup name, and when I move a client to another group, my backup becomes inconsistent.

- generally improve backup management - everything should be manageable from NMC. I do not want to log in to clients and configure the agents there.

1) I do not think EMC will ever introduce SW-based dedupe.  The reason is that any SW-based solution won't match the efficiency of a HW one, and a dedicated appliance works better in that case.  They will spread the Boost code to clients (some modules do it already now) and I expect the next wave will be the introduction of Boost code which can be used for restores (that is not as necessary as what Avamar does, as in the case of an appliance there are other ways of doing it, though less efficient than a source dedupe solution could offer).  EMC, being a HW company, most likely won't invest in any SW solution for something that already exists at the HW level.  I do realize this might be bad news for those who do not run any PBBA nowadays and would like to see the backup software doing it, but I doubt you will see that happen.  Given the shift in the market and the fashionable hype around dedupe, it is expected that you will end up with some sort of HW dedupe solution sooner or later.  Further, next year will see a boom in storage memory solutions (with EMC most likely coming out first with Lightning soon), which will additionally change the backup paradigm.  I realize not everyone might be affected by these changes, but soon they will be; thus from a market and roadmap point of view I do not see any business case for introducing dedupe into AFTD.

2) Indeed, the -R option seems to be killed... but even when it was there I didn't use it much.  This is not to say it is useless.  There is still a way to execute NSR commands remotely, and maybe this also works for log management, but I didn't test it.  We use centralized syslog for all clients, and from there we filter per application and system, thus I never had to bang my head against it.  I think no matter how you try, it is going to be hard to get rid of visiting clients to check some additional logs (like database logs or system logs).  It would be interesting to hear whether EMC plans to bring back the -R option.

3) I agree with this one.  Groups are used to group clients, and clients group backup sets.  The requirement for pre- and post-processing is usually tied to the backup set itself, so it would be logical to have that option within the individual NSR client resource rather than in the NSR group.
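To illustrate why savepnpc is tied to the group today: the pre/post commands live in a file on the client named after the group, not after the client or backup set. A rough sketch from memory - the group name, script paths and exact field layout are assumptions and may differ per release:

  # The client resource has its "Backup command" attribute set to savepnpc.
  # savepnpc then reads /nsr/res/<group name>.res on the client, e.g. for a
  # group called "AppServers":
  #
  #   type: savepnpc;
  #   precmd: "/opt/scripts/stop_app.sh";
  #   pstcmd: "/opt/scripts/start_app.sh";
  #   timeout: "12:00pm";
  #
  # Move the client into another group and this file is no longer picked up,
  # so the pre/post scripts silently stop running.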

4) The only thing to configure on clients would be the nsrla database (things like nsrauth, nsrports and similar). To my knowledge, all of these should be configurable from the server or a 3rd-party box (but not NMC).  I believe it would be a small effort for engineering to add an option like "Load client side configuration database" and have it configurable in a pop-up window from the server or NMC.  With that said, I can see some mess with the local admin list, so I assume before something like this happens they will most likely need to improve the nsrpush mechanism (though it might be OK now, as I haven't used it since the 7.4.x days).

29 Posts

November 29th, 2011 02:00

1) Yes, I agree, EMC is hardware-based and therefore will not make any SW dedupe. But not everyone needs a high-performance deduplication solution, and in this market segment EMC has nothing to offer :-(. I want to dedupe and save space, I have the hardware, I do not have much money, I need an entry-level solution. And that is probably why we will migrate to Symantec BE or CommVault.

334 Posts

November 29th, 2011 05:00

Did you see the press release on the DD160? It is entry level.  I'm sure once you buy a server with a number of CPUs and memory, then add storage, then add their software cost - what price will you be at?  Then you throw the DD performance on top of it, the integrity checks, and all the other benefits you get - I'm not so sure you are getting a comparable solution.

http://www.emc.com/about/news/press/2011/20111013-01.htm

6 Posts

November 30th, 2011 05:00

Hi,

I would also like to give some feedback and requests for the future:

Storage nodes should be more independent – for example, when a network interruption occurs and the network comes back after five minutes, the storage node should sync the job progress and not cancel the jobs. We would like to use remote nodes that save data locally and are controlled over a wide-area network (read: unstable).  Checkpoint restart won't work – the checkpoints are corrupted too often and the backup restarts from the beginning.

Overcoming bad network situations in general must get better – the solution should keep in mind restoring and recovering the process/job automatically, without manual intervention.

Something like that https://usdt.livevault.com/Help-expert/CPT/connections.html

A key management system to use the native LTO drive encryption together with compression.

Software based compression with encryption.

VMware VADP restore should be possible to the same vCenter and the same datastore under a different name – like restoring files: if the file exists, then restore it with a different name.

All the previous posts have good points and I will not repeat them.

I especially like: stop all running jobs – start all recently stopped jobs – disable all job schedules.  Because if you upgrade and your server is busy 24/7, then you must do it that way.

Wbr

Margus

4 Operator

 • 

14.4K Posts

December 1st, 2011 11:00

redm wrote:

Yes, I agree, the DD160 is entry level - for a datacenter. But for customers with additional small offices, the entry level is cheap NAS storage and software deduplication. There EMC has nothing to offer :-(

Actually the price of the DD160 is small - and not just from a DC point of view.  If you have a bunch of remote offices, it all comes down to the setup and to answering the following questions: is NAS allowed there or do they run on local disks, what connection do you have towards central storage, what is the number of small offices, is the whole infrastructure virtual, what is the data volume, and probably a few more.  Based on this you can optimize and build a rather good backup setup with EMC tooling.  Of course, you may also find something else which suits you better.  I prefer to have only VMs at small remote sites, with snapshots at NAS level which are replicated to the DC, for example.

29 Posts

December 1st, 2011 11:00

Yes, I agree, the DD160 is entry level - for a datacenter. But for customers with additional small offices, the entry level is cheap NAS storage and software deduplication. There EMC has nothing to offer :-(
