I will try to describe the issues I have been facing for 2 weeks in a row. Basically, I support 2 backup servers that were running 8.2, and I had to migrate them to 9.x. As I am not the "latest and greatest" type, I decided to go with the n-1 version, which was 184.108.40.206.
1. I've successfully migrated the first server (pre-production) with all licensing. I then upgraded both of our storage nodes, again successfully.
However, by design, we have a second server (a virtual machine) with NMC installed on it. This is where the issues started. I cannot install NMC on that server. It gives me the error "Unable to verify authentication Server's hostname and/or port.", which is described as a known problem with a whole KB article about it. I've read all of the relevant documentation and nothing helped. An S2 case has even been opened with EMC, but they change engineers every day and each one starts from scratch with the same investigation, which drives me crazy. We have been digging in one place for a whole week. Things that I tried and that did not work (*NMC cannot be installed only on THAT particular server*):
1. Clearing peers.
2. Uninstalling/reinstalling NW
3. Inplace of the server
4. Turning off the firewall.
5. DNS/reverse DNS check.
6. java.exe added to firewall
7. Lots of reboots.
8. Using the server's system account to run the msi file
9. Reinstalling Java (currently 8u161)
10. Reading all of the logs in GST: nothing points to any errors or blocked connections from that client.
I am open to any suggestions. Telnet to port 9090 works. Port 8005 is not in use.
2. After the migration, everything was migrated with legacy settings, which I find awesome. However, things are not working properly. We had groups for DLY (incremental) and WKL (full). After the migration, the DLY groups run and their actions show the log and details for the backup, but the WKL groups do not. Their duration over the weekend is shown as 00:00:00. At first I thought "What a great Monday to have", but I checked the media and the backups actually ran. The error when I try to get details for the action is "No details are available yet for this Action". If anyone has faced this and knows how to fix it, that would be great.
3. As everything is on legacy settings, it is basically grayed out. When I try to create a new policy with a new workflow and action, I cannot associate it with any pool, group or anything else from the previous settings, which points to a whole redesign of the environment. Please tell me that I am wrong and there is a way, PLEASE!
4. Since the migration, the NW server has been rebooted twice: first after the installation of 220.127.116.11, and the second time a few hours ago. After each reboot, the NW and GST services start without issues, but once I log into NMC, every tab is empty except the client tab, where all clients are shown with a red exclamation mark (indicating they are not included in any group, workflow or action). On the client tab the groups are displayed, but the group tab is empty. I am open to ideas. In NW8 there was a way to *cheat* and switch from View. In my case nothing helped; I had to restart all of the services and wait for everything to show up.
5. Repository: I was trying to use the repository as in 8.2. I placed all the necessary installation files in a folder and set it as the repository, but when I locate the media kit and press "Open media kit" in the software repository, nothing appears on my screen. Is this a bug or am I doing something wrong?
I am looking for help with any of my queries. I am really on the verge, as I am changing engineers like napkins and nothing indicates that I will fix any of my issues. Any help is greatly appreciated, as I have been lacking sleep for the past week.
The account and the customer will not let me upgrade the second (production) server until I manage to solve these problems.
On 8.2 everything was working flawlessly, and it continues to work that way on the production server.
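For anyone who wants to repeat the DNS/reverse DNS check and the telnet test on port 9090 from point 1 in a scripted way, here is a minimal Python sketch. It only uses the standard socket module. The host name in the usage comment is a hypothetical placeholder, and 9090 is simply the port mentioned above, not a value taken from any NetWorker documentation.

```python
import socket


def check_dns(hostname):
    """Return (ip, reverse_name); either value is None if the lookup fails."""
    try:
        ip = socket.gethostbyname(hostname)
    except OSError:
        return None, None
    try:
        reverse_name = socket.gethostbyaddr(ip)[0]
    except OSError:
        reverse_name = None
    return ip, reverse_name


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def report(hostname, ports=(9090,)):
    """Print forward/reverse DNS results and the state of each port."""
    ip, reverse_name = check_dns(hostname)
    print("forward:", ip, "reverse:", reverse_name)
    for port in ports:
        print("port", port, "open" if port_open(hostname, port) else "closed")


# Usage (hypothetical host name):
# report("nw-server.example.com")
```

A forward name that does not match the reverse name, or a closed port, would at least narrow down where to look. An open port alone does not prove that the authentication service behind it is healthy.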
It has been a while since we upgraded so I might not remember all the issues we had. Let's see how I can help.
But please, also specify the OS name/version.
I do not have a direct answer to that. Please refer to the KB article # so that we can dig into it.
As the groups/workflows are started daily, they might overlap. You should avoid that as this seems to confuse nsrjobd.
Most likely the reason for that problem is that the combination of client and workflow criteria is 'not in sync'.
Especially look at the action's parameter 'Client Override Behavior', which is listed on the second screen of the wizard.
There is a delay before the GUI shows the complete information. In our environment it takes about 15 minutes.
I expect that this is due to the size of the jobs db (you see the amount of used memory for nsrjobd increasing over
You can use nsrwatch to verify that the system is up and running.
If you want to proceed with the administration earlier, use nsradmin. Otherwise just be patient.
The repository works fine. What you must do to make NW recognize the software is:
- unpacking/uncompressing it
- point to the directory where the 'metafile' resides
You are now even able to 'upload' the software cross platform locally (Linux on Windows and vice-versa).
So far my first comments ....
Well, Aleksandar, 18.104.22.168 was full of bugs and EMC has said that 22.214.171.124 is the go-to version now. 126.96.36.199 is GA and can be downloaded from the support website.
About the authentication server issue: what is the OS on the system where you are trying to configure NMC? Is it a dedicated NMC server, or does it have any other NetWorker role installed on it?
Regarding point 2: when this happens, just restart the NMC services and see whether the correct status is shown. Otherwise you could also try renaming the jobsdb folder (with the NetWorker services shut down, of course).
Regarding point 3: one protection group can be mapped to only one workflow, so keep that in mind when creating or assigning new workflows to an existing group. Pools, however, have no such restriction and can be assigned to an action.
Thank you for your prompt answers. They were somewhat helpful. It is good to hear other opinions and points of view!
1. After enabling TLS 1.0, 1.1 and 1.2 and getting NW188.8.131.52, the NMC installation completed successfully. However, the customer asked for a DR of the server, bringing it back to the state where 8.2 was still installed, in order to provide an accurate RCA showing exactly which action fixed the problem.
This is the KB I was talking about. The fix is also described in the 9.1 Installation Guide, yet it didn't help me. https://emcservice.force.com/CustomersPartners/kA2j0000000R5oDCAS
2. Still, restarting the services is not an option. Personally, I do not like the idea of restarting the NW server services or the machine itself. Also, all actions are set properly. I will monitor the weekly backups and will probably update the thread on Monday.
3. I am aware of how they work. However, when I set up a new group and a new policy and try to add a destination pool in the newly created action, only the default pools are available. I cannot change the pool in old actions, and I cannot change anything from the old setup: pools, groups, workflows, etc. I will dig more into this. I am already planning a step-by-step migration of pools and groups, as I have to retain the data in the old pools, and if I create new pools and groups I might run into a capacity problem.
4. The system is running fine, NMC is just empty. I was patient enough, waited for like 30 minutes and nothing happened. As I am going to patch to 184.108.40.206 today, the NW server will eventually be restarted and I will check what happens. Renaming jobsdb is an option though, thanks. Might be a good idea! 😃
5. I've unpacked all of the files. Will see after patching to 220.127.116.11. I hope it works, as otherwise it will be difficult to install the clients, since I do not have access to most of the servers.
Again, thanks for your thoughts so far, I appreciate every single bit of effort! I will patch to 18.104.22.168, as the current version does not feel stable to me. EMC suggested installing that version, and they did not mention that it is full of bugs or that they recommend 22.214.171.124.
In general, I am pretty satisfied with NW 126.96.36.199, even on 'old' Windows 2008R2.
Yes there were a bunch of odd effects we discovered like ...
- orphaned log files/dirs (exclusively for the Windows version)
- different save workflow/save jobs started for the same client will not run in parallel
- a few workflows which run properly but will finally remain 'busy'
but in general it is running pretty stable. At least in our environment.
Of course I do not like the idea of restarting NW or the server more than necessary but sometimes it is the only solution.
For instance: updating/upgrading ... as this is always nothing else but
- shutting down all services/daemons
- uninstall the existing software
- installing the new version in the same location
- restarting the application
And this, in general, applies to any other application as well.
After years of NW experience I can assure you that restarting NW is not problematic if the databases are fine.
Internal mechanisms are helping here. So do not be afraid.
With respect to the jobs db - I personally do not think that the database itself creates problems. Do not forget that it is the same one that NW uses for the media index as well. I think that the process itself is more likely the potential troublemaker.
With respect to the modification of the old resources ...
Actually I cannot remember any more how we did the upgrade, but I tested it thoroughly before we upgraded the live system, and I do not remember any showstopper. In particular, I do not understand why you should not be able to modify old resources.
Before you start migrating/re-configuring the old resources, you had better check whether this is a GUI or a NW core problem.
- use nsradmin (visual mode) and try to modify an 'old' resource
NW might tell you that it is read-only. However, I do not know why this should be the case.
- Try to change a pool from the visual mapping of an action (use the right-click menu)
Especially I am surprised about the fact that you can only select the default pools for a new action. It somehow looks as if NW would not 'forward' new resources properly to the GUI. Would this be possible with nsradmin, though?
I do not see a capacity problem in the fact that you just create new pools.
I have run the client push upgrade with NW 188.8.131.52 without a general problem.
But do not forget that there are situations where you are not allowed to install any software remotely. This is not a NW issue.
Because we only have a few Linux servers I always update those locally. Also due to missing rights, as mentioned above.
The only issue I really remember was that NW tried to query the running daemons on the client too early. The result was that it reported that the 'NW daemons are not running' when in fact the upgrade had been successful. You had better verify that with
nsradmin -p nsrexec -s <client_name>
and look for the NW version in the output.
Personally, I would upgrade to NW 9.2.1 but we currently can't because our DD has to be upgraded first.
To update the thread a bit and to share some more things I am finding.
1. NMC installation issue was related to security protocols on OS level, now everything is working fine - resolved.
2. After the latest patching, the server was rebooted and the tabs were empty again. I patiently waited for about an hour and then had to stop and start the services. - not resolved, a bit annoying though.
3. After installing 184.108.40.206, the pools and everything else that was grayed out are now available. Working perfectly! - resolved.
4. Weekly/Monthly workflows are displaying everything as intended now. - resolved.
5. The repository is configured and working fine on the same platform (Windows), but cross-platform is giving me errors. According to support I need an NFS share between the server and the cross-platform client we are using as a repository. - I gave up, though. My request to get an NFS share was declined instantly.
Something new I found: the notification priority for a failed workflow is "notice", which makes email reports for failed backups impossible. The only way to get an email notification is via Policy > Workflow > Notification > On failure, which I am already using for another, more important purpose. In 8.2 these notifications were marked as "Alert", which is reasonable.
Another thing is nsradmin. I had working scripts that reported and uploaded certain information for each group, such as name, autostart, start time, browse policy and schedule. Below are the commands the script was using. In NW9.x you cannot get this information for workflows. These attributes are no longer available, and the information you get for workflows from "NSR Protection policy" is useless and its format is questionable.
show name; autostart; start time; browse policy; schedule
print type: NSR group
@Bingo, I have to wait for the retention of all tapes in the current pools to expire, and before that happens I might hit the capacity of the DD. Once the tapes from the old pools have expired, everything will be fine, but until then I will definitely struggle with capacity.
In the particular environment I am supporting we have around 400 Linux servers, around 100 Unix servers and 50 Windows servers. We are running NW9 on Windows 2012R2, which makes upgrading the Linux servers and NMDA quite a struggle. I am looking for a way to set up nsrpush, as I don't have access to any of the Linux/Unix servers (I cannot really do my job with such restrictions), and it is the quickest and most efficient way to upgrade the clients and (especially) NMDA without uninstalling, installing and relinking the NMDA library to Oracle. For now I am out of ideas on how to set up the cross-platform upgrade.
It is surprising that you still cannot load a NW/Linux software into your Windows repository.
IMHO this is now nothing but a very easy step. Here is the procedure:
- Use 7zip (or similar) to unzip the packages to a Windows directory
- Once you add the software to your repository select the directory where the METAFILE resides
And off you go.
I do not understand your statement:
"@Bingo, I have to wait the retention of all tapes in current pools to expire and before that happens I might hit the capacity of the DD. Once tapes from old pools expired, everything will be fine, but until that I will definitely struggle with capacity "
Of course I do not know your exact settings but save set expiration is totally independent:
- We backup to a DD with a retention policy of 1 month.
- We use scripted cloning to clone the fulls of the 1st week/every month to tape.
Within the clone statement we set the retention period to 18 months.
The result: the retention of the save sets on their media works totally independently.
You may want to implement that in your environment as well - it gives you so much more flexibility.
Until that becomes effective, you may still erase the save sets from disk.
BUT DO NOT FORGET THE SSID/CLONEID COMBINATION - otherwise NW will delete the save set wherever NW knows an instance.
So instead of
"nsrmm -y -d -S ssid"
you must use
"nsrmm -y -d -S ssid/cloneid"
where cloneid points to your disk volume.
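If you script this, it may help to make the cloneid mandatory so that the bare-ssid form can never be issued by accident. Below is a minimal Python sketch that only builds the argument list for the nsrmm call quoted above. The function name and the example IDs are hypothetical, and the sketch does not run anything by itself.

```python
def nsrmm_delete_args(ssid, cloneid):
    """Build the argument list for deleting ONE instance of a save set.

    Refuses to build a command without a cloneid, because the bare
    'nsrmm -y -d -S ssid' form removes every instance NW knows about.
    """
    ssid = str(ssid).strip()
    cloneid = str(cloneid).strip()
    if not ssid or not cloneid:
        raise ValueError("both ssid and cloneid are required")
    return ["nsrmm", "-y", "-d", "-S", "{}/{}".format(ssid, cloneid)]


# Usage (hypothetical IDs); pass the list to subprocess.run() on the NW server:
# args = nsrmm_delete_args("4021234567", "1500000001")
```

The cloneid has to be the one that belongs to the instance on the disk volume. How you look it up (for example from an mminfo report) is outside this sketch.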
Don't mind me, I completely messed it up. If I go into details, I will make it even worse.
We have a script that we use on a daily basis for clearing save sets that have already expired.
Yes, you are correct about extracting the rpms and folders. I had a session with EMC 2 weeks ago and they suggested setting up NFS and so on. As of this morning, I did what you suggested above and it is working. There was one thing we were missing: the nsr/res/servers file had not been created, and that is what caused the mess. I didn't know about it until today, when I requested root access to one of the servers. Once I added all of our NetWorker servers to the servers file (we often perform Oracle refreshes that are cross-DC), nsrpush completed successfully.
Thanks for the advice so far.
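In case someone else has to maintain the servers file on many clients, here is a rough Python sketch of the step described above. It appends any missing NetWorker server names to a servers file and leaves existing entries alone. The one-host-name-per-line layout is an assumption on my side, the host names are hypothetical, and the path has to be adjusted to wherever nsr/res/servers lives on the client.

```python
from pathlib import Path


def add_servers(servers_file, hostnames):
    """Append host names that are not yet listed; return the names added.

    Assumes one host name per line (an assumption, not taken from the docs).
    """
    path = Path(servers_file)
    lines = path.read_text().splitlines() if path.exists() else []
    known = {line.strip().lower() for line in lines if line.strip()}
    added = []
    for name in hostnames:
        name = name.strip()
        if name and name.lower() not in known:
            lines.append(name)
            known.add(name.lower())
            added.append(name)
    if added:
        path.write_text("\n".join(lines) + "\n")
    return added


# Usage (hypothetical names and path):
# add_servers("/nsr/res/servers", ["nw-prod.example.com", "nw-preprod.example.com"])
```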
Does anyone know what version we need to be on to turn off TLS 1.0 on our Networker Server? The new PCI requirements (coming in a few months) require TLS 1.0 to be disabled on all our internal systems. We are currently running 220.127.116.11 and I have not found any information on the support site on how to disable it.
You have to disable it on the platform side. Microsoft has articles that describe how to do it.
Keep in mind that if you have to reinstall NMC, you have to enable it again, then install NMC, and you can disable it once you have set up the connectivity and accounts.
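To illustrate what "platform side" usually means on Windows: the Microsoft articles describe the SCHANNEL protocol keys in the registry. The Python sketch below only generates the text of a .reg file for one protocol and does not touch the registry. It assumes the NetWorker server runs on Windows, which the question does not state, and the function name is hypothetical. Check the values against the Microsoft documentation for your OS version before importing anything.

```python
SCHANNEL_PROTOCOLS = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control"
    r"\SecurityProviders\SCHANNEL\Protocols"
)


def schannel_reg_text(protocol="TLS 1.0", enabled=False):
    """Return .reg file text that enables or disables one SCHANNEL protocol."""
    enabled_value = 1 if enabled else 0
    disabled_by_default = 0 if enabled else 1
    lines = ["Windows Registry Editor Version 5.00", ""]
    for role in ("Server", "Client"):
        lines.append("[{}\\{}\\{}]".format(SCHANNEL_PROTOCOLS, protocol, role))
        lines.append('"Enabled"=dword:{:08x}'.format(enabled_value))
        lines.append('"DisabledByDefault"=dword:{:08x}'.format(disabled_by_default))
        lines.append("")
    return "\r\n".join(lines)


# Usage: write the text to a file, review it, then import it and reboot.
# with open("disable_tls10.reg", "w", newline="") as fh:
#     fh.write(schannel_reg_text("TLS 1.0", enabled=False))
```

Calling it with enabled=True gives the counterpart for the temporary re-enable before an NMC reinstall mentioned above.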