MShadow

42 Posts

12879

October 5th, 2016 07:00

Error Code 10055 fault InvalidLogin does not have reason

I'm at my wits' end here. This has been going on for months. I have worked with EMC support and VMware support, and the two have also worked together. We generally have 25 to 75 backup jobs fail every night. On the surface, it's a simple authentication issue. Here are the details.

1. I'm using Avamar 7.2.0-401 performing image backups on VMware 6.0

2. There are no vCenter servers being backed up at the time of the failures.

3. This only happens during peak loads

4. I can re-run each job successfully

5. It will fail at any point during the backup process when it is trying to attach to the first or next disk.

6. I have used an AD account and local account in VMware to connect without any difference.

7. VMware says they don't see the authentication errors that Avamar is reporting

8. This happens throughout the backup window

9. This happens across VMware clusters

10. This is not limited to a single proxy.

11. Proxies only see hosts in the same cluster. There is no cross traffic.

12. This has happened on 2 different virtual centers.

Please tell me someone else has seen this and has an answer.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: VixDiskLibVimGetFaultReason: fault InvalidLogin does not have reason.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Error 3014 (listener error VmodlVimFaultInvalidLogin).

2016-10-05 01:02:06 avvcbimage Warning <16041>: VDDK:VixDiskLibVim: Login failure. Callback error 3014 at 2434.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Failed to find the VM. Error 3014 at 2506.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Logout from server.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Clean up callback data.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Clean up after logging out from server.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Logout callback is done.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Logout from server is done.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Login callback is done.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Clean up callback data.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: VixDiskLibVim_FreeNfcTicket: Free NFC ticket.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLibVim: Get NFC ticket completed.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLib: Error occurred when obtaining NFC ticket for: [VNX5300_R5_06] ARPJVXPSC02/ARPJVXPSC02.vmdk. Error 3014 (Insufficient permissions in the host operating system) (fault InvalidLogin, type VmodlVimFaultInvalidLogin, reason: (none given), translated to 3014) at 4515.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_OpenEx: Cannot open disk [VNX5300_R5_06] ARPJVXPSC02/ARPJVXPSC02.vmdk. Error 3014 (Insufficient permissions in the host operating system) (fault InvalidLogin, type VmodlVimFaultInvalidLogin, reason: (none given), translated to 3014) at 4669.

2016-10-05 01:02:06 avvcbimage Info <16041>: VDDK:VixDiskLib: VixDiskLib_Open: Cannot open disk [VNX5300_R5_06] ARPJVXPSC02/ARPJVXPSC02.vmdk. Error 3014 (Insufficient permissions in the host operating system) at 4707.

2016-10-05 01:02:06 avvcbimage Error <0000>: [IMG0008] Failed to connect to virtual disk [VNX5300_R5_06] ARPJVXPSC02/ARPJVXPSC02.vmdk (3014) (3014) Insufficient permissions in the host operating system
2016-10-05 01:02:06 avvcbimage Info <19644>: Connecting virtual disk [VNX5300_R5_06] ARPJVXPSC02/ARPJVXPSC02.vmdk
2016-10-05 01:02:06 avvcbimage Error <0000>: [IMG0008] VixDiskLib_Open([VNX5300_R5_06] ARPJVXPSC02/ARPJVXPSC02.vmdk) returned (3014) Insufficient permissions in the host operating system
2016-10-05 01:02:06 avvcbimage Info <9772>: Starting graceful (staged) termination, VixDiskLib_Open attempt to connect to virtual disk failed (wrap-up stage)
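For anyone comparing their own logs against the excerpt above, here is a rough Python sketch that counts which disks failed to open with the InvalidLogin fault. The regular expressions are assumptions based only on the line layout pasted in this post, and `failed_disks` is a hypothetical helper name, not part of any Avamar or VMware tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines pasted above:
# "<date> <time> avvcbimage <Level> <code>: <message>"
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) avvcbimage "
    r"(?P<level>Info|Warning|Error) <(?P<code>\d+)>: (?P<msg>.*)$"
)
# Datastore and path as they appear after "Cannot open disk"
DISK_RE = re.compile(r"Cannot open disk (\[[^\]]+\] \S+\.vmdk)")


def failed_disks(lines):
    """Count 'Cannot open disk' messages that mention InvalidLogin, per disk."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not an avvcbimage line in the assumed layout
        msg = m.group("msg")
        if "InvalidLogin" not in msg:
            continue
        d = DISK_RE.search(msg)
        if d:
            counts[d.group(1)] += 1
    return counts
```

Feeding it the lines of one or more avvcbimage logs gives a per-disk tally, which makes it easier to see whether the failures cluster on particular datastores or are spread evenly.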

Responses(21)

J_H_

498 Posts

0

October 7th, 2016 14:00

Sometimes the error is not what you think it is.

What popped into my head was LOAD.

When you start your VM backups, HOW MANY are in the policy?

I had some issues at the start (don't remember the error) and found out some info.

I should only have about 100 VMs in a policy.

Stagger the start times of the policies.

So VMbackup1 starts at 7 pm

VMbackup2 starts at 8 pm

VMbackup3 starts at 9 pm

This will put less of a load on vCenter when backup policies start by only adding 100 jobs at a time.

Not sure if that is your issue, but worth a try to look into it.

As you said, a re-run works fine, and you only have the issue at peak loads.
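To make the staggering idea concrete, here is a minimal Python sketch that splits a VM list into batches of about 100 and gives each batch a start time one hour after the previous one. It only plans the schedule; it does not talk to Avamar. The function name `stagger_policies` and its parameters are made up for illustration, and the policy names simply follow the VMbackup1/2/3 pattern above.

```python
from datetime import datetime, timedelta


def stagger_policies(vm_names, batch_size=100, first_start="19:00", gap_minutes=60):
    """Split vm_names into batches and assign each batch a later start time.

    Returns a list of (policy_name, start_time "HH:MM", [vm, ...]) tuples.
    """
    start = datetime.strptime(first_start, "%H:%M")
    schedule = []
    for i in range(0, len(vm_names), batch_size):
        n = i // batch_size  # zero-based batch index
        when = start + timedelta(minutes=gap_minutes * n)
        schedule.append(
            (f"VMbackup{n + 1}", when.strftime("%H:%M"), vm_names[i:i + batch_size])
        )
    return schedule


if __name__ == "__main__":
    vms = [f"vm{i:03d}" for i in range(250)]  # placeholder VM names
    for name, when, members in stagger_policies(vms):
        print(name, when, len(members))
```

The batch size and the gap are the two knobs worth experimenting with; the numbers here are just the ones from this post, not a recommendation for every environment.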

shanea1

6 Posts

0

October 18th, 2016 10:00

We are having the exact same issue. Have you had any resolution for this?

umichklewis

1.2K Posts

0

October 18th, 2016 11:00

Like J.H., we divided our VM backup window into smaller batches throughout the night. Most VMs back up in just a few minutes, but when we kicked off 150 VMs at once, we'd start to see snapshot timeouts and hotadd failures almost immediately. Even though vCenter isn't reporting authentication failures from Avamar, you might see excessive connections and/or connection failures to the vCenter DB - this was the key to us that we had too many sessions trying to authenticate to vCenter, all within 5 seconds of each other.
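One way to check for that kind of burst is to pull the timestamps of login or session-start events out of whatever logs you have and measure how many land inside any 5-second window. The sketch below assumes you have already extracted the timestamps as "YYYY-MM-DD HH:MM:SS" strings; `max_burst` is a hypothetical helper, and how you extract the timestamps depends on your own logs.

```python
from datetime import datetime, timedelta


def max_burst(timestamps, window_seconds=5):
    """Return the largest number of events inside any window of window_seconds."""
    times = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    window = timedelta(seconds=window_seconds)
    best = 0
    left = 0  # index of the oldest event still inside the window
    for right, t in enumerate(times):
        # Slide the left edge forward until the window is short enough
        while t - times[left] > window:
            left += 1
        best = max(best, right - left + 1)
    return best
```

If the peak is close to the number of jobs in a policy, the sessions are effectively all arriving at once, which matches what we saw before we split things up.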

If you're using a vCenter server appliance, you'll want to pay special attention to this, as VMware tuned the 5.x appliances fairly low, in terms of memory and maximum connections. The 6.x appliances seem better off, but still seem a little under-sized, which you'll see during lots of backups, Storage vMotions, etc.

The simple solution for us was to group VMs in smaller batches, then kick them off throughout the evening. Using Data Protection Advisor, we plotted a simple graph of the job schedule, then moved VMs into groups with staggered start times until we were balanced out, again, just like J.H. describes. Since then, we've had no issue due to load.

Let us know if that helps!

Karl

shanea1

6 Posts

1

October 18th, 2016 11:00

Thank you. This does help. EMC support is failing me at this point.

MShadow

42 Posts

0

October 18th, 2016 11:00

No luck yet.

I've spread the jobs out but still end up with 15 to 20 failures per night.

I have uploaded the VMware VPXD and SSO logs along with my Avamar logs to find a correlation. I did have a conference call with VMware support and EMC. During that call, the L2 engineer said they have seen this and referred VMware support to a known issue that they supposedly will not address.

I'll be sure to update if I find anything.

dynamox

1 Rookie

•

20.4K Posts

0

October 18th, 2016 13:00

Karl,

How many concurrent VM backup sessions did you have? I am using DD as my back-end store (100 streams configured for Avamar to use because it's a DD9500 with a 1.8k stream limit). So when my job starts I have 1300 VMs that get submitted at once, 100 run concurrently and the rest sit in the queue. I am not seeing these issues.

umichklewis

1.2K Posts

0

October 18th, 2016 13:00

Our original backend was a DD860, which we replaced with a DD4200. I never saw any problems from Avamar/DD (with either the DD860 or the DD4200); it was always only an issue with vCenter. We would kick off the entire 600 VM shooting match, but only the first 100-ish VMs would progress; the rest returned connection issues. When we dropped the list to 200 VMs, we had more VMs that would queue longer, but then we saw more snapshot timeouts. It wasn't until we dropped under a hundred VMs or so that we stopped seeing snapshot timeouts.

We first saw the snapshot errors on vCenter 5.0/Avamar 6.1 and 7.0 (when we switched to Avamar Gen4S nodes and DD4200), then saw primarily the hotadd issues on 7.0. No problems ever since, but we're grouped into smaller batches during our backup window.

dynamox

1 Rookie

•

20.4K Posts

0

October 18th, 2016 13:00

Interesting. I did not see these issues with vCenter 5.5/Avamar 7.0 and up to vCenter 6/Avamar 7.2.

J_H_

498 Posts

0

October 18th, 2016 13:00

My VMs land on DD as well.

It was an issue with vCenter, not the DD.

I had too many jobs trying to talk to VCenter at once. It slowed down to the point that Avamar timed out on the jobs.

shanea1

6 Posts

0

October 18th, 2016 13:00

Thanks, guys, for the input. We have an Avamar Gen4S/DD2500. For weeks it had been flawless using a vCenter Appliance 6.0.0.20000, Build Number 3634791. Just in the past few days we've had 70+ failures a night, as mentioned in this thread ("nothing has changed," as far as we know). We are backing up 500+ VMs nightly, with the main backup group having 400+ members. All the groups were kicking off at the same time. So as of today I have split the 400+ group into 3 groups of about 135 VMs each and staggered the start times 2 hours apart. I'll see how things go tonight.

umichklewis

1.2K Posts

0

October 19th, 2016 05:00

Glad this was able to help you!

Be sure to reply back with any changes or experiences that might help the next person experiencing this.

Thanks!

umichklewis

1.2K Posts

0

October 19th, 2016 05:00

If you note significant numbers of sessions failing with snapshot creation errors, L2 might refer to something like a "transactional limit" or "session state limit". There's a tunable parameter to increase this, and VMware might suggest increasing it, restarting vCenter and re-running your backups. While this only reduced our failures by a handful, it was very informative for showing us we needed more CPU on our vCenter instance.

Let us know if you learn anything that helps your situation!

MShadow

42 Posts

0

October 19th, 2016 05:00

The only real change we were advised to make was increasing ephemeral ports because of concerns of port exhaustion, but this made no difference.

shanea1

6 Posts

0

October 19th, 2016 08:00

Splitting up my backup jobs and staggering the start times had no effect on the number of failures. I’ve escalated the case within EMC and we are digging into it further.

MShadow

42 Posts

0

October 19th, 2016 12:00

Just today, they have found a correlation in the vpxd logs, the sso logs and the Avamar logs and have deemed it to be a VMware issue. VMware is now escalating it on their side.
