Force10 logging many MXL-10/40GbE:0 %MACAGT-5-HASH_COLLISION_LOG entries
Hi (reposting this without the doc link, which apparently got it flagged as spam),
While investigating why our VMware VMs on ESXi 6.5U3 drop off the network after vMotion (a continuous ping shows 100-200 pings lost before the VM becomes reachable again, and in 4 of 16 vMotion cases the VM did not reply again until restarted), I checked the Force10 logs and saw these entries logged every 13 seconds or so:
Sep 30 16:31:21 %MXL-10/40GbE:0 %MACAGT-5-HASH_COLLISION_LOG: Mac:00:50:56:83:50:7c/Vlan:200 could not be added to L2 CAM on portpipe 0 stack-unit 0
due to hash collision. Total number of hash collisions: 105206626
Sep 30 16:31:08 %MXL-10/40GbE:0 %MACAGT-5-HASH_COLLISION_LOG: Mac:00:50:56:ab:99:30/Vlan:200 could not be added to L2 CAM on portpipe 0 stack-unit 0
due to hash collision. Total number of hash collisions: 105206322
Sep 30 16:30:57 %MXL-10/40GbE:0 %MACAGT-5-HASH_COLLISION_LOG: Mac:00:50:56:ab:99:30/Vlan:200 could not be added to L2 CAM on portpipe 0 stack-unit 0
due to hash collision. Total number of hash collisions: 105205933
Sep 30 16:30:44 %MXL-10/40GbE:0 %MACAGT-5-HASH_COLLISION_LOG: Mac:00:50:56:ab:99:30/Vlan:200 could not be added to L2 CAM on portpipe 0 stack-unit 0
due to hash collision. Total number of hash collisions: 105205537
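A minimal sketch for summarizing entries like these, assuming the log text has been saved or pasted as a string. The function name `summarize` and the regular expression are illustrative, not part of any Dell tooling. It counts how often each MAC/VLAN pair is reported and how far the "Total number of hash collisions" counter moves between logged entries.

```python
import re
from collections import Counter

# Assumed pattern, derived only from the log lines quoted above.
# DOTALL lets one entry span the wrapped "due to hash collision" line.
LOG_RE = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}).*?HASH_COLLISION_LOG:\s+"
    r"Mac:(?P<mac>[0-9a-fA-F:]{17})/Vlan:(?P<vlan>\d+).*?"
    r"Total number of hash collisions:\s*(?P<total>\d+)",
    re.DOTALL,
)

SAMPLE = """\
Sep 30 16:31:21 %MXL-10/40GbE:0 %MACAGT-5-HASH_COLLISION_LOG: Mac:00:50:56:83:50:7c/Vlan:200 could not be added to L2 CAM on portpipe 0 stack-unit 0
due to hash collision. Total number of hash collisions: 105206626
Sep 30 16:31:08 %MXL-10/40GbE:0 %MACAGT-5-HASH_COLLISION_LOG: Mac:00:50:56:ab:99:30/Vlan:200 could not be added to L2 CAM on portpipe 0 stack-unit 0
due to hash collision. Total number of hash collisions: 105206322
Sep 30 16:30:57 %MXL-10/40GbE:0 %MACAGT-5-HASH_COLLISION_LOG: Mac:00:50:56:ab:99:30/Vlan:200 could not be added to L2 CAM on portpipe 0 stack-unit 0
due to hash collision. Total number of hash collisions: 105205933
Sep 30 16:30:44 %MXL-10/40GbE:0 %MACAGT-5-HASH_COLLISION_LOG: Mac:00:50:56:ab:99:30/Vlan:200 could not be added to L2 CAM on portpipe 0 stack-unit 0
due to hash collision. Total number of hash collisions: 105205537
"""


def summarize(text):
    """Return (entries per (mac, vlan), counter deltas between logged entries)."""
    entries = [m.groupdict() for m in LOG_RE.finditer(text)]
    per_mac = Counter((e["mac"].lower(), int(e["vlan"])) for e in entries)
    # The counter only grows, so sorting puts the entries in time order.
    totals = sorted(int(e["total"]) for e in entries)
    deltas = [later - earlier for earlier, later in zip(totals, totals[1:])]
    return per_mac, deltas


if __name__ == "__main__":
    per_mac, deltas = summarize(SAMPLE)
    for (mac, vlan), count in per_mac.most_common():
        print(f"{mac} vlan {vlan}: {count} logged entries")
    print("counter increase between logged entries:", deltas)
```

On the four entries above, the counter rises by roughly 300 to 400 between consecutive logged lines, so the switch appears to be hitting far more collisions than it logs.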
I found a Dell doc with the following:
Upgrade to software allowing for DUAL HASHING. Specific platforms post release 9.3 have the ability to perform dual hashing. Dual hashing support for both L2 and L3 tables is available. This feature is enabled by default on all those platforms running 9.3. Switch tries to re-hash and re-order the tables to accommodate new entries whenever a hash collision happens.
(Our current version is 9.13.) How can I verify that dual hashing is enabled?
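As a rough illustration of why a second hash function reduces failed inserts, here is a toy model of a bucketed table. Everything in it is an assumption made for the sketch: the CRC32-based hash functions, the 64 buckets of depth 4, and the generated MAC addresses have nothing to do with the real MXL hardware, and the toy does not re-order existing entries the way the Dell doc describes.

```python
import zlib


def bucket(key: str, salt: bytes, n_buckets: int) -> int:
    """Hypothetical hash function: salted CRC32 reduced to a bucket index."""
    return zlib.crc32(salt + key.encode()) % n_buckets


def count_failures(keys, n_buckets=64, depth=4, dual=False):
    """Insert keys into fixed-depth buckets and count the ones that do not fit.

    With dual=False a key has one candidate bucket; with dual=True it may
    fall back to a second bucket chosen by a different hash function.
    """
    load = [0] * n_buckets
    failures = 0
    for key in keys:
        choices = [bucket(key, b"h1", n_buckets)]
        if dual:
            choices.append(bucket(key, b"h2", n_buckets))
        for b in choices:
            if load[b] < depth:
                load[b] += 1
                break
        else:
            # Every candidate bucket was full: the toy's "hash collision".
            failures += 1
    return failures


if __name__ == "__main__":
    # Made-up MAC/VLAN keys, 200 entries into 64 * 4 = 256 slots.
    keys = [f"00:50:56:00:{i >> 8:02x}:{i & 0xff:02x}/200" for i in range(200)]
    print("single hash failures:", count_failures(keys, dual=False))
    print("dual hash failures:  ", count_failures(keys, dual=True))
```

In this model the dual-hash run can never fail more inserts than the single-hash run, because every bucket ends up at least as full as it would with one hash alone. How much it helps on a real switch depends on the actual table geometry and hash functions.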
Reduce ARP timeout. Default is 4 hours. By reducing the length of time ARP’s are retained it allows for more frequent introduction of new ARP entries. This will of course also force all entries to cycle through faster and will increase ARP traffic for the attached networks.
I tried setting the timeout to 5 minutes, but got an error when setting the ARP timeout on Te 0/2:
M1000e-A2(conf-if-te-0/2)#arp timeout ?
<0-35790> Minutes (default = 4 hours)
M1000e-A2(conf-if-te-0/2)#arp timeout 5
% Error: Port is not in Layer-3 mode Te 0/2.
Has anyone seen this issue before? What was the remedy?
thanks,
Fletcher
DELL-Josh Cr
Moderator
8.7K Posts
0
October 1st, 2019 17:00
Hi,
Dual hashing is enabled by default if a port is in Layer 3 mode. Based on the second error, it sounds like the port is only in Layer 2 mode. See page 380: https://downloads.dell.com/manuals/all-products/esuprt_ser_stor_net/esuprt_networking/esuprt_net_blade_intercnts/force10-mxl-blade_concept-guide2_en-us.pdf
https://www.dell.com/support/article/us/en/04/sln295207/force10-hash-collisions-and-how-to-avoid-them?lang=en
fcocquyt
7 Posts
0
October 2nd, 2019 13:00
Hi,
While investigating why our VMware VMs on ESXi 6.5U3 drop off the network after vMotion (a continuous ping shows 100-200 pings lost before the VM becomes reachable again, and in 4 of 16 vMotion cases the VM did not reply again until restarted), I checked the Force10 logs and saw these entries logged every 13 seconds or so:
MACAGT-5-HASH_COLLISION_LOG
Is there a remedy for this? It seems likely to be making vMotion unreliable.
thanks,
Fletcher
fcocquyt
7 Posts
0
October 4th, 2019 09:00
Hi Josh,
Thanks for the reply, and please ignore the reposts (for some reason, my first posts were initially flagged as spam).
So if we are seeing these constant L2 HASH_COLLISION entries logged, what is the recommended action if the L3 remedies don't apply?
I read pages 380-381 of the doc, and it says L3 is enabled once an IP is assigned to the interface. Would that be recommended as a non-disruptive action plan I can bring to my networking team?
thanks!