mackdav

Can't reliably ping 6224 from directly attached system

OK, here's my situation.

This is on the internet.  The 6224 is the router in this picture and physically resides in Kanata.

Both VLAN 1697 and 3994 are provided by an internet service provider.  These VLANs are provided through a single 1Gb ethernet wire.

The Kanata hosts are directly attached to the 6224; the other two sites are remote.

VLAN 3994 is a single IP address space, so theoretically it shouldn't matter physically where the hosts on that subnet are.

Here's the problem.

I have a monitoring system which is connected further into the internet, so probes from the monitor would come in to this diagram on the 1697 VLAN.

When I ping hosts at Albert or Bells Corners from the internet, there is 0 loss.  The connection looks perfect.

When I ping hosts at Kanata, I lose anywhere from 10 to 40% of the pings.  The loss is not predictable, but when it happens I always lose a burst of consecutive pings: at least 3, usually 4, rarely more.

I have attached a monitor directly to the 6224 in Kanata on VLAN 3994.

When the monitor pings the 6224 routing interface, I see exactly the same loss pattern -- but NOT at the same time as the loss from the remote system.  Ping time is around 1ms.

When the monitor pings another system directly attached to the 6224, there is 0 loss.  Ping time is about 0.1ms, one-tenth of the time to ping the router.

Anyone know what is going on here?

mackdav

Re: Can't reliably ping 6224 from directly attached system

OK, after a lot of time staring at traces, I have a more specific symptom.

I ran tcpdump on VLAN 3994 of the ISP uplink, filtering for my own address, on the theory that the only traffic of mine going to the remote sites should be broadcasts. Instead, I saw the packets that I would have expected to see on my system's interface going down the TLS on this VLAN.

So:

For some reason, the 6224 frequently thinks that my system is at the far end of the TLS.

When I inspect the switching-table when things are working, my entry looks like this:

3994     0007.E924.F714        2/g16      Dynamic

…which makes sense since it is plugged into port 16. However, when it is broken, it looks like this:

3994     0007.E924.F714        2/g22      Dynamic
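For anyone wanting to catch this automatically, here is a minimal Python sketch that spots a MAC changing ports between switching-table snapshots. It assumes each row has exactly the four whitespace-separated columns shown above; `parse_entry` and `find_moves` are hypothetical helper names, and how the rows get collected from the switch is left out.

```python
def parse_entry(line):
    """Split one switching-table row into (vlan, mac, port, kind).

    Assumes the four-column layout quoted in the post.
    """
    vlan, mac, port, kind = line.split()
    return int(vlan), mac.upper(), port, kind


def find_moves(rows):
    """Return (vlan, mac, old_port, new_port) each time a MAC changes port."""
    last_port = {}
    moves = []
    for row in rows:
        vlan, mac, port, _kind = parse_entry(row)
        key = (vlan, mac)
        if key in last_port and last_port[key] != port:
            moves.append((vlan, mac, last_port[key], port))
        last_port[key] = port
    return moves


# The two rows quoted above, in the order they were observed.
samples = [
    "3994     0007.E924.F714        2/g16      Dynamic",
    "3994     0007.E924.F714        2/g22      Dynamic",
]

for move in find_moves(samples):
    print(move)
```

Fed the two rows above, it reports one move for 0007.E924.F714, from 2/g16 to 2/g22.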

Each stream of misdirected packets seems to start right after a broadcast from my system. However, I see one broadcast leave my system, and two on the 3994 VLAN to the TLS. Usually it is an IGMP V2 Membership Report / Join Group 224.0.0.251, but sometimes it is the management chip on my system ARPing for itself (it does this every 2 seconds or so, for reasons which are stupid).

This implies that there is a system in Bells Corners or Albert which is hearing my broadcast and echoing it back for some reason. So the 6224 goes "ah, this MAC must really be down the TLS link" and adjusts its switching table accordingly.

Does this description of the problem ring any bells?
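To make the theory concrete, here is a toy Python model of source-MAC learning. It is only a sketch of generic learning-switch behaviour, not the 6224's actual implementation. `LearningSwitch` is a made-up class, and it assumes 2/g22 is the uplink toward the TLS, which is what the switching-table entry suggests.

```python
class LearningSwitch:
    """Toy model: remember the last port each source MAC was seen on."""

    def __init__(self):
        self.table = {}  # (vlan, mac) -> port

    def receive(self, port, vlan, src_mac):
        # Learning step: the newest sighting of a source MAC wins.
        self.table[(vlan, src_mac)] = port

    def egress_port(self, vlan, dst_mac):
        # Unknown destinations would be flooded; known ones go to one port.
        return self.table.get((vlan, dst_mac), "flood")


HOST = "0007.E924.F714"
sw = LearningSwitch()

sw.receive("2/g16", 3994, HOST)      # host sends a broadcast on its own port
before = sw.egress_port(3994, HOST)

sw.receive("2/g22", 3994, HOST)      # echoed copy comes back in on the uplink
during = sw.egress_port(3994, HOST)  # replies to the host now go down the TLS

sw.receive("2/g16", 3994, HOST)      # host transmits again, entry is corrected
after = sw.egress_port(3994, HOST)

print(before, during, after)
```

In this model, traffic for the host goes out the wrong port from the moment the echo arrives until the host next transmits. That would fit losing a few pings in a row and then recovering.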
