After having persistent problems with the Xen virtual NIC locking up, LordCow and I recently migrated the services running in two virtual machines onto the host itself (followed by removing Xen altogether). To avoid having to change DNS entries and other things, we decided to bind the old IP addresses to alias interfaces on the host.
When we did this, we found that we could ping the alias interface from within the local subnet, but not from elsewhere on the network.
tcpdump showed ICMP echo requests arriving from the gateway, but no response being sent. Jaco suggested layer 2 problems, and indeed, when I ran tcpdump -p (which disables promiscuous mode capture) I could no longer see the requests. Using tcpdump -e (which prints the link-layer header) revealed that the echoes were arriving with the destination MAC address that the Xen VM had been using before we nuked it some hours prior.
It turns out that the router to which our machine is directly connected (a Cisco Catalyst 3560E) had a stale ARP table entry (not normal) but no MAC address table entry (which is normal, since that MAC was no longer in use), so it was flooding the echoes out of every port. That is why they reached our NIC at all, but were only visible in promiscuous mode.
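To make the tcpdump -p observation concrete, here is a rough Python sketch of the decision a NIC makes for each incoming frame. It is a simplification (real hardware also filters multicast by subscribed group), the function name nic_accepts is made up for illustration, and the host MAC below is a placeholder rather than our real one. Only the stale MAC comes from the switch's ARP entry.

```python
def nic_accepts(dst_mac: str, own_mac: str, promiscuous: bool = False) -> bool:
    """Simplified model of whether a NIC hands a frame up to the host."""
    dst = dst_mac.lower()
    own = own_mac.lower()
    if promiscuous:
        # Promiscuous mode: every frame on the wire is passed up.
        return True
    if dst == own:
        # Unicast frame addressed to this NIC.
        return True
    # Group bit (least significant bit of the first octet) marks
    # multicast and broadcast frames; this sketch accepts all of them.
    first_octet = int(dst.split(":")[0], 16)
    return bool(first_octet & 1)


STALE_VM_MAC = "00:16:3e:32:fc:87"  # the old Xen VM's MAC from the ARP entry
HOST_MAC = "00:11:22:33:44:55"      # hypothetical placeholder for the host NIC

# Flooded echo requests still carry the dead VM's MAC as their destination.
print(nic_accepts(STALE_VM_MAC, HOST_MAC, promiscuous=True))   # True
print(nic_accepts(STALE_VM_MAC, HOST_MAC, promiscuous=False))  # False
```

With promiscuous mode on, tcpdump sees the flooded frames; without it, the NIC drops them before the kernel (and therefore the alias interface) ever gets a chance to reply.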
Edgar (the network admin) had a look on the switch for us, and found this ARP table entry:
Internet 137.158.xxx.yyy 235 0016.3e32.fc87 ARPA Vlan309
If I’m right, that 235 is the entry’s age in minutes. Seems like an IOS bug to me.
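For comparing this against tcpdump -e output, it helps to convert Cisco's dotted MAC notation into the colon-separated form tcpdump prints. The following is a small sketch, assuming the usual six-column layout of the ARP table (protocol, address, age in minutes, hardware address, type, interface). The names ArpEntry, cisco_mac_to_colon and parse_arp_line are my own, not part of any library.

```python
from typing import NamedTuple


class ArpEntry(NamedTuple):
    protocol: str
    address: str
    age_minutes: str  # kept as text; the switch shows "-" for its own addresses
    mac: str          # colon-separated, lower case
    encap: str
    interface: str


def cisco_mac_to_colon(mac: str) -> str:
    """Convert 0016.3e32.fc87 into 00:16:3e:32:fc:87."""
    digits = mac.replace(".", "").lower()
    if len(digits) != 12 or any(c not in "0123456789abcdef" for c in digits):
        raise ValueError(f"not a Cisco-style MAC address: {mac!r}")
    return ":".join(digits[i:i + 2] for i in range(0, 12, 2))


def parse_arp_line(line: str) -> ArpEntry:
    """Split one ARP table line into its six whitespace-separated fields."""
    fields = line.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {line!r}")
    protocol, address, age, mac, encap, interface = fields
    return ArpEntry(protocol, address, age, cisco_mac_to_colon(mac), encap, interface)


entry = parse_arp_line("Internet 137.158.xxx.yyy 235 0016.3e32.fc87 ARPA Vlan309")
print(entry.mac)          # 00:16:3e:32:fc:87
print(entry.age_minutes)  # 235
```

The 00:16:3e prefix is the OUI assigned to Xen, which is a quick way to confirm that the stale entry really belonged to the old VM.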
The solution was to manually flush the ARP entries (for both IP addresses) on the switch with:
clear ip arp 137.158.xxx.yyy