Andras Dosztal
Network architect
Dec 6, 2021

Seamless Gateway Migration

Migrations in data centers can be challenging: customers, whether internal or external, demand zero downtime and as few disruptions as possible. At the same time, data centers are constantly changing environments, where migrations are not just about simple hardware replacements.

Migrating devices with routed links has its own challenges, but at least you can steer traffic using your routing protocol. However, when it comes to migrating the default gateway (i.e. a router or a firewall) of your hosting VLANs, the options are limited – especially when it comes to changing the First Hop Redundancy Protocol (FHRP) in scenarios like:

  • Replacing Cisco devices running HSRP with another vendor’s devices that speak VRRP or do anycast routing.
  • Reducing the number of HSRP/VRRP groups due to hardware limitations (hello, merchant silicon).
  • Moving from HSRP/VRRP to an anycast GW solution like Arista’s VARP.
  • And so on…

The issue

The problem with such scenarios is that the default gateway’s MAC address changes. Although almost all network devices send gratuitous ARP (gARP) packets when their interfaces are brought up, not all host OSes process these. For example, Windows discards these by default and waits until the entries in its ARP table time out. Simply shutting down the gateway for these hosts would mean at least 5-15 seconds of downtime, which can’t be tolerated for certain workloads. Of course, we could apply workarounds on the hosts (e.g. static ARP entries or changing the default GW), but those are not scalable solutions and can be an issue even in automated environments.
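
To make the packet in question concrete, here is a minimal Python sketch that builds a gratuitous ARP frame by hand. It assumes the "ARP announcement" form (a broadcast ARP request whose sender and target IP are both the gateway address); some devices send the reply form instead. The MAC and IP addresses are placeholders chosen for illustration, not values from this setup.

```python
import struct

ETHERTYPE_ARP = 0x0806            # the EtherType a Layer 2 ACL would match on
BROADCAST = bytes.fromhex("ffffffffffff")
ZERO_MAC = bytes(6)


def mac_to_bytes(mac: str) -> bytes:
    """Accepts aa:bb:cc:dd:ee:ff, aa-bb-cc-dd-ee-ff or aabb.ccdd.eeff."""
    return bytes.fromhex(mac.replace(":", "").replace("-", "").replace(".", ""))


def ip_to_bytes(ip: str) -> bytes:
    return bytes(int(octet) for octet in ip.split("."))


def build_garp(gateway_mac: str, gateway_ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP (announcement form)."""
    mac = mac_to_bytes(gateway_mac)
    ip = ip_to_bytes(gateway_ip)

    # Ethernet header: broadcast destination, gateway source, EtherType ARP
    ethernet = BROADCAST + mac + struct.pack("!H", ETHERTYPE_ARP)

    # ARP header: hardware type 1 (Ethernet), protocol 0x0800 (IPv4),
    # address lengths 6 and 4, opcode 1 (request)
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (unknown, zeros) and target IP.
    # Sender IP == target IP is what makes the packet "gratuitous".
    arp += mac + ip + ZERO_MAC + ip

    return ethernet + arp


# Placeholder addresses, for illustration only
frame = build_garp("00:1c:73:00:00:99", "192.0.2.1")
print(frame.hex())
```

A host that processes this frame overwrites its cached gateway MAC with the sender MAC; a host that ignores it keeps the old entry until it ages out.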

So we have a situation where the following requirements must be met:

  • Both the old and the new gateways must be up to forward traffic for hosts that have already switched over to the new one, as well as for those that are still using the old one. Normally this causes a MAC conflict.
  • These gateways shouldn’t try to take over each other’s role.
  • Traffic from/to the hosts cannot be disrupted.

Here’s a simplified example, where Hosts 1 and 2 process gARP packets while Host 3 ignores them:

Topology

The solution

All the requirements can be met if we block ARP traffic between the old router and the switch using a Layer 2 ACL. The migration steps are the following.

1) Apply the ACL

Depending on what the hardware supports¹, this can be either a Port ACL (PACL) or a VLAN ACL (VACL).

Blocking ARP

Below are two examples.

PACL on an Arista switch:

mac access-list Block_ARP
  deny any any arp
  permit any any

interface Etx
  description Old router
  mac access-group Block_ARP

Note: If your software version doesn’t support deny any any arp, use the protocol number: deny any any 0x806.

The second example is a VACL on a Cisco switch. Since it is applied to the whole VLAN, we have to make sure we’re blocking only ARP packets from/to the old router. We also have to block both the router’s own MAC address and the HSRP virtual MAC address, because the routers periodically send gARP replies with the HSRP virtual MAC as the source.

mac access-list extended ACL-Block_ARP
  permit <old_router_mac> 0000.0000.0000 any 0x806 0x0
  permit any <old_router_mac> 0000.0000.0000 0x806 0x0
  permit 0000.0c07.ac00 0000.0000.00ff any 0x806 0x0

vlan access-map AM-Block_ARP 10
  action drop
  match mac address ACL-Block_ARP
vlan access-map AM-Block_ARP 20
  action forward

vlan filter AM-Block_ARP vlan-list <your VLANs>
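
The third ACL entry relies on a wildcard mask to cover a whole range of HSRP virtual MACs with one line. As a rough illustration of how such a mask is evaluated (bits set to 1 in the mask are ignored), here is a small Python sketch. The helper names are made up for this example, and the sample addresses assume HSRP version 1, whose virtual MACs have the form 0000.0c07.acXX; HSRP version 2 uses the 0000.0c9f.fXXX range, so the pattern would need adjusting there.

```python
def mac_to_int(mac: str) -> int:
    """Convert aabb.ccdd.eeff (or colon/dash notation) to an integer."""
    return int(mac.replace(".", "").replace(":", "").replace("-", ""), 16)


def wildcard_match(mac: str, pattern: str, wildcard: str) -> bool:
    """Cisco-style wildcard match: 1-bits in the wildcard are "don't care"."""
    care_bits = ~mac_to_int(wildcard) & 0xFFFFFFFFFFFF
    return (mac_to_int(mac) & care_bits) == (mac_to_int(pattern) & care_bits)


HSRP_PATTERN = "0000.0c07.ac00"
HSRP_WILDCARD = "0000.0000.00ff"   # ignore the last byte (the group number)

# HSRPv1 virtual MAC for group 10: matched by the ACL entry
print(wildcard_match("0000.0c07.ac0a", HSRP_PATTERN, HSRP_WILDCARD))

# HSRPv2 virtual MAC: not matched by this pattern
print(wildcard_match("0000.0c9f.f00a", HSRP_PATTERN, HSRP_WILDCARD))

# An all-zero wildcard matches exactly one address, as in the first two entries
print(wildcard_match("0011.2233.4455", "0011.2233.4455", "0000.0000.0000"))
```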

2) Bring up the new router

Now it’s time to add the new default gateway with the new FHRP solution. The following example deploys anycast routing on an Arista switch. Note: This config snippet shows the interface config only; there are some prerequisites you have to add as well (e.g. virtual-router MAC, MLAG in a redundant setup, etc.).

interface vlanx
  ip virtual-router address x.x.x.x/x
  no shutdown

At this point the new router announces itself with gARP, which is processed by Hosts 1 and 2. They immediately shift their traffic to the new router. However, Host 3 ignores it and keeps using the old router as its default gateway:

Transition between the old and the new

3) Wait until Host 3 makes an ARP request

Eventually the default gateway’s ARP entry on Host 3 will time out, and Host 3 will send an ARP request for the gateway’s MAC address. Since the old router’s ARP traffic is blocked, Host 3 will receive the new router’s response only:

Host 3 making an ARP request
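
The sequence in steps 1 to 3 can be sketched as a toy simulation of the hosts’ ARP caches. This is only a model of the reasoning, not a tool: the `Host` class, the MAC values and the `simulate_migration` function are all invented for the example, and the ACL is represented simply by the fact that only the new router’s reply is visible to the hosts.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

OLD_MAC = "0000.0c07.ac01"   # hypothetical HSRP virtual MAC of the old router
NEW_MAC = "001c.7300.0099"   # hypothetical virtual MAC of the new gateway


@dataclass
class Host:
    name: str
    accepts_garp: bool
    gateway_mac: Optional[str] = OLD_MAC   # every host starts on the old gateway

    def on_garp(self, sender_mac: str) -> None:
        """Update the cached gateway MAC only if this OS processes gARP."""
        if self.accepts_garp:
            self.gateway_mac = sender_mac

    def on_arp_timeout(self, visible_replies: List[str]) -> None:
        """Entry expired: send an ARP request and cache the reply that arrives."""
        self.gateway_mac = visible_replies[0] if visible_replies else None


def simulate_migration() -> Tuple[Dict[str, Optional[str]], Dict[str, Optional[str]]]:
    hosts = [Host("Host 1", True), Host("Host 2", True), Host("Host 3", False)]

    # Step 1: the L2 ACL is in place, so the old router's ARP replies are
    # dropped and only the new router's replies can reach the hosts.
    visible_replies = [NEW_MAC]

    # Step 2: the new router comes up and announces itself with gARP.
    for host in hosts:
        host.on_garp(NEW_MAC)
    after_garp = {h.name: h.gateway_mac for h in hosts}

    # Step 3: hosts that ignored the gARP re-resolve once their entry expires.
    for host in hosts:
        if host.gateway_mac != NEW_MAC:
            host.on_arp_timeout(visible_replies)
    final = {h.name: h.gateway_mac for h in hosts}

    return after_garp, final


after_garp, final = simulate_migration()
print("After gARP:   ", after_garp)
print("After timeout:", final)
```

Between the two snapshots Host 3 still forwards through the old router, which is why the old gateway has to stay up and keep forwarding until every cached entry has aged out.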

4) Cleanup

Now all hosts use the new router as the default gateway. You can shut down the old router and remove the L2 ACL from the switch.

Final setup

Discussion

Questions, comments? Raise them on my Facebook page, on Twitter, or on LinkedIn.


  1. If the switch supports Layer 2 ACLs at all; this feature is not implemented in every ASIC. ↩︎