Andras Dosztal
Network architect
Apr 29, 2016 8 min read

Replication Over The Backup WAN, Part 2

“It’s fine to have a backup link, but normally it’s not used and we’re paying a lot for it; could you send our file storage’s replication traffic through it?” goes the recurring customer question, immediately followed by: “Just make sure business traffic is preferred in case of a primary link failure.” After a solution for a small branch office, let’s continue with a large branch; we’ll move on to a campus network in the next post.

Scenario 2: Large remote branch

This site has two edge routers, each with its own uplink to the WAN. The switch SW1 represents the whole switched network; normally that part is redundant too.

The whole network is in OSPF area 0 except for the link between IOU1 and IOU2 (the reason for this is explained later). The two edge routers act as a redundant gateway for the hosts using HSRP. We’ll do almost the same exercise as in the previous post, except that I won’t use Ostinato because QoS has already been explained; here we focus only on PBR in a different scenario (the downloadable configs have QoS implemented, I’m just not discussing it here). As there’s no need to generate a large traffic volume, I’m running traceroutes from the connected hosts to IOU5’s loopback IP (5.5.5.5). Interface IPs follow the usual scheme: loopbacks are H.H.H.H, and “physical” interfaces are 192.168.{H1}{H2}.{H1|H2} (e.g. 192.168.24.4 for s2/0 of IOU4, 192.168.13.1 for e0/1 of IOU1).
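As a quick illustration of the addressing scheme, IOU1’s loopback and its link towards IOU3 would look roughly like this. The loopback interface number and the subnet masks are assumptions made for the example; the actual values are in the downloadable configs.

```
! IOU1 (H = 1): loopback is H.H.H.H
! Loopback0 and both masks are assumed, not taken from the lab configs
interface Loopback0
  ip address 1.1.1.1 255.255.255.255
!
! Link IOU1 <-> IOU3: 192.168.{1}{3}.{1} on the IOU1 side
interface Ethernet0/1
  ip address 192.168.13.1 255.255.255.0
```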

Implementing Policy Based Routing

The solution is almost the same as last time, with one exception. If you remember, I wrote in the “Backup link down” section that PBR doesn’t have to check the reachability of the next hop because it sits on a directly connected network. Here PBR is implemented on IOU1, but it has to check whether the link between IOU2 and IOU4 is down; otherwise it would send replication traffic to IOU2 unconditionally, where it would be black holed if the backup WAN link is down. With that said, let’s start configuring the routers (basic OSPF and HSRP are not discussed but can be found in the router config files). First, set up the ACL to identify replication traffic (192.168.1.99 is used again for the replication interface, 192.168.1.10 acts as a normal PC on the network):

access-list 100 remark Replication traffic
access-list 100 permit ip host 192.168.1.99 any

The next step is creating the IP SLA operation, which checks whether 192.168.24.2 (Serial2/0 on IOU2) is reachable. A track object for that result is set up too:

ip sla 1
  icmp-echo 192.168.24.2
exit

ip sla schedule 1 life forever start-time now

track 1 ip sla 1 reachability
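The operation above runs with default timers, which is why detection takes a while in the tests. A minimal sketch of a more aggressive variant follows; all numeric values are illustrative assumptions, not taken from the lab configs, and should be adapted to the link in question.

```
! Illustrative timer values only
ip sla 1
  icmp-echo 192.168.24.2
  ! threshold and timeout are in milliseconds; set threshold first
  ! so that it never exceeds the timeout
  threshold 1000
  timeout 1000
  ! frequency is in seconds: send a probe every 5 seconds
  frequency 5
exit
!
ip sla schedule 1 life forever start-time now
!
track 1 ip sla 1 reachability
  ! dampen flapping: wait 10 s before going down, 30 s before coming back up
  delay down 10 up 30
```

Note that an operation which is already scheduled generally cannot be edited in place; it has to be removed with no ip sla 1 and entered again.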

Then comes the route-map, which diverts traffic matching ACL 100 to IOU2:

route-map Replication permit 10
  match ip address 100
  set ip next-hop verify-availability 192.168.12.2 1 track 1
route-map Replication permit 20

As you can see, the set ip next-hop command is different from the previous scenario. This line says “if track 1 is up (i.e. IP SLA 1 reports that 192.168.24.2 is reachable), set 192.168.12.2 as the next hop; if it’s down, forward the traffic normally”. Finally, we just have to apply the route-map to the LAN interface (e0/0) of IOU1:

interface Ethernet0/0
  ip policy route-map Replication
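At this point a few show commands on IOU1 can confirm that the pieces are wired together. This is only a suggested checklist; the output is omitted because its exact format depends on the IOS release.

```
! Run on IOU1 in privileged EXEC mode
! State of the track object used by the route-map
show track 1
! Result of the latest IP SLA probe
show ip sla summary
! Route-map contents, including policy routing match counters
show route-map Replication
! Which interfaces have a policy route-map applied
show ip policy
```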

Now the packets from 192.168.1.99 are routed to IOU2, but what happens there? Well, if the crosslink between IOU1 and IOU2 participated in OSPF area 0, traffic would simply be routed back to IOU1 because the cost of that path would still be lower than that of the serial links through IOU4. There are two solutions for this:

Set up PBR on IOU2 too
Don’t include the mentioned crosslink in OSPF
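For comparison, the first option (PBR on IOU2) could look roughly like the sketch below. It is not part of the lab configs, and Ethernet0/1 is an assumed name for IOU2’s crosslink interface towards IOU1.

```
! Hypothetical sketch for IOU2; only needed if the crosslink were in OSPF
access-list 100 remark Replication traffic
access-list 100 permit ip host 192.168.1.99 any
!
route-map Replication permit 10
  match ip address 100
  ! 192.168.24.4 is IOU4's Serial2/0, directly connected to IOU2
  set ip next-hop 192.168.24.4
route-map Replication permit 20
!
! Assumed interface name for the IOU1 <-> IOU2 crosslink
interface Ethernet0/1
  ip policy route-map Replication
```

The cost of this variant is a second route-map and ACL to keep in sync with the ones on IOU1.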

I chose the latter. There’s just one thing we have to make sure of: if the primary WAN link goes down, the HSRP active role has to move to IOU2; otherwise normal traffic would be black holed, because IOU1 uses IOU2 as a next hop only for policy-routed traffic and has no other route through it. Tackling this potential issue means using IP SLA again, now for the HSRP config on IOU1:

ip sla 2
  icmp-echo 192.168.13.3
exit

ip sla schedule 2 life forever start-time now

track 2 ip sla 2 reachability

interface e0/0
  standby 1 track 2 shutdown
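A different way to get the same failover is to lower IOU1’s HSRP priority instead of shutting the group down. The sketch below shows that alternative; the priority and decrement values are illustrative assumptions, and Ethernet0/0 is assumed to be the LAN interface on IOU2 as well.

```
! IOU1: alternative to "standby 1 track 2 shutdown"
interface Ethernet0/0
  ! Higher than the default of 100, so IOU1 is normally active
  standby 1 priority 110
  ! Lets IOU1 reclaim the active role once track 2 is up again
  standby 1 preempt
  ! 110 - 20 = 90 while track 2 is down, i.e. below IOU2's priority
  standby 1 track 2 decrement 20
!
! IOU2: keeps the default priority (100)
interface Ethernet0/0
  ! Without preempt IOU2 would not take over when IOU1's priority drops
  standby 1 preempt
```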

Tests

Let’s go through each mode of operation again:

Both WAN links are up

PC10 (192.168.1.10) should take the primary path because it has higher bandwidth (i.e. lower OSPF cost):

PC10> ping 5.5.5.5
84 bytes from 5.5.5.5 icmp_seq=1 ttl=253 time=0.458 ms
84 bytes from 5.5.5.5 icmp_seq=2 ttl=253 time=0.420 ms
84 bytes from 5.5.5.5 icmp_seq=3 ttl=253 time=0.525 ms
84 bytes from 5.5.5.5 icmp_seq=4 ttl=253 time=0.878 ms
84 bytes from 5.5.5.5 icmp_seq=5 ttl=253 time=0.706 ms

PC10> trace 5.5.5.5
trace to 5.5.5.5, 8 hops max, press Ctrl+C to stop
 1   192.168.1.2   0.299 ms  0.245 ms  0.174 ms
 2   192.168.13.3   0.313 ms  0.393 ms  0.213 ms
 3   *192.168.35.5   0.328 ms (ICMP type:3, code:3, Destination port unreachable)

PC99 (192.168.1.99), on the other hand, should be detoured to IOU2:

PC99> ping 5.5.5.5
84 bytes from 5.5.5.5 icmp_seq=1 ttl=253 time=6.050 ms
84 bytes from 5.5.5.5 icmp_seq=2 ttl=253 time=8.956 ms
84 bytes from 5.5.5.5 icmp_seq=3 ttl=253 time=8.873 ms
84 bytes from 5.5.5.5 icmp_seq=4 ttl=253 time=8.789 ms
84 bytes from 5.5.5.5 icmp_seq=5 ttl=253 time=8.874 ms

PC99> trace 5.5.5.5
trace to 5.5.5.5, 8 hops max, press Ctrl+C to stop
 1   192.168.1.2   0.286 ms  0.228 ms  0.252 ms
 2   192.168.12.2   0.305 ms  0.281 ms  0.315 ms   <<< IOU2
 3   192.168.24.4   8.704 ms  8.699 ms  5.418 ms
 4   *192.168.45.5   8.867 ms (ICMP type:3, code:3, Destination port unreachable)

So far so good.

Backup link down

Let’s shut down the link between IOU2 and IOU4:

IOU4(config)#int s2/0
IOU4(config-if)#shut
IOU4(config-if)#
*Apr 24 14:48:56.982: %OSPF-5-ADJCHG: Process 1, Nbr 2.2.2.2 on Serial2/0 from FULL to DOWN, Neighbor Down: Interface down or detached
IOU4(config-if)#
*Apr 24 14:48:58.984: %LINK-5-CHANGED: Interface Serial2/0, changed state to administratively down
*Apr 24 14:48:59.988: %LINEPROTO-5-UPDOWN: Line protocol on Interface Serial2/0, changed state to down

Until the IP SLA probe times out and the track object goes down, traffic from PC99 is black holed (which would continue indefinitely if we hadn’t configured IP SLA). IP SLA information on IOU1:

IOU1#
*Apr 24 14:50:18.500: %TRACK-6-STATE: 1 ip sla 1 reachability Down -> Up
IOU1#sh ip sla sum
IPSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending

ID           Type        Destination       Stats       Return      Last
                                           (ms)        Code        Run
-----------------------------------------------------------------------
*1           icmp-echo   192.168.24.2      RTT=17      OK          4 seconds ago

Trace from PC99:

PC99> trace 5.5.5.5
trace to 5.5.5.5, 8 hops max, press Ctrl+C to stop
 1   192.168.1.2   0.376 ms  0.254 ms  0.192 ms
 2   192.168.12.2   0.286 ms  0.244 ms  0.274 ms
 3     *
^C

Now let’s wait a while. Eventually IP SLA realizes that 192.168.24.2 is unreachable (the timers can be fine-tuned, so normally you don’t have to wait long):

IOU1#
*Apr 24 14:49:23.470: %TRACK-6-STATE: 1 ip sla 1 reachability Up -> Down
IOU1#sh ip sla sum
IPSLAs Latest Operation Summary
Codes: * active, ^ inactive, ~ pending

ID           Type        Destination       Stats       Return      Last
                                           (ms)        Code        Run
-----------------------------------------------------------------------
*1           icmp-echo   192.168.24.2      -           Timeout     25 seconds ago

From now on, replication traffic from PC99 goes via the primary path:

PC99> ping 5.5.5.5
84 bytes from 5.5.5.5 icmp_seq=1 ttl=253 time=8.879 ms
84 bytes from 5.5.5.5 icmp_seq=2 ttl=253 time=6.006 ms
84 bytes from 5.5.5.5 icmp_seq=3 ttl=253 time=8.839 ms
84 bytes from 5.5.5.5 icmp_seq=4 ttl=253 time=10.267 ms
84 bytes from 5.5.5.5 icmp_seq=5 ttl=253 time=8.994 ms

PC99> trace 5.5.5.5
trace to 5.5.5.5, 8 hops max, press Ctrl+C to stop
 1   192.168.1.2   0.504 ms  0.320 ms  0.279 ms
 2   192.168.13.3   0.515 ms  0.399 ms  0.383 ms
 3   *192.168.35.5   0.489 ms (ICMP type:3, code:3, Destination port unreachable)

Primary link down

Let’s shut down e0/1 on IOU3:

IOU3(config)#int e0/1
IOU3(config-if)#shut

IP SLA notices that IOU3 is unreachable, and the tracked object shuts down HSRP group 1 on IOU1, which lets IOU2 take over as HSRP active (again, there’s some black holing until the timers expire; you should fine-tune them in real deployments):

IOU1#
*Apr 25 15:23:23.716: %OSPF-5-ADJCHG: Process 1, Nbr 3.3.3.3 on Ethernet0/1 from FULL to DOWN, Neighbor Down: Dead timer expired
IOU1#
*Apr 25 15:23:37.380: %TRACK-6-STATE: 2 ip sla 2 reachability Up -> Down
IOU1#
*Apr 25 15:23:37.380: %HSRP-5-STATECHANGE: Ethernet0/0 Grp 1 state Active -> Init
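Besides the log messages, the failover can also be checked directly on the routers. The commands below are a suggested checklist rather than part of the original test run, so no output is shown.

```
! On IOU1: track 2 should be down and HSRP group 1 should be in Init state
show track 2
show standby brief
!
! On IOU2: HSRP group 1 should now be Active
show standby brief
```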

Traffic from both PC10 and PC99 is sent through the IOU2 -> IOU4 -> IOU5 path:

PC10> ping 5.5.5.5
84 bytes from 5.5.5.5 icmp_seq=1 ttl=253 time=17.139 ms
84 bytes from 5.5.5.5 icmp_seq=2 ttl=253 time=16.454 ms
84 bytes from 5.5.5.5 icmp_seq=3 ttl=253 time=17.798 ms
84 bytes from 5.5.5.5 icmp_seq=4 ttl=253 time=17.228 ms
84 bytes from 5.5.5.5 icmp_seq=5 ttl=253 time=14.378 ms

PC10> trace 5.5.5.5
trace to 5.5.5.5, 8 hops max, press Ctrl+C to stop
 1   192.168.1.3   0.446 ms  0.297 ms  0.326 ms
 2   192.168.24.4   8.690 ms  8.570 ms  8.600 ms
 3   *192.168.45.5   16.983 ms (ICMP type:3, code:3, Destination port unreachable)


PC99> ping 5.5.5.5
84 bytes from 5.5.5.5 icmp_seq=1 ttl=253 time=17.893 ms
84 bytes from 5.5.5.5 icmp_seq=2 ttl=253 time=17.458 ms
84 bytes from 5.5.5.5 icmp_seq=3 ttl=253 time=17.426 ms
84 bytes from 5.5.5.5 icmp_seq=4 ttl=253 time=17.734 ms
84 bytes from 5.5.5.5 icmp_seq=5 ttl=253 time=12.998 ms

PC99> trace 5.5.5.5
trace to 5.5.5.5, 8 hops max, press Ctrl+C to stop
 1   192.168.1.3   0.322 ms  0.163 ms  0.167 ms
 2   192.168.24.4   8.626 ms  8.615 ms  7.819 ms
 3   *192.168.45.5   17.166 ms (ICMP type:3, code:3, Destination port unreachable)

Downloadable files

Device configs
