writeups/ipv6/rfc4191/rfc4191.md


1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134

# Towards Zero Downtime: RFC 4191
[RFC 4191](https://datatracker.ietf.org/doc/html/rfc4191) defines the router
information option(RIO). It's effectively a way to push routes to the nodes in
the LAN similar to [RFC
3442](https://serverfault.com/questions/640565/how-can-i-configure-my-dhcp-server-to-distribute-ip-routes).
Unlike monolithic and authoritative DHCPv4, RA is done by the actual routers
that are responsible for routing traffic. This gives us many options to explore:

1. Load balancing: analogous to ECMP and MED in BGP
1. Multiple prefix exit routes: transparent multiple VPN gateways, private links
1. Fault tolerance: use of lifetime attributes to eliminate single point of
   failure in the network

## OS Support
An operating system that supports RFC 4191 should accept the RIOs in RA messages
and add the prefixes in the routing table.

| OS | Support | Since | Note |
| - | - | - | - |
| Windows | YES | ? | First mention in [Windows Server 2012 doc](https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/jj574227(v=ws.11)) |
| Linux | MAYBE | [v2.6.17-rc1](https://github.com/torvalds/linux/blame/4236f913808cebef1b9e078726a4e5d56064f7ad/net/ipv6/ndisc.c#L258) | `CONFIG_IPV6_ROUTE_INFO` disabled by default, but most distros enable it |
| Android | YES | [4.2](https://en.wikipedia.org/wiki/Comparison_of_IPv6_support_in_operating_systems) ? | Linux support predates Android, so it could have been supported since 4.2 |
| XNU(IOS, macos) | YES | [xnu-7195.50.7.100.1](https://github.com/apple-oss-distributions/xnu/blame/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/bsd/netinet6/nd6_rtr.c#L490) | https://theapplewiki.com/wiki/Kernel#Versions |
| FreeBSD | [NO](https://github.com/freebsd/freebsd-src/blob/47ca5d103f229b090899379ce449af5e89faf627/sys/netinet6/nd6.c#L507) | - | Router discovery implemented in userspace "rtsold" |
| OpenBSD | [NO](https://github.com/openbsd/src/blob/36a0e83f909d48cbb69156be916b6356c14b9ae5/sbin/slaacd/engine.c#L1555) | - | Router discovery implemented in userspace "slaacd" |

## RFC 4191 in Action
<img src="../radvd/drawing-a.svg" style="background: grey;">

Imagine a set up where there are a private L2 link between the office building
one and two. Obviously, default routers should run on each building for internet
connection. If the number of nodes in both buildings are less than 2048(safe
limit for ethernet switches), the routers for the private link wouldn't be
necessary because the buildings can be put in the same L2 segment. What if there
are more than 2048 nodes?

In that case, the network will need to be segmented. In order to segment the
networks, additional routers need to be introduced. Yes, this can lead to more
work, more things to maintain. But it sure will be worth it.

The `radvd.conf` on the private link router will look something like this.

```conf
# iface to building #1 segment
interface eth0 {
	AdvSendAdvert On;
	# this tells the nodes that this router is NOT a default router
	AdvDefaultLifetime 0;
	MinRtrAdvInterval 30;
	MaxRtrAdvInterval 120;

	route 2001:db8:0ff1ce:2::/54 {
		# if no further RA message is received within 1 minute,
		# the nodes will expire this prefix
		AdvRouteLifetime 60;
	};
};

# iface to the private link
interface eth1 {
	AdvSendAdvert On;
	# this tells the other router that this router is NOT a default router
	AdvDefaultLifetime 0;
	MinRtrAdvInterval 30;
	MaxRtrAdvInterval 120;

	route 2001:db8:0ff1ce:1::/54 {
		# if no further RA message is received within 1 minute by the other
		# router, the other router will start redirecting traffic to the default
		# router
		AdvRouteLifetime 60;
	};
};


The RA message will look someting like the first image. When both default and
private link router are sending their RA messages, the routing table on the
nodes will look similar to the second image.

![The RA message will look something like this](image-2.png)
![screenshot of route table on the nodes](image.png)

### Failure of private link between the buildings
![Contractors bored through communication cable](severed_link.webp)

If there's no mac bridges(repeaters and such) and the link down(NO CARRIER)
condition is detected directly by the routers, the routers will immediately
start [redirecting](https://datatracker.ietf.org/doc/html/rfc4861#section-4.5)
traffic to the default routers.

In any other case, in which routers are not able to exchange RA messages or do
neighbor discovery, the prefix in the table of respective routers will expire in
1 minute after the incident. Then the routers will start redirecting the
traffic.

The network users(and even the network admin themselves) won't be able to notice
anything out of ordinary. However, ICMPv6 redirect is not an efficient process.
As the internal traffic starts following out to the internet and back, the
failover state will put strain on the default routers and the internet backbone
routers. The applications must not make any assumptions about the network and
treat traffic within the organisation any different. Information leak can still
happen so it could be a good measure to have the VPN for the internal traffic on
the default routers as well.

### Failure of the private link routers
The nodes in the building won't be able to reach the nodes in the other building
for maximum of 1 minute. After the route expires, the nodes will start using the
default router to reach the other nodes.

The nodes in the other building will experience the same downtime. However, when
the prefix expires in the router expires, it will start doing ICMPv6 redirect,
which is not efficient. To avoid this, until the problem is resolved, the L2
link can be bypassed to put the private link on the building segment or the
other router can be taken down so that the internal traffic is routed to the
internet.

### Internet Service Disruption
Well, there will be no internet :(. But people will be able to use resources on
the other building!

### Multiple of Everything!
There can be multiple routers facing the private link. They can all send RA
messages independent of each other. This also applies to the default routers as
well. The routing table on the nodes will look similar to this:

![multiple default routes](image-3.png)

Note that a node will choose one of multiple routers for the destination. Which
one it chooses is basically random, so some level of load balancing can be
achieved.

This set up can be scaled up to many buildings and routes. It'll eventually get
to a point where IBGP is better suited for the purpose, and also, the services
would have to be self-hosted on premise. That's what I call a "long shot".