1 files changed, 134 insertions, 0 deletions
diff --git a/writeups/ipv6/rfc4191/rfc4191.md b/writeups/ipv6/rfc4191/rfc4191.md
new file mode 100644
index 0000000..8f4d73f
--- /dev/null
+++ b/writeups/ipv6/rfc4191/rfc4191.md
@@ -0,0 +1,134 @@
+# Towards Zero Downtime: RFC 4191
+[RFC 4191](https://datatracker.ietf.org/doc/html/rfc4191) defines the Route
+Information Option (RIO) for Router Advertisements (RAs). It's effectively a way
+to push routes to the nodes on the LAN, similar to [RFC
+3442](https://serverfault.com/questions/640565/how-can-i-configure-my-dhcp-server-to-distribute-ip-routes)
+for DHCPv4. Unlike DHCPv4, where a single authoritative server hands out the
+routes, RAs are sent by the very routers that forward the traffic. This gives us
+many options to explore:
+
+1. Load balancing: analogous to ECMP and MED in BGP
+1. Multiple prefix exit routes: transparent multiple VPN gateways, private links
+1. Fault tolerance: use of lifetime attributes to eliminate single point of
+   failure in the network
+
+## OS Support
+An operating system that supports RFC 4191 should accept the RIOs in RA messages
+and add the advertised prefixes to its routing table.
+
+| OS | Support | From | Note |
+| - | - | - | - |
+| Windows | YES | ? | First mention in [Windows Server 2012 doc](https://learn.microsoft.com/en-us/previous-versions/windows/it-pro/windows-server-2012-r2-and-2012/jj574227(v=ws.11)) |
+| Linux | MAYBE | [v2.6.17-rc1](https://github.com/torvalds/linux/blame/4236f913808cebef1b9e078726a4e5d56064f7ad/net/ipv6/ndisc.c#L258) | `CONFIG_IPV6_ROUTE_INFO` disabled by default, but most distros enable it |
+| Android | YES | ? | Linux support predates Android, so it has been enabled for a long time |
+| XNU (iOS, macOS) | YES | [xnu-7195.50.7.100.1](https://github.com/apple-oss-distributions/xnu/blame/8d741a5de7ff4191bf97d57b9f54c2f6d4a15585/bsd/netinet6/nd6_rtr.c#L490) | Version mapping: https://theapplewiki.com/wiki/Kernel#Versions |
+| FreeBSD | [NO](https://github.com/freebsd/freebsd-src/blob/47ca5d103f229b090899379ce449af5e89faf627/sys/netinet6/nd6.c#L507) | - | Router discovery implemented in userspace "rtsold" |
+| OpenBSD | [NO](https://github.com/openbsd/src/blob/36a0e83f909d48cbb69156be916b6356c14b9ae5/sbin/slaacd/engine.c#L1555) | - | Router discovery implemented in userspace "slaacd" |
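+
+On Linux, having `CONFIG_IPV6_ROUTE_INFO` compiled in is not the whole story:
+the kernel also filters RIOs with per-interface sysctls. As far as I know,
+`accept_ra_rt_info_max_plen` defaults to 0, so every RIO more specific than
+`::/0` is ignored unless the limit is raised. The drop-in below is a sketch,
+assuming the host's LAN interface is `eth0` and using a made-up file name. If
+RAs are handled by a userspace network manager instead of the kernel, these
+knobs may not apply.
+
+```conf
+# /etc/sysctl.d/60-rfc4191.conf (hypothetical file name, eth0 assumed)
+# process Router Advertisements on eth0 (1 = only while forwarding is off)
+net.ipv6.conf.eth0.accept_ra = 1
+# honour the RFC 4191 router preference (RIO processing depends on it too)
+net.ipv6.conf.eth0.accept_ra_rtr_pref = 1
+# install RIO routes with prefix lengths up to /64
+net.ipv6.conf.eth0.accept_ra_rt_info_max_plen = 64
+```
+
+After `sysctl --system`, routes learned from RIOs should show up as `proto ra`
+in `ip -6 route`.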
+
+## RFC 4191 in Action
+<img src="../radvd/drawing-a.svg" style="background: grey;">
+
+Imagine a setup with a private L2 link between office buildings one and two.
+Obviously, each building needs its own default router for the internet
+connection. If the two buildings have fewer than 2048 nodes in total (a safe
+limit for Ethernet switches), the private link wouldn't need routers at all,
+because both buildings could be put in the same L2 segment. What if there are
+more than 2048 nodes?
+
+In that case, the network needs to be segmented, and segmenting it means
+introducing additional routers. Yes, that means more work and more things to
+maintain, but it will be worth it.
+
+The `radvd.conf` on building one's private link router will look something like
+this.
+
+```conf
+# iface to building #1 segment
+interface eth0 {
+	AdvSendAdvert On;
+	# this tells the nodes that this router is NOT a default router
+	AdvDefaultLifetime 0;
+	# send RAs well within AdvRouteLifetime, otherwise the route
+	# expires on the nodes between two unsolicited RAs
+	MinRtrAdvInterval 10;
+	MaxRtrAdvInterval 20;
+
+	# building #2's prefix, reachable over the private link
+	route 2001:db8:f1ce:200::/56 {
+		# if no further RA message is received within 1 minute,
+		# the nodes will expire this route
+		AdvRouteLifetime 60;
+	};
+};
+
+# iface to the private link
+interface eth1 {
+	AdvSendAdvert On;
+	# this tells the other router that this router is NOT a default router
+	AdvDefaultLifetime 0;
+	MinRtrAdvInterval 10;
+	MaxRtrAdvInterval 20;
+
+	# building #1's prefix, advertised to the router in building #2
+	route 2001:db8:f1ce:100::/56 {
+		# if no further RA message is received within 1 minute by the other
+		# router, the other router will start redirecting traffic to the default
+		# router
+		AdvRouteLifetime 60;
+	};
+};
+```
+
+The RA message will look something like the first image below. When both the
+default router and the private link router are sending their RA messages, the
+routing table on the nodes will look similar to the second image.
+
+![The RA message will look something like this](image-2.png)
+![screenshot of route table on the nodes](image.png)
+
+### Failure of private link between the buildings
+![Contractors bored through communication cable](severed_link.webp)
+
+If there are no MAC bridges (repeaters and such) in between and the routers
+detect the link-down (NO-CARRIER) condition directly, they will immediately
+start [redirecting](https://datatracker.ietf.org/doc/html/rfc4861#section-4.5)
+traffic to the default routers.
+
+In any other case, where the routers can no longer exchange RA messages or
+perform neighbor discovery, the route learned from the peer expires from each
+router's table within 1 minute of the incident. The routers then start
+redirecting the traffic.
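+
+The failover above relies on each private link router learning the other
+building's prefix from its peer's RIO, and its own default route from the
+building's default router. A Linux router has forwarding enabled, and in that
+state the kernel ignores RAs unless `accept_ra` is set to 2. A sketch, assuming
+Linux routers and the interface names from the `radvd.conf` above (`eth0`
+towards the building, `eth1` towards the private link); the file name is made
+up:
+
+```conf
+# /etc/sysctl.d/60-private-link-router.conf (hypothetical file name)
+net.ipv6.conf.all.forwarding = 1
+# eth0: learn the default route from the building's default router,
+# which is where traffic goes once the private link route expires
+net.ipv6.conf.eth0.accept_ra = 2
+# eth1: learn the other building's prefix from the peer router's RIO
+net.ipv6.conf.eth1.accept_ra = 2
+net.ipv6.conf.eth1.accept_ra_rtr_pref = 1
+net.ipv6.conf.eth1.accept_ra_rt_info_max_plen = 64
+```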
+
+Network users (and even the network admins themselves) won't notice anything
+out of the ordinary. However, ICMPv6 redirects are not an efficient process. As
+internal traffic starts flowing out to the internet and back, the failover state
+puts strain on the default routers and the internet backbone routers. Since
+internal traffic may now cross the internet, applications must not make any
+assumptions about the network or treat traffic within the organisation any
+differently. Information can still leak, so it could be a good measure to run a
+VPN for the internal traffic on the default routers as well.
+
+### Failure of the private link routers
+The nodes in the failed router's building won't be able to reach the nodes in
+the other building for up to 1 minute. Once the route expires, the nodes will
+start using the default router to reach the other building.
+
+The nodes in the other building will experience the same downtime. However,
+their private link router is still up and keeps advertising its route, so once
+the route learned from the failed peer expires on that router, it starts sending
+ICMPv6 redirects towards the default router, which is not efficient. To avoid
+this until the problem is resolved, either patch the private link directly into
+the building segment to bypass the failed router, or take the other router down
+as well so that the internal traffic is routed over the internet.
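+
+The 1 minute outage above comes from a router vanishing without notice. When a
+private link router is taken down on purpose (for example the surviving router
+in the workaround above), radvd can withdraw its routes instead of letting them
+time out: according to the radvd.conf manual, the `RemoveRoute` route option
+(on by default) announces the route with a zero lifetime when radvd shuts down.
+A sketch with the option spelled out; the prefix and timers are only examples:
+
+```conf
+interface eth0 {
+	AdvSendAdvert On;
+	AdvDefaultLifetime 0;
+	MinRtrAdvInterval 10;
+	MaxRtrAdvInterval 20;
+
+	# example prefix for the other building
+	route 2001:db8:f1ce:200::/56 {
+		AdvRouteLifetime 60;
+		# on a clean radvd shutdown, advertise this route once more with
+		# a lifetime of 0 so the nodes drop it right away
+		RemoveRoute on;
+	};
+};
+```
+
+This only helps for planned shutdowns; a crashed or powered-off router still
+leaves the nodes waiting for `AdvRouteLifetime` to run out.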
+
+### Internet Service Disruption
+Well, there will be no internet :(. But people will still be able to use the
+resources in the other building!
+
+### Multiple of Everything!
+There can be multiple routers facing the private link, each sending RA messages
+independently of the others. The same applies to the default routers. The
+routing table on the nodes will then look similar to this:
+
+![multiple default routes](image-3.png)
+
+Note that a node will pick one of the routers for a given destination. Which one
+it picks is basically random, so some level of load balancing can be achieved.
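+
+If basically random is not good enough, RFC 4191 also lets each router attach a
+preference (low, medium or high) to its default route and to every RIO. In
+radvd these map to `AdvDefaultPreference` and `AdvRoutePreference`. A sketch of
+an active/backup pair of private link routers for the same building; the prefix
+is an example, and each half goes into a separate router's `radvd.conf`:
+
+```conf
+# radvd.conf on the primary private link router
+interface eth0 {
+	AdvSendAdvert On;
+	AdvDefaultLifetime 0;
+	MinRtrAdvInterval 10;
+	MaxRtrAdvInterval 20;
+	route 2001:db8:f1ce:200::/56 {
+		AdvRouteLifetime 60;
+		# RFC 4191 nodes should prefer this router while the route is valid
+		AdvRoutePreference high;
+	};
+};
+
+# radvd.conf on the backup private link router
+interface eth0 {
+	AdvSendAdvert On;
+	AdvDefaultLifetime 0;
+	MinRtrAdvInterval 10;
+	MaxRtrAdvInterval 20;
+	route 2001:db8:f1ce:200::/56 {
+		AdvRouteLifetime 60;
+		# should only be used once the primary's route has expired
+		AdvRoutePreference low;
+	};
+};
+```
+
+This trades the load balancing for a predictable primary path; the same idea
+applies to the default routers with `AdvDefaultPreference high;` on one of them.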
+
+This setup can be scaled up to many buildings and routes. Eventually it gets to
+a point where iBGP is better suited for the purpose, and the services would also
+have to be self-hosted on premises. That's what I call a "long shot".