genode/repos/os/src/server/nic_router/README

The 'nic_router' component can be used to achieve a controlled mediation
between multiple NIC sessions on network or transport level. NIC sessions are
assigned to domains. The rules configured by the user then mediate between
these domains. This is a brief overview of the features thereby provided:

* Acting as a hub between NIC sessions of the same domain,
* routing of UDP and TCP according to destination IP address and port,
* routing of ICMP and IPv4 according to destination IP address,
* port forwarding for UDP and TCP,
* NAPT for UDP, TCP and ICMP "Echo",
* forwarding of ICMP "Destination Unreachable" according to the UDP, TCP or
  ICMP "Echo" connection it refers to,
* acting as DHCP server or client per domain,
* providing per-domain network statistics via a report session,
* printing out header information for each packet received or sent,
* and full re-configurability at runtime.


Basics
~~~~~~

The NIC router can act as server of multiple NIC session clients (downlinks)
and at the same time as client of multiple NIC session servers (uplinks).
Apart from which side initiates the NIC session and provides the MAC address
and the link state, uplinks and downlinks are treated equally by the NIC
router.

The routing algorithm is ultimately controlled through the configuration. NIC
sessions are assigned to domains. Each domain represents one subnet and a
corresponding routing configuration. The assignment of downlink NIC sessions
to domains is controlled through the policy tag that is also known from other
Genode components:

! <policy label_prefix="vlan_" domain="vlan" />
! <policy label_suffix="_server" domain="servers" />
! <policy label="nic_bridge_1" domain="wired_bridge" />
! <policy label="nic_bridge_2" domain="wired_bridge" />

The domain name can be freely chosen but must be unique.
The uplink tag instructs the NIC router to create an uplink NIC session that
is assigned to the given domain:

! <uplink               domain="wired_bridge" />
! <uplink label="wired" domain="wired_bridge" />
! <uplink label="wifi"  domain="wifi_uplink" />

The label is the session label that is used when requesting the uplink NIC
session. The label attribute is optional. It is perfectly fine to have a
domain to which uplinks and downlinks are assigned at the same time. For each
domain there must be a domain tag:

! <domain name="uplink"       interface="10.0.2.55/24"    />
! <domain name="http_servers" interface="192.168.1.18/24" />
! <domain name="imap_servers" interface="192.168.2.17/24" />

The 'interface' attribute defines two things at once. First, it tells the
router which subnet can be found behind this domain, and second, which IP
identity the router shall use in case it has to communicate as itself with
the subnet. If the 'interface' attribute is not set in a 'domain' tag, the
router acts as DHCP client (Section [Configuring DHCP client functionality]).

Additionally, the optional 'gateway' attribute can be set for a domain:

! <domain name="uplink" interface="10.0.2.55/24" gateway="10.0.2.1" />

It defines the standard gateway of the subnet behind this domain. If a packet
shall be routed to this domain and its final IP destination does not match
the subnet, its Ethernet destination is set to the MAC address of the gateway.
If a gateway isn't given for a domain, such packets get dropped. If a gateway
is given for a domain without an 'interface' attribute, this gateway
configuration does not take effect.
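Taken together, the 'policy', 'uplink', and 'domain' tags already form a
minimal router configuration. The following sketch assumes a hypothetical
label prefix "virtnet" and reuses addresses from the examples in this
document. As the domains contain no routing rules yet, the router merely acts
as a hub within each domain and does not route traffic between them:

! <config>
!    <policy label_prefix="virtnet" domain="downlink" />
!    <uplink                        domain="uplink"   />
!
!    <domain name="uplink"   interface="10.0.2.55/24" gateway="10.0.2.1" />
!    <domain name="downlink" interface="192.168.1.1/24" />
! </config>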

For each domain, the routing of packets from this domain can be configured
individually by adding subtags to the corresponding domain tag. There are
multiple types of subtags expressing different types of routing rules. The
following table gives a brief overview of the different subtags and their
meaning:

 Subtag                     | Description
---------------------------------------------------------------
 <tcp-forward port="X" />   | Port forwarding for TCP port X *
---------------------------------------------------------------
 <udp-forward port="X" />   | Port forwarding for UDP port X *
---------------------------------------------------------------
 <tcp dst="X">              | Routing TCP packets that target
    <permit-any />          | IP range X *
 </tcp>                     |
---------------------------------------------------------------
 <udp dst="X">              | Routing UDP packets that target
    <permit-any />          | IP range X *
 </udp>                     |
---------------------------------------------------------------
 <tcp dst="X">              | Routing TCP packets that target
    <permit port="Y" />     | IP range X and port Y or Z *
    <permit port="Z" />     |
 </tcp>                     |
---------------------------------------------------------------
 <udp dst="X">              | Routing UDP packets that target
    <permit port="Y" />     | IP range X and port Y or Z *
    <permit port="Z" />     |
 </udp>                     |
---------------------------------------------------------------
 <ip dst="X" />             | Routing IP packets that target
                            | IP range X
---------------------------------------------------------------
 <icmp dst="X" />           | Routing ICMP packets that target
                            | IP range X

A detailed explanation of the different routing rules is given in the
following sections of this document. For all rules marked with a star, the
router also keeps track of corresponding TCP connections and UDP
pseudo-connections. With these so-called link states, corresponding reply
packets are automatically routed back. The user doesn't have to add an
additional back-routing rule for that.

Given this variety of ways to route a packet, it is perfectly legal for a
domain to contain multiple rules that apply to one and the same packet. On top
of that, there may even be a matching link state. The router's choice,
however, is always deterministic. It follows this priority scheme:

:For TCP and UDP:

1) Domain-local IP traffic
2) Link states
3) Port forwarding rules
4) Longest prefix match amongst TCP respectively UDP rules
   4.1) Subrule that permits any port
   4.2) Subrules that permit specific ports
5) Longest prefix match amongst IP rules

:For ICMP "Echo":

1) Domain-local IP traffic
2) Link states
3) Longest prefix match amongst ICMP rules
4) Longest prefix match amongst IP rules

:For ICMP "Destination Unreachable" with embedded UDP, TCP or ICMP "Echo":

1) Domain-local IP traffic
2) Link states
3) Longest prefix match amongst IP rules

:For IP with unsupported transport-layer protocol:

1) Domain-local IP traffic
2) Longest prefix match amongst IP rules
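To illustrate the scheme, consider the following hypothetical uplink domain.
The domain names and addresses are assumptions chosen for this sketch. The
comments state which rule the scheme selects for a new packet from the uplink
that does not match any link state yet:

! <domain name="uplink" interface="10.0.2.55/24" gateway="10.0.2.1">
!
!    <!-- TCP to 10.0.2.55 port 80: port-forwarding rule (step 3) -->
!    <tcp-forward port="80" domain="http_servers" to="192.168.1.18" />
!
!    <!-- TCP to 192.168.1.18 port 80: TCP rule (step 4) -->
!    <tcp dst="192.168.1.0/24">
!       <permit port="80" domain="http_servers" />
!    </tcp>
!
!    <!-- UDP to 192.168.1.18: there is no UDP rule, so the IP rule
!         applies (step 5) -->
!    <ip dst="192.168.1.0/24" domain="http_servers" />
! </domain>

Once any of these packets has created a link state, further packets of the
same exchange are routed by that link state (step 2) instead.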


IP rules
~~~~~~~~

These are examples for IP rules:

! <ip dst="10.0.2.0/24"     domain="intranet"  />
! <ip dst="192.168.1.18/32" domain="my_server" />
! <ip dst="0.0.0.0/0"       domain="uplink"    />

IP rules only apply to IPv4 packets from the session of the surrounding
domain. The 'dst' attribute is compared with the IP destination of the packet.
The rule with the longest prefix match is taken. The packet is then routed to
the domain given in the rule.

IP rules are pretty simple. They merely affect the Ethernet header of a
packet and they don't imply link-state tracking. This has consequences. First,
IP rules do not automatically route back reply packets from the remote side.
If you'd like to enable bidirectional communication via IP rules, both domains
must have an appropriate rule in their domain tag. And second, IP rules do not
consider a NAT configuration (Section [Configuring NAT]). As this could lead
to unexpected leakage of local IP addresses and ports, you should use the
combination of IP rules and NAT only with great care.
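For instance, to let a hypothetical 'intranet' domain and a hypothetical
'my_servers' domain exchange arbitrary IPv4 traffic in both directions via IP
rules alone, each of the two domains needs a rule that points to the other
one (names and addresses are assumptions for this sketch):

! <domain name="intranet" interface="10.0.2.1/24">
!    <ip dst="192.168.1.0/24" domain="my_servers" />
! </domain>
!
! <domain name="my_servers" interface="192.168.1.1/24">
!    <ip dst="10.0.2.0/24" domain="intranet" />
! </domain>

Without the rule in 'my_servers', requests from the intranet would reach the
servers, but the replies would be dropped because IP rules create no link
states.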


ICMP rules
~~~~~~~~~~

These are examples for ICMP rules:

! <icmp dst="10.0.2.0/24"     domain="intranet"  />
! <icmp dst="192.168.1.18/32" domain="my_server" />
! <icmp dst="0.0.0.0/0"       domain="uplink"    />

ICMP rules only apply to ICMP "Echo" packets from sessions of the surrounding
domain. The 'dst' attribute is compared with the IP destination of the packet.
The rule with the longest prefix match is taken. The packet is then routed to
the domain given in the rule.

For bidirectional traffic, you'll need only one ICMP rule describing the
client-to-server direction. The server-sided domain doesn't need a rule as the
router correlates replies to the client-sided rule (and only those) via a link
state (Section [Link states]) that was created at the client's initial request.

ICMP rules consider whether the router shall apply NAT (Section [Configuring
NAT]) for the client side. If this is the case, source IP and ICMP query ID
are replaced by the router's IP identity and a free ICMP query ID at the
server-sided domain. The corresponding link state also takes this into account
to restore the original destination of the replies.
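A minimal sketch of an ICMP rule combined with NAT, assuming a hypothetical
'intranet' domain whose hosts shall be able to ping the outside world while
being hidden behind the router's uplink address. The NAT rule resides in the
server-sided domain and limits the number of ICMP query IDs the intranet may
occupy there:

! <domain name="uplink" interface="10.0.2.55/24" gateway="10.0.2.1">
!    <nat domain="intranet" icmp-ids="100" />
! </domain>
!
! <domain name="intranet" interface="192.168.1.1/24">
!    <icmp dst="0.0.0.0/0" domain="uplink" />
! </domain>

An ICMP "Echo" request from an intranet host thus leaves the uplink with the
source IP 10.0.2.55 and a free query ID, and the reply is translated back via
the link state.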

The router also forwards ICMP errors. This is described in section
[Link states].


TCP and UDP rules
~~~~~~~~~~~~~~~~~

TCP and UDP rules must always be accompanied by one or more port permission
rules to take effect:

! <tcp dst="192.168.1.18/32">
!    <permit port="70" domain="gopher_servers" />
!    <permit port="80" domain="http_servers" />
! </tcp>
! <udp dst="10.0.2.0/24">
!    <permit-any domain="uplink" />
! </udp>

TCP rules only apply to TCP packets and UDP rules only to UDP packets from the
session of the surrounding domain. The 'dst' attribute is compared with the IP
destination of the packet. The rule with the longest prefix match is taken.
If the rule contains a 'permit-any' subrule or a 'permit' subrule whose 'port'
attribute matches the destination port of the packet, the packet is routed to
the domain given in the subrule.

For bidirectional traffic, you'll need only one TCP or UDP rule describing the
client-to-server direction. The server-sided domain doesn't need a rule as the
router correlates replies to the client-sided rule (and only those) via a link
state (Section [Link states]) that was created at the client's initial request.

TCP and UDP rules consider whether the router shall apply NAT
(Section [Configuring NAT]) for the client side. If this is the case, source
IP and port are replaced by the router's IP identity and a free port at the
server-sided domain. The corresponding link state also takes this into account
to restore the original destination of the replies.
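The following sketch, with hypothetical domain names and addresses, shows
both properties at once. Only the client-sided domain carries a TCP rule. The
server-sided domain carries no routing rule at all, but a NAT rule that hides
the clients and allows them to use at most 100 TCP source ports at a time:

! <domain name="http_clients" interface="192.168.1.1/24">
!    <tcp dst="192.168.2.0/24">
!       <permit port="80" domain="http_servers" />
!    </tcp>
! </domain>
!
! <domain name="http_servers" interface="192.168.2.1/24">
!    <nat domain="http_clients" tcp-ports="100" />
! </domain>

Replies from the servers are routed back via link states, and the servers only
ever see 192.168.2.1 as source address of the clients' packets.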


Port-forwarding rules
~~~~~~~~~~~~~~~~~~~~~

These are examples for port-forwarding rules:

! <tcp-forward port="80" domain="http_servers" to="192.168.1.18" />
! <udp-forward port="69" domain="tftp_servers" to="192.168.2.23" />

Port-forwarding rules only apply to packets that come from the session of the
surrounding domain and are addressed to the router's IP identity at this
domain (Section [Basics]). Amongst those, 'tcp-forward' rules only apply to
the TCP packets and 'udp-forward' rules only to the UDP packets. The 'port'
attribute is compared with the packet's destination port. If a matching rule
is found, the IP destination of the packet is changed to the value of the 'to'
attribute. Then, the packet is routed to the domain given in the rule. Note
that the router accepts only system and registered ports (0 to 49151) for port
forwarding.

For bidirectional traffic, you'll need only one port-forwarding rule
describing the client-to-server direction. The server-sided domain doesn't
need a rule as the router correlates replies to the client-sided rule (and
only those) via a link state (Section [Link states]) that was created at the
client's initial request.

Port forwarding inherently involves NAT for the server side. However, the
router only translates the server IP. The port
remains unchanged. For the client side, port-forwarding rules apply NAT only
when configured (Section [Configuring NAT]). If this is the case, client IP
and port are translated.
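For example, the following hypothetical configuration forwards TCP port 80 of
the router's uplink address to an HTTP server and additionally applies NAT
for the client side, so that the server sees the router's address 192.168.1.1
instead of the real client address (names and addresses are assumptions for
this sketch):

! <domain name="uplink" interface="10.0.2.55/24" gateway="10.0.2.1">
!    <tcp-forward port="80" domain="http_servers" to="192.168.1.18" />
! </domain>
!
! <domain name="http_servers" interface="192.168.1.1/24">
!    <nat domain="uplink" tcp-ports="1000" />
! </domain>

Without the 'nat' tag, the server would see the original client IP and port,
and only the destination 10.0.2.55 would be translated to 192.168.1.18.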


Link states
~~~~~~~~~~~

Each time a packet gets routed by using a TCP, UDP, ICMP or port-forwarding
rule, the router creates a link state. From then on, all packets that belong
to the exchange this first packet initiated and come from one of the two
involved domains are routed by the link state and not by a rule. The costs for
the link state are paid by the session that sent the first packet.

If a link state exists for a packet, it is unambiguously correlated either
through source IP and port plus destination IP and port or, for ICMP, through
source and destination IP plus ICMP query ID. This also holds if the transfer
involves NAT, regardless of its kind or the side it applies to.

It is desirable to discard a link state as soon as it is not needed anymore.
The more precisely this is done, the more efficiently NIC sessions can use
their resources (ports, RAM), and the lower the risk of DoS attacks. Therefore,
the NIC router keeps track of the idle time of a link. Idle time means the
time passed since the last packet was routed using that link regardless of
the direction or content of that packet. The amount of idle time at which
the NIC router shall discard a link state can be configured in the <config>
tag of the router for each link type separately:

! <config udp_idle_timeout_sec="30"
!         tcp_idle_timeout_sec="50"
!         icmp_idle_timeout_sec="5">

This would set the maximum ICMP idle time to 5, the maximum UDP idle time to
30 and the maximum TCP idle time to 50 seconds. You should choose these values
with care. If they are too low, replies that normally need no routing rule may
get lost. If they are too high, link states are held longer than necessary.

For UDP and ICMP link states, this timeout is the only condition that leads to
a discard. This is better known as hole punching. It allows peers to keep
alive a UDP or ICMP pseudo-connection through the router by frequently sending
empty packets. The need for such a pseudo-connection arises from the router's
requirement to support NAT for UDP and ICMP transfers and the resulting need
to keep the corresponding mapping information.

The lifetime management of TCP link states, in contrast, is more complex. In
addition to the common timeout, they may also be discarded after the router
observed the four-way termination handshake of TCP plus a duration of two
times the maximum segment lifetime. The maximum segment lifetime can be set
in the <config> tag too:

! <config tcp_max_segm_lifetime_sec="20">
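All of these link-state timeouts can be set together. According to the rules
above, the values of the following sketch (the same as in the examples above)
mean that the link state of a TCP connection whose termination handshake the
router has observed is discarded at the latest 2 x 20 = 40 seconds afterwards,
even though the TCP idle timeout is 50 seconds:

! <config udp_idle_timeout_sec="30"
!         tcp_idle_timeout_sec="50"
!         icmp_idle_timeout_sec="5"
!         tcp_max_segm_lifetime_sec="20">
!    ...
! </config>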

As long as there is a link state for a connection, the router also forwards
ICMP "Destination Unreachable" packets that contain a packet of this
connection embedded in their payload. The embedded packet is adapted according
to the NAT configuration of the link state as well as the outer IPv4 packet
that contains the ICMP.


Configuring NAT
~~~~~~~~~~~~~~~

In contrast to routing rules that affect packets coming from their domain,
NAT rules affect packets that go to their domain:

! <domain name="uplink" interface="10.0.2.55/24">
!    <nat domain="http_client" tcp-ports="6" />
! </domain>

This would tell the router to apply NAT for the HTTP client when it speaks to
the uplink. This means that it affects all packets from the HTTP client that
get routed to the uplink by a UDP, TCP, or port-forwarding rule or by a
corresponding link state. If this is the case, the packet's source IP address
is changed to "10.0.2.55" and the source port is replaced by a free source
port of the router. A "free source port" is a port that the router currently
doesn't use at the destination domain. So, at each domain, the router has two
complete port spaces available for source NAT, one for UDP and one for TCP.
Each port space contains the IANA dynamic port range 49152 to 65535.

As you can see, the NAT rule also has a 'tcp-ports' attribute. It restricts
how many TCP source ports of the uplink the HTTP client may use at a time. The
same goes also for UDP:

! <nat domain="tftp_client" udp-ports="13" />

And even combined:

! <nat domain="intranet" tcp-ports="43" udp-ports="21" />

The same goes for ICMP query IDs:

! <nat domain="intranet" tcp-ports="43" udp-ports="21" icmp-ids="102" />

If one of the port or ID attributes is not set, no port or ID may be used for
this protocol, which effectively disables it. Thus, at least one of these
attributes must be set for the NAT rule to be sensible. Restricting the port
usage is necessary to prevent a client from running Denial-of-Service attacks
against the destination domain by occupying all of its ports or IDs.
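For example, the following hypothetical rule provides NAT only for ICMP
"Echo" traffic of a 'monitoring' domain. As neither 'tcp-ports' nor
'udp-ports' is set, TCP and UDP transfers of that domain that would require
NAT at the surrounding domain are effectively disabled:

! <nat domain="monitoring" icmp-ids="10" />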


Configuring DHCP server functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

One can configure the NIC router to act as DHCP server at interfaces of a
domain by adding the 'dhcp-server' tag to the configuration of the domain like
this:

! <domain name="vbox" interface="10.0.1.1/24">
!     <dhcp-server ip_first="10.0.1.80"
!                  ip_last="10.0.1.100"
!                  ip_lease_time_sec="3600"
!                  dns_server="10.0.0.2"
!                  dns_server_from="uplink" />
!     ...
! </domain>

The attributes ip_first and ip_last define the available IPv4 address range
while ip_lease_time_sec defines the lifetime of an IPv4 address assignment in
seconds. The IPv4 address range must be in the subnet defined by the interface
attribute of the domain tag and must not cover the IPv4 address in this
attribute. The dns_server attribute gives the IPv4 address of the DNS server
that might also be in another subnet. The dns_server_from attribute has effect
only if the dns_server attribute is not set. If this is the case, the
dns_server_from attribute states the domain from whose IP config to take the
DNS server address. This is useful, for instance, if the stated domain
receives the address of a local DNS server via DHCP. Whenever the IP config
of the stated domain becomes invalid, the DHCP server switches to a mode where
it leaves all requests unanswered until the IP config becomes valid again.
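A sketch of a DHCP server that hands out whatever DNS server the uplink learns
via DHCP. It assumes a hypothetical 'uplink' domain that acts as DHCP client
because it has no 'interface' attribute (Section [Configuring DHCP client
functionality]), and it omits 'dns_server' so that 'dns_server_from' takes
effect:

! <domain name="uplink" />
!
! <domain name="vbox" interface="10.0.1.1/24">
!    <dhcp-server ip_first="10.0.1.80"
!                 ip_last="10.0.1.100"
!                 ip_lease_time_sec="3600"
!                 dns_server_from="uplink" />
! </domain>

As long as the uplink has no valid IP config, DHCP requests at the 'vbox'
domain remain unanswered.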

The lifetime of an assignment that has so far only been offered to the client can be
configured for all domains in the <config> tag of the router:

! <config dhcp_offer_timeout_sec="6">

The timeout ip_lease_time_sec is applied only when the offer is acknowledged
by the client in time.


Configuring DHCP client functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

If the attribute 'interface' is not set in a 'domain' tag, the router tries to
dynamically receive and maintain an IP configuration for that domain by using
DHCP in the client role at all interfaces that connect to the domain. In the
DHCP discover phase, the router simply chooses the first DHCP offer that
arrives. So, no comparison of different DHCP offers is done. In the DHCP
request phase, the server is expected to provide an IP address, a gateway, a
subnet mask, and an IP lease time to the router. If anything substantial goes
wrong during a DHCP exchange, the router discards the outcome of the exchange
and goes back to the DHCP discover phase. Whenever there is no valid IP
configuration present at a domain, the domain acts only as DHCP client and all
other router functionality is disabled for the domain. A domain cannot act as
DHCP client and DHCP server at once. So, a 'domain' tag that contains a
'dhcp-server' tag must also have an 'interface' attribute.
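A sketch of a DHCP-client uplink that hides a hypothetical 'virtnet_a' domain
via NAT. The 'uplink' domain has neither an 'interface' nor a 'gateway'
attribute, as its IP configuration is obtained via DHCP. Traffic from
'virtnet_a' can only be routed to the uplink while the uplink holds a valid IP
configuration, because until then the uplink domain acts solely as DHCP
client (names and values are assumptions for this sketch):

! <uplink domain="uplink" />
!
! <domain name="uplink">
!    <nat domain="virtnet_a" tcp-ports="100" udp-ports="100" icmp-ids="100" />
! </domain>
!
! <domain name="virtnet_a" interface="192.168.1.1/24">
!    <tcp dst="0.0.0.0/0"><permit-any domain="uplink" /></tcp>
!    <udp dst="0.0.0.0/0"><permit-any domain="uplink" /></udp>
!    <icmp dst="0.0.0.0/0" domain="uplink" />
! </domain>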

The timeouts for waiting for replies to DHCP discover and DHCP request
messages can be configured for all domains in the <config> tag of the router:

! <config dhcp_discover_timeout_sec="10"
!         dhcp_request_timeout_sec="6">


Configuring reporting functionality
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The NIC router can be configured to send reports about its state.

Configuration example (shows default values of attributes):

! <config>
!     <report interval_sec="5" bytes="yes" config="yes" config_triggers="no" />
! </config>

If the 'report' tag is not available, no reports are sent.
The attributes of the 'report' tag:

'bytes'           : Boolean : Whether to report sent bytes and received bytes
                              per domain
'config'          : Boolean : Whether to report IPv4 interface and gateway per
                              domain
'config_triggers' : Boolean : Whether to force a report each time the IPv4
                              config changes
'interval_sec'    : 1..3600 : Interval of sending reports in seconds
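For instance, to report only the IPv4 configuration of each domain every 10
seconds and additionally whenever it changes, the configuration could look
like this (the values are arbitrary choices for this sketch):

! <config>
!    <report interval_sec="10" bytes="no" config="yes" config_triggers="yes" />
!    ...
! </config>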


Verbosity
~~~~~~~~~

This is how you can configure the router's verbosity on its environment LOG
session:

! <config verbose="yes">

Log router decisions and optional hints.

! <config verbose_packets="yes">

Log most important protocol header fields of each packet that is received or
sent by the router (ETH, IPv4, ARP, UDP, TCP, DHCP, ICMP).

! <config verbose_domain_state="yes">

Log most important changes in the state of a domain (number of NIC sessions
connected, current IPv4 config).

! <config>
!     <domain verbose_packets="yes" ... />
! </config>

Log most important protocol header fields of each packet that is received or
sent at a specific domain (ETH, IPv4, ARP, UDP, TCP, DHCP, ICMP).
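These options can be combined. The following sketch logs domain-state changes
for all domains but packet headers only for a hypothetical 'uplink' domain:

! <config verbose_domain_state="yes">
!    <domain name="uplink" interface="10.0.2.55/24" verbose_packets="yes" />
!    ...
! </config>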


Examples
~~~~~~~~

This section will list and explain some interesting configuration snippets. A
comprehensive example of how to use the router (except DHCP server
functionality) can be found in the test script 'libports/run/nic_router.run'.
For an example of how to use the DHCP server and the DHCP client functionality
see the 'ports/run/virtualbox_nic_router.run' script.

The environment for the examples shall be as follows. There are two virtual
subnets, 192.168.1.0/24 and 192.168.2.0/24, that connect to the router as
Virtnet A and B. The standard gateway of each virtual network is the NIC
router with IP 192.168.*.1. The router's uplink leads to the NIC driver that
connects the machine with your home network 10.0.2.0/24. Your home network is
connected to the internet through its standard gateway 10.0.2.1.


Connecting local networks
-------------------------

Let's assume we simply want the virtual networks and the home network to be
able to talk to each other. Furthermore, the virtual networks shall be able to
use the internet connection of your home network. The router would have the
following configuration:

! <policy label_prefix="virtnet_a" domain="virtnet_a" />
! <policy label_prefix="virtnet_b" domain="virtnet_b" />
! <uplink                          domain="uplink"    />
!
! <domain name="uplink" interface="10.0.2.55/24" gateway="10.0.2.1">
!    <ip dst="192.168.1.0/24" domain="virtnet_a"/>
!    <ip dst="192.168.2.0/24" domain="virtnet_b"/>
! </domain>
!
! <domain name="virtnet_a" interface="192.168.1.1/24">
!    <ip dst="192.168.2.0/24" domain="virtnet_b"/>
!    <ip dst="0.0.0.0/0"      domain="uplink"/>
! </domain>
!
! <domain name="virtnet_b" interface="192.168.2.1/24">
!    <ip dst="192.168.1.0/24" domain="virtnet_a"/>
!    <ip dst="0.0.0.0/0"      domain="uplink"/>
! </domain>

IP packets from Virtnet A and uplink that target an IP address 192.168.2.* are
routed to Virtnet B. IP packets from Virtnet B and uplink that target an IP
address 192.168.1.* are routed to Virtnet A. Packets that are addressed to
hosts in the same local network should never reach the router as they can be
transmitted directly. If there's a packet from one of the virtual networks
that doesn't target 192.168.1.* or 192.168.2.*, the IP 0.0.0.0/0 rules route
them to the uplink. If these packets target an IP 10.0.2.*, the router sends
them directly to the host in your home network. Otherwise, the router sends
them to your gateway 10.0.2.1. Note that none of the packets is modified on
layer 3 or higher, so no NAT is done by the router to hide the virtual
networks.


Clients in a private network
----------------------------

Now we have some clients in Virtnet A that like to talk to the internet as
well as to the home network. We want them to be hidden via NAT when they do so
and to be limited to HTTP+TLS/SSL and IMAP+TLS/SSL when talking to the
internet. The router would have the following configuration:

! <policy label_prefix="virtnet_a" domain="virtnet_a" />
! <policy label_prefix="virtnet_b" domain="virtnet_b" />
! <uplink                          domain="uplink"    />
!
! <domain name="uplink" interface="10.0.2.55/24" gateway="10.0.2.1">
!    <nat domain="virtnet_a" tcp-ports="1000" udp-ports="1000" />
! </domain>
!
! <domain name="virtnet_a" interface="192.168.1.1/24">
!    <tcp dst="10.0.2.0/24"><permit-any domain="uplink" /></tcp>
!    <udp dst="10.0.2.0/24"><permit-any domain="uplink" /></udp>
!    <tcp dst="0.0.0.0/0">
!       <permit port="443" domain="uplink" />
!       <permit port="993" domain="uplink" />
!    </tcp>
! </domain>

From the packets that come from Virtnet A, those that target an IP 10.0.2.*
are routed to the uplink without inspecting the port. At the uplink, the
router notices that it shall apply NAT for Virtnet A. It replaces the source
IP with 10.0.2.55 and allocates one of its uplink source ports for the
exchange. On replies to Virtnet-A packets from the home network, the router
translates IP and port back using the corresponding link state. For packets
from Virtnet A that target other IPs, only the 0.0.0.0/0 rule applies and only
if the packet targets TCP port 443 or 993. Both ports route the packet to the
uplink where, again, NAT is applied and the packets are sent to the gateway
10.0.2.1.
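To additionally let the clients in Virtnet A ping hosts in the home network,
a sketch based on the configuration above adds an ICMP rule to Virtnet A and
an 'icmp-ids' attribute to the NAT rule. The value 100 is an arbitrary choice,
and "..." stands for the rules shown above:

! <domain name="uplink" interface="10.0.2.55/24" gateway="10.0.2.1">
!    <nat domain="virtnet_a" tcp-ports="1000" udp-ports="1000" icmp-ids="100" />
! </domain>
!
! <domain name="virtnet_a" interface="192.168.1.1/24">
!    <icmp dst="10.0.2.0/24" domain="uplink" />
!    ...
! </domain>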


Servers in a private network
----------------------------

In this example, we assume that there are three servers in Virtnet A: an HTTP
server at port 80 with IP 192.168.1.2, a GOPHER server at port 70 with IP
192.168.1.3, and a TFTP server at port 69 with IP 192.168.1.4. Now you want
the servers (and only them) to be reachable from the home network via the
router's IP and from the internet via your gateway. The router would have the
following configuration:

! <policy label_prefix="virtnet_a" domain="virtnet_a" />
! <policy label_prefix="virtnet_b" domain="virtnet_b" />
! <uplink                          domain="uplink"    />
!
! <domain name="uplink" interface="10.0.2.55/24" gateway="10.0.2.1">
!    <tcp-forward port="80" domain="virtnet_a" to="192.168.1.2" />
!    <tcp-forward port="70" domain="virtnet_a" to="192.168.1.3" />
!    <udp-forward port="69" domain="virtnet_a" to="192.168.1.4" />
! </domain>
!
! <domain name="virtnet_a" interface="192.168.1.1/24" />
! <domain name="virtnet_b" interface="192.168.2.1/24" />

Amongst the packets that come from the uplink, only those that are addressed
to 10.0.2.55 and TCP port 80, TCP port 70, or UDP port 69 are forwarded.
All these packets are forwarded to Virtnet A. But beforehand, their IP
destination is adapted. TCP-port-80 packets are redirected to 192.168.1.2,
TCP-port-70 packets to 192.168.1.3, and UDP-port-69 packets to 192.168.1.4.

Amongst the packets that come from Virtnet A, only those that match a link
state at the uplink are forwarded, because the Virtnet-A domain contains no
rules. Thus, Virtnet A can only talk to the uplink in the context of
TCP-connections or UDP pseudo-connections that were opened by clients behind
the uplink. The servers' IP addresses never leave Virtnet A.