Skip to content
WALDEMAR KOZACZUK edited this page Aug 2, 2022 · 26 revisions

The OSv networking stack originates from FreeBSD as of circa 2013 but has since been heavily modified to implement Van Jacobson's "network channels" design, to reduce the number of locks and lock operations. For more theory and high-level design details please read the "Network Channels" chapter of the OSv paper.

This Wiki instead, focuses on the code and where these design ideas are implemented. It still touches just a tip of "the iceberg" which is the code of the networking stack located mostly under the bsd/ subtree.

Studying Networking Stack

One can use trace.py to effectively study the OSv networking stack. There are numerous trace points that can be enabled when running a given app and then extracted and analyzed using the aforementioned tool as described in this wiki page.

A good testing bed might be a httpserver-monitoring-api app which can be built and started with the following trace points enabled:

./scripts/build image=httpserver-monitoring-api.fg

./scripts/run.py --trace=net_packet*,tcp*,tso*,inpcb*,in_lltable* --trace-backtrace --api

curl http://localhost:8000/os/threads # Just to trigger some networking activity

./scripts/trace.py extract
./scripts/trace.py list --tcpdump -blLF

0xffff800001981040 /libhttpserver-  0         0.117564227 tcp_state            tp=0xffffa000015b9c00, CLOSED -> CLOSED
  tcpcb::set_state(int) bsd/sys/netinet/tcp_var.h:233
  tcp_attach bsd/sys/netinet/tcp_usrreq.cc:1618
  tcp_usr_attach bsd/sys/netinet/tcp_usrreq.cc:130
  socreate bsd/sys/kern/uipc_socket.cc:335
  socreate(int, int, int) bsd/sys/kern/uipc_syscalls.cc:118
  sys_socket bsd/sys/kern/uipc_syscalls.cc:133
  linux_socket bsd/sys/compat/linux/linux_socket.cc:616
  socket bsd/sys/kern/uipc_syscalls_wrap.cc:333

Net Channels: Slow Path vs Fast Path

The net channel is a direct bottom-up traffic line flowing from a network driver acting as a producer to an app thread calling recv(), poll(), epoll() among others acting as a consumer to avoid most of the locking involved when typically traversing layer by layer. Relatedly, one can see many references in the code to both "fast path" and "slow path". To understand both and net channels, one can start looking at this code in virtio-net driver (there is a similar code in the vmxnet3 driver):

void net::receiver()
{
...
  bool fast_path = _ifn->if_classifier.post_packet(m_head);
  if (!fast_path) {
      (*_ifn->if_input)(_ifn, m_head);
  }
...
}

In essence, this code is called to process incoming data (RX) from the network card and it tries to "push" the resulting mbuf via the network channel (fast-path). If that fails it falls back to the if_input from the FreeBSD way of doing things.

The if_classifier, a member of the struct ifnet describing network interface and defined in if_var.h, is an instance of the class classifier. The method post_packet() used in the code above, is part of the 'producer' interface and its role is to identify or classify if mbuf in question has some corresponding net channel and if so push the mbuf on that net channel and wake consumers of the net channel. So the network card driver, virtio-net in this example, is a "producer" in the context of the net channel and threads blocked when calling send, recv and poll are "consumers". Also, an instance of a net channel corresponds to a single TCP connection.

Here is an example of the "successful" fast path traversal:

0xffff8000015ff040 virtio-net-rx    0        21.143180806 net_packet_in        b'IP truncated-ip - 14 bytes missing! 192.168.122.1.36394 > 192.1
68.122.15.8000: Flags [P.], seq 2688002:2688090, ack 2893961834, win 65535, length 88'
  log_packet_in(mbuf*, int) core/net_trace.cc:143
  classifier::post_packet(mbuf*) core/net_channel.cc:133
  virtio::net::receiver() drivers/virtio-net.cc:542
  std::_Function_handler<void (), virtio::net::net(virtio::virtio_device&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) drivers/virtio-net.cc:243
  __invoke_impl<void, virtio::net::net(virtio::virtio_device&)::<lambda()>&> /usr/include/c++/11/bits/invoke.h:61
  __invoke_r<void, virtio::net::net(virtio::virtio_device&)::<lambda()>&> /usr/include/c++/11/bits/invoke.h:154
  _M_invoke /usr/include/c++/11/bits/std_function.h:290
  sched::thread::main() core/sched.cc:1267
  thread_main_c arch/x64/arch-switch.hh:325
  thread_main arch/x64/entry.S:116

Now, how does the post_packet() exactly "classify" the packet? Under the hood, it calls the method classify_ipv4_tcp(), which in turn first verifies if the packet belongs in the "fast path" category meaning more-less:

  • is it an IP packet?
  • does it carry a TCP payload?
  • is the underlying TCP connection in the right state - not TH_SYN nor TH_FIN nor TH_RST.

The last condition effectively means that only sockets in the state - ESTABLISHED, CLOSE_WAIT, FIN_WAIT_2, and TIME_WAIT - would "participate" in the fast path traversal. In other words, the fast path only plays a role when a TCP connection is established and the slow path is what happens during establishing and tear-down of a TCP connection.

The post_packet() pushes an mbuf onto the net channel only if one exists. But when does a net channel get created? The net channel gets constructed by tcp_setup_net_channel() and destroyed by tcp_teardown_net_channel() or tcp_free_net_channel(). The former gets called when a TCP connection gets established in tcp_do_segment() here and there. The tcp_teardown_net_channel() gets called by tcp_do_segment() when socket in ESTABLISHED state transitions to CLOSE_WAIT one, and an established socket is closed is in tcp_usr_close() and tcp_usrclosed(). The tcp_free_net_channel() on other hand, gets called by tcp_discardcb() when the process of TCP connection closing begins in other TCP state machine cases.

The tcp_setup_net_channel() is key as it binds the "consumers" of a net channel by calling add_poller() and add_epoll. It also registers a new net channel in the RCU hashtable kept as part of the classifier.

Coming back to the original code, if the "fast path" fails when post_packet() returns false, the if_input function - "slow path" is called. Here is an example of a "slow path" execution:

0xffff800001783040 virtio-net-rx    0        19.881495336 net_packet_in        b'IP 192.168.122.1.36398 > 192.168.122.15.8000: Flags [F.], seq 2496090, ack 233281200, win 65535, length 0'
  log_packet_in(mbuf*, int) core/net_trace.cc:143
  netisr_dispatch_src bsd/sys/net/netisr.cc:768
  virtio::net::receiver() drivers/virtio-net.cc:544
  std::_Function_handler<void (), virtio::net::net(virtio::virtio_device&)::{lambda()#1}>::_M_invoke(std::_Any_data const&) drivers/virtio-net.cc:243
  __invoke_impl<void, virtio::net::net(virtio::virtio_device&)::<lambda()>&> /usr/include/c++/11/bits/invoke.h:61
  __invoke_r<void, virtio::net::net(virtio::virtio_device&)::<lambda()>&> /usr/include/c++/11/bits/invoke.h:154
  _M_invoke /usr/include/c++/11/bits/std_function.h:290
  sched::thread::main() core/sched.cc:1267
  thread_main_c arch/x64/arch-switch.hh:325
  thread_main arch/x64/entry.S:116

The netisr_dispatch_src is what the if_input member of the [struct ifnet] points to.

To conclude, fast path because it directly calls net channel rather than traversing all traditional stack call paths that involve many locks - slow path.

Top-down Direction

Bottom-up Direction

Other Components

ARP Cache

Route Cache

Clone this wiki locally