For the Red Hat Summit this year, I wrote a paper on the kernel-bypass technology from Solarflare, called OpenOnload. From a performance standpoint it’s hard to argue with the results.
I was looking at code from Open vSwitch recently, and it dawned on me that there is an important similarity between Open vSwitch and OpenOnload: both take a two-phase approach…let me explain.
Both have a “connection setup” phase where many of the well-known user-space utilities come into play (along with some purpose-built ones like ovs-vsctl) for things like adjusting routing, MTU, interface statistics and so on. And then both have what you could call an accelerated path, used after the initial connection setup for passing bits to/from user-space, whether that be a KVM process or your matching engine.
In OpenOnload’s case, the accelerated path bypasses the Linux kernel, avoiding kernel/user-space data copies and the context switches that come with them, and thus lowering latency. The same basic idea underpins RDMA, which has been around for decades, and there are quite a few vendors out there with analogous solutions. Often there are optimized drivers, things like OFED and a whole bunch of other tricks, but that’s beside my point…
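To make the two phases concrete, here’s a minimal sketch from the application’s point of view: a plain BSD-sockets TCP client in C. Nothing in it is OpenOnload-specific, and the peer address, port and message are made up for illustration; the point is that both phases use the same standard calls, and a kernel-bypass stack generally accelerates the second phase transparently by intercepting them.

```c
/* A plain BSD-sockets TCP client -- nothing here is OpenOnload-specific.
 * Under a kernel-bypass stack the same binary is typically run unmodified,
 * and the calls below split into the two phases described above.
 * The peer address and port are placeholders for illustration. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* Phase 1: connection setup (the "slow path").  Socket creation and
     * connect() involve routing, ARP, MTU and interface state -- the part
     * where the familiar user-space tooling still has visibility. */
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(7777);                      /* hypothetical port */
    inet_pton(AF_INET, "192.0.2.10", &peer.sin_addr);   /* example address */

    if (connect(fd, (struct sockaddr *)&peer, sizeof(peer)) < 0) {
        perror("connect");
        close(fd);
        return 1;
    }

    /* Phase 2: the data path.  send()/recv() are the hot calls that a
     * user-space stack aims to service without a trip into the kernel or
     * an extra kernel/user copy -- that's where the latency win comes from. */
    const char msg[] = "ping\n";
    if (send(fd, msg, sizeof(msg) - 1, 0) < 0)
        perror("send");

    char buf[128];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    if (n > 0)
        printf("got %zd bytes back\n", n);

    close(fd);
    return 0;
}
```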
The price paid for achieving this lower latency is having to either completely give up, or entirely re-implement, lots of kernel goodies like the visibility you’d expect out of netstat, ethtool and tcpdump.
In the case of Open vSwitch, there is a software “controller” (which decides what to do with a packet) and a datapath implemented in a kernel module, which provides the best performance possible once the user-defined policy has been applied via the controller. If you’re interested in Open vSwitch internals, here’s a nice presentation from Simon Horms. I think the video is definitely worth a half hour!
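As a mental model (and only that), here’s a toy C sketch of that split: a tiny exact-match table stands in for the kernel datapath, and a slow_path() function stands in for the user-space piece that decides what to do with a packet the datapath has never seen. The structure names and the hard-coded policy are invented for illustration; the real OVS flow keys, wildcarding and caching are far more sophisticated.

```c
/* Toy model of the Open vSwitch split: a tiny exact-match "flow table"
 * stands in for the kernel datapath, and slow_path() stands in for the
 * user-space piece that decides what to do with an unknown packet.
 * All names and the policy check are invented for illustration. */
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

struct flow_key  { char src[16]; char dst[16]; };
struct flow_rule { struct flow_key key; bool forward; bool used; };

#define TABLE_SIZE 16
static struct flow_rule flow_table[TABLE_SIZE];   /* the "fast path" cache */

/* Slow path: consult the user-defined policy (here just a hard-coded
 * check) and return an action for this flow. */
static bool slow_path(const struct flow_key *key)
{
    printf("  miss: no cached flow for %s -> %s, asking the controller\n",
           key->src, key->dst);
    return strcmp(key->dst, "10.0.0.99") != 0;    /* drop one destination */
}

/* Fast path: exact-match lookup; on a miss, ask the slow path once and
 * cache the answer so later packets in the flow never leave the fast path. */
static bool handle_packet(const struct flow_key *key)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        if (flow_table[i].used &&
            memcmp(&flow_table[i].key, key, sizeof(*key)) == 0)
            return flow_table[i].forward;          /* cache hit */

    bool action = slow_path(key);                  /* cache miss */
    for (int i = 0; i < TABLE_SIZE; i++) {
        if (!flow_table[i].used) {
            flow_table[i].key     = *key;
            flow_table[i].forward = action;
            flow_table[i].used    = true;
            break;
        }
    }
    return action;
}

int main(void)
{
    struct flow_key k = { "10.0.0.1", "10.0.0.2" };
    for (int i = 0; i < 3; i++)
        printf("packet %d: %s\n", i, handle_packet(&k) ? "forward" : "drop");
    return 0;
}
```

The thing to notice is that slow_path() runs once per flow, while every subsequent packet is handled entirely by the cached entry, which is the same setup-once, fast-forever shape as the OpenOnload example above.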
Anyway, what do accelerated paths and kernel bypass boil down to? Things like swap-over-NFS, NFS-root, the proliferation of iSCSI/NFS filers and FUSE-based projects like Gluster put network subsystem performance directly in the cross-hairs. Most importantly, the demands on the networking subsystem of every operating system are pushing the performance boundaries of what the traditional protection ring concept can provide.
Developers go to great lengths to take advantage of the ring model; however, it seems that faster network throughput (btw, is 400Gbps Ethernet the next step?) and lower latency requirements are now more at odds with the ring paradigm than ever.
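To put a rough number behind that tension, here’s a tiny C experiment (my own illustration, not from either project) that times a cheap syscall in a loop; each iteration crosses the user/kernel boundary and back. Results vary a lot with CPU, kernel version and mitigation settings, so treat the output as a ballpark figure rather than a benchmark.

```c
/* Rough illustration of the cost of a ring crossing: time a cheap syscall
 * in a tight loop.  Numbers vary widely by CPU, kernel and mitigations,
 * so treat the output as a ballpark, not a benchmark. */
#include <stdio.h>
#include <time.h>
#include <unistd.h>

int main(void)
{
    const long iters = 1000000;
    volatile pid_t sink;            /* keep the calls from being optimized out */
    struct timespec start, end;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (long i = 0; i < iters; i++)
        sink = getppid();           /* enters and leaves the kernel each time */
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (end.tv_sec - start.tv_sec) * 1e9
              + (end.tv_nsec - start.tv_nsec);
    printf("~%.0f ns per syscall round-trip\n", ns / iters);
    (void)sink;
    return 0;
}
```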
Linux’s and BSD’s decades-old niche as excellent routing platforms will be tested (as it always is) by these future technologies and by customer demand for them. Looking forward to seeing how projects like OpenStack wire all of this stuff together!