Archive for the ‘Programming’ Category
Network Operating System?
I’ve just begun dealing with Software Defined Networks (SDN) for my Master’s thesis, and I’m experimenting on top of Floodlight, an open source OpenFlow controller from Big Switch Networks. In OpenFlow, a logically centralised entity known as the controller can control the forwarding tables of a bunch of switches which speak OpenFlow. OpenFlow applications then talk to the controller using some controller-specific API to ‘program’ the network (manipulate forwarding tables on the switches). The high level architecture looks something like this:
Just like an operating system abstracts away the complexities of the underlying hardware for a user-space application, the controller abstracts away the complexities of the network for OpenFlow applications. For this reason, the controller is often referred to as a “network operating system”. Applications have some API to talk to the network-OS, and it translates those APIs into OpenFlow commands that control the switches.
For my thesis, the plan for my architecture was to have two applications that provide different services to the network, that are expected to run simultaneously. Both of them collect information from the OpenFlow switches and some other framework specific agents situated at the edges of the network to make some optimisation type decisions. But as soon as I implemented one of the applications, it was clear that I had no straightforward way of ensuring that both my applications wouldn’t make decisions that counteract each other. Although I really don’t like the idea of doing this, the easiest way to solve this is to wrap both applications into one. And from the looks of it, this is a problem that hasn’t been solved yet.
Controllers like NOX and Onix make the assumption that only one OpenFlow application is running on a given network at any point of time. This is a reasonable assumption from a systems perspective. But what’s gotten me confused is how OpenFlow applications fit into the “SDN for enterprises” picture. I was under the impression that a network operator using a particular controller could choose between different 3rd party OpenFlow applications to handle different complexities with the network: a load balancing application from vendor A for the edge, a routing daemon application from vendor B, and so forth. While these are relatively orthogonal applications, it looks like it’s possible for two OpenFlow applications to make decisions and choices that adversely affect each other (leading to oscillations in switch state). Floodlight allows you to run multiple applications at the same time, but leaves it to the developer (or user?) to ensure that applications can safely co-exist with each other.
So again, if my observation isn’t mistaken, how do OpenFlow applications fit cleanly into the SDN ecosystem? How can I manage my network using building blocks of applications from different vendors? Will I need to rely on OneBigApplianceFromBigBadVendor per network? Does this necessitate something analogous to per-process resource allocation as in traditional operating systems? I can see that FlowVisor style slicing is one way to go about it, but will that suffice?
So what *should* the network operating system do here? Let the applications run wild and fight it out? Or provide some mechanism to enforce policies between applications?
If I am indeed mistaken in my assumption, please do let me know what I’m missing here!
tcp_probe no workey?
tcp_probe is a very handy utility in Linux to observe TCP flows. Implemented as a Linux kernel module, all you have to do is:
$: modprobe tcp_probe [port=] $: cat /proc/net/tcpprobe > logfile.out
…and you’ll get clear statistics about whatever TCP flows go through ‘port’ (and of all TCP flows if you specify the port as 0).
Earlier today, I ran into a situation where tcp_probe would stop logging flows after a couple of seconds, and it always seemed to be around the 10th second in an experiment I was trying (which involved iperf-ing over a wireless link). Some quick searches made it clear that others were encountering it as well, but no one really had a solution for them. Odd.
And what do you do when Google can’t help you find the answer? You look through the source code of course!
Within a few seconds of going through tcp_probe.c, I had my answer before me. Have a look at lines 47-49 and you’ll know what was wrong in my case.
static int full __read_mostly; MODULE_PARM_DESC(full, "Full log (1=every ack packet received, 0=only cwnd changes)"); module_param(full, int, 0);
In short, I was on a wireless channel without any nodes apart from my wireless interface and the access point, and my congestion window size was getting maxed out around the 10th second. Since tcpprobe by default logs a packet only if the congestion window size changes, I couldn’t see any more packets of the flow I was looking at in /proc/net/tcpprobe.
So the solution?
$: modprobe tcp_probe <args> full=1
Note that you might want to look at the log buffer size parameter (bufsize) as well, because tcp_probe happily ignores packets once your log buffer is filled.
