• Magic Numbers in Distributed Systems

    I'm officially done with the first half of my masters as of now, and it's been a fun swim so far in the violent sea of distributed systems, which by the way, is a really big zoo where anything can go wrong. To quote Leslie Lamport:

    A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable"

    Thus, researchers over the last few decades have gone through great lengths to design distributed algorithms that ensure correct behaviour in the light of node failures and random activity by other processes. But if you're a student and you're going through these algorithms, it's not entirely obvious what some of the magic numbers mean at times, and what the intuition is behind them that ensures that an algorithm works correctly. This post is meant to help in that direction. As per distributed systems convention, N refers to the number of processes in the system and F refers to the number of failures that can be tolerated.

    • \(N/2\)

      This is a simple majority "quorum". There are many algorithms that exploit the fact that two quorums overlap in at least one process. Let's take an example to illustrate this. Consider a system with N processes, serving as a distributed replicated database. There is a single process writing to the database and there can be more than one reader. One approach to doing this would be to have the writing process write to at least \(\lceil N/2\rceil + 1\) replicas. If the reading process reads from at least \(\lceil N/2\rceil + 1\) nodes, there will be at least one replica which has seen the latest write. This can used in scenarios where only less than half the nodes can fail. Some examples include Majority ACK Uniform Reliable Broadcast, Majority Voting Regular Register, and Read Impose Write Majority (1,N) Atomic Registers.

    • \((N+F)/2\)

      Referred to as "Byzantine Quorum", this one's a bread and butter number for many Byzantine Fault Tolerant algorithms, which are algorithms that tolerate arbitrary failures. This means that the distributed algorithm produces a correct result in spite of having processes that either do not respond, send out garbage values and so forth. The behaviour can be either due to faulty hardware, software, or due to control by a malicious user. The essence of the number itself is that any two sets of size \(\lceil(N+F)/2\rceil + 1\) overlap in at least one correct process. This is valid only when \(N > 3F\) (you can learn the "why" behind this from the Byzantine Generals Problem). Example algorithms: Authenticated Data Byzantine Quorum based Regular Register and Byzantine Randomised Consensus.

    • \( > (N+2F)/2\)

      This is known as a Byzantine Masking Quorum. The intuition here is that any two quorums of size \(\lceil (N+2F)/2 \rceil+ 1\) will overlap in \(2F + 1\) processes, thus ensuring that you have a worst case split up of \(F+1\) correct processes and \(F\) faulty ones. Note that \(N > 4F\) is necessary for this condition to be true. This is particularly useful in the context of Byzantine Registers, and can be seen in algorithms that implement Byzantine Safe Registers. Let's take the example of our distributed replicated database that we'd discussed above, but with byzantine faults.  The writing node attempts to write to all nodes and returns from the write upon receiving ACKs from \(\lceil (N+2F)/2\rceil + 1\) replicas. A node that needs to read from the replicated servers reads from any \(\lceil (N+2F)/2\rceil + 1\) nodes, and only picks a value if it has more than \(F\) occurrences in the set that was read (the Byzantine process can return garbage values upon the read request). The \(\lceil (N+2F)/2\rceil + 1\) guarantees that there would be at least \(F+1\) correct processes in such a set so that reads that are not concurrent with a write will for sure give you \(F+1\) correct and up-to-date values.

    • \(F\)

      This one is pretty straightforward, and you begin seeing this as soon as you're dealing with algorithms wherein processes crash. If you receive a message from at least \(F+1\) processes, then you're sure that you've received the message from at least one correct process (a process that will not fail).

    • \(2F\)

      Follows from the previous one, this ensures that if you receive \(2F + 1\) messages, you're sure that more than one correct processes is part of this set. This is particularly seen in Byzantine Algorithms where making a decision based on \(> F\) is risky. For instance, consider situations where uniform agreement is required. One such example is that of Leader Election in a distributed system. For correct functioning of the system, it is entirely necessary that all nodes have the same view of who the leader is. The algorithm fails if two nodes have picked two different nodes as their respective leaders, and in the presence of byzantine faults, this is quite possible. Consider the Rotating Byzantine Leader Detection algorithm wherein nodes can choose to "complain" about the current leader under the suspicion of being byzantine. A node decides that it should shift to the next leader when it hears at least \(> 2F\) complaints. Here's why we can't use \(> F\) alone over here. Let's assume our system should tolerate one byzantine fault \((F = 1)\). Assume, that a process P incorrectly suspects the current leader (who is legitimate) of being byzantine and broadcasts a "complaint" message. Now a byzantine node in the same system can go against the protocol, and send process P a "complaint" message, at which point process P now has \(> F (2 > 1)\) complaints about the current leader, causing it to change leaders. Since the byzantine process sent the complaint message only to process P and not the other processes in the system (who didn't find anything strange about the leader in order to complain), we now have process P with a leader \(L_{new}\) and the remaining processes at leader \(L_{old}\) such that \(L_{new} != L_{old}\). Byzantine process wins. On the other hand, if you rely on the \(2F + 1\) magic number, you're sure that there are at least \(F+1\) correct entries in your set.

    • \(N-F\)

      In algorithms where faulty nodes have the liberty of "not responding" (crash-stop processes for instance), \(N-F\) is the lowest number of responses you can hope to get if you ping all processes in the system. In byzantine scenarios, a set of \(N-F\) processes contains at least \(N-2F\) correct processes (\(F\) responses out of \(N-F\) can be corrupt). This number can be seen in many algorithms which require uniform agreement, one example being Byzantine Randomized Consensus.

    That's it for now.

  • HOWTO: Getting started with ns-3-click – Part II

    We now move into part II of the ns-3-click tutorial series wherein I'll walk you through an ns-3 script and explain how to load your Click scripts onto ns-3 nodes. This tutorial goes through the following steps:

    1. Walkthrough of an example Click graph from ns-3-click.
    2. Using the Click graph in an ns-3 simulation.


    Walkthrough of an example Click graph from ns-3-click

    The Click script I'll be explaining below is available in current ns-3-dev and stable releases from ns-3.11 onwards. You can find it in src/click/examples/nsclick-lan-single-interface.click.

    When working with ns-3-click, you will need to handle all layer 3 functionalities expected of a networked node from Click. The Click script that we'll be talking about here provides exactly this. It handles ARP, and forwards packets up and down the stack as required. It has a single network interface (eth0) to send and receive packets, and has a kernel interface (tap0) to send/receive packets from the kernel (in ns-3, this corresponds to communicating with layer 4).

    We first describe the interface to tap0 with the following functionalities:

    1. All packets received from tap0 should be forwarded down to the stack.
    2. All packets received from below the stack, destined to us, should be sent up the stack via tap0.


    elementclass TapSimHost {

    $dev |

    // Packets go to tap0, which sends them to the kernel


    -> ToDump(tokernel.pcap,2000,IP,PER_NODE 1)

    -> ToSimDevice($dev,IP);

    // Packets sent out by the kernel get pushed outside


    -> CheckIPHeader2

    -> ToDump(fromkernel.pcap,2000,IP,PER_NODE 1)

    -> GetIPAddress(16)

    -> [0]output;




    The above snippet does exactly what we've described so far. We create an element class which has a single input and output. Packets received on the input are plumbed to tap0 (because $dev is now tap0, as per the instantiation in the last line). Packets received from tap0 are pushed outside. We now describe a LAN host, which will handle ARP, and check the destination IP to see if we should receive the packet.


    elementclass LanSimHost {

    $ipaddr, $hwaddr |

    cl::Classifier(12/0806 20/0001,12/0806 20/0002, -);

    forhost::IPClassifier(dst host $ipaddr,-);




    -> ToDump(out_eth0.pcap,PER_NODE 1)

    -> ToSimDevice(eth0);

    // All packets received on eth0 are silently

    // dropped if they are destined for another location


    -> ToDump(in_eth0.pcap,PER_NODE 1,ENCAP ETHER)

    -> cl;

    // ARP queries from other nodes go to the ARP responder element

    cl[0] -> arpresponder;

    // ARP responses go to our ARP query element

    cl[1] -> [1]arpquerier;

    // All other packets get checked whether they are meant for us


    -> Strip(14)

    -> CheckIPHeader2

    -> MarkIPHeader

    -> GetIPAddress(16) // Sets destination IP address annotation

    -> forhost;

    // Packets for us are pushed outside


    -> [0]output;

    // Packets for other folks or broadcast

    // packets get sent to output 1


    -> ToDump(discard.pcap,2000,PER_NODE 1,ENCAP IP)

    -> [1]output;

    // Incoming packets get pushed into the ARP query module


    -> arpquerier;

    // Both the ARP query and response modules send data out to

    // the simulated network device, eth0.


    -> ToDump(out_arpquery.pcap,PER_NODE 1)

    -> ethout;


    -> ToDump(out_arprespond.pcap,PER_NODE 1)

    -> ethout;





    Now that we have a LanSimHost type ready, and instantiated (in the last line indicated above), we perform the final plumbing required to connect our LanSimHost to our kernel tap device:


    // Users can do some processing between the two elements

    lan[0] -> kernel;

    kernel -> lan;

    // Packets for others or broadcasts are discarded

    lan[1] -> Discard;


    This concludes our description of the nsclick-lan-single-interface.click file. Let's now describe an ns-3 simulation script for our scenario.


    Using the Click graph in an ns-3 script

    I'll now describe a simple ns-3 script which makes use of the above described Click graph. Not surprisingly, the script is named nsclick-simple-lan.cc and can be found within src/click/examples/. The simulation scenario is a simple one: two nodes A and B connected via a CSMA channel, with A sending B a stream of packets using a TCP connection. A is Click based, whereas B is a normal ns-3 node.

    The first step would be to create the nodes.

    [sourcecode language="cpp"]

    NodeContainer csmaNodes;

    csmaNodes.Create (2);


    Next, we create a CSMA channel.

    [sourcecode language="cpp"]

    CsmaHelper csma;

    csma.SetChannelAttribute ("DataRate", DataRateValue (DataRate (5000000)));

    csma.SetChannelAttribute ("Delay", TimeValue (MilliSeconds (2)));

    NetDeviceContainer csmaDevices = csma.Install (csmaNodes);


    We then install a normal internet stack on node B.

    [sourcecode language="cpp"]

    InternetStackHelper internet;

    internet.Install (csmaNodes.Get (1));


    And then setup a Click based internet stack on node A. We need to specify the Click script that the particular node is supposed to use, and in the event that we require a Click based node to run a traffic generator on top, we need to specify a routing table element for the node to use. This can be seen by the name "rt" at the end of the nsclick-lan-single-interface.click file.

    [sourcecode language="cpp"]

    ClickInternetStackHelper clickinternet;

    clickinternet.SetClickFile (csmaNodes.Get (0), "src/click/examples/nsclick-lan-single-interface.click");

    clickinternet.SetRoutingTableElement (csmaNodes.Get (0), "rt");

    clickinternet.Install (csmaNodes.Get (0));


    Now that all the nodes have an internet stack, we assign IPv4 addresses to all the network interfaces.

    [sourcecode language="cpp"]

    Ipv4AddressHelper ipv4;

    ipv4.SetBase ("", "");

    ipv4.Assign (csmaDevices);


    And then, we setup the traffic generators for talking between node A and node B.

    [sourcecode language="cpp"]

    Address LocalAddress (InetSocketAddress (Ipv4Address::GetAny (), 50000));

    PacketSinkHelper packetSinkHelper ("ns3::TcpSocketFactory", LocalAddress);

    ApplicationContainer recvapp = packetSinkHelper.Install (csmaNodes.Get (1));

    recvapp.Start (Seconds (5.0));

    recvapp.Stop (Seconds (10.0));

    OnOffHelper onOffHelper ("ns3::TcpSocketFactory", Address ());

    onOffHelper.SetAttribute ("OnTime", RandomVariableValue (ConstantVariable (1)));

    onOffHelper.SetAttribute ("OffTime", RandomVariableValue (ConstantVariable (0)));

    ApplicationContainer appcont;

    addressValue remoteAddress (InetSocketAddress (Ipv4Address (""), 50000));

    onOffHelper.SetAttribute ("Remote", remoteAddress);

    appcont.Add (onOffHelper.Install (csmaNodes.Get (0)));

    appcont.Start (Seconds (5.0));

    appcont.Stop (Seconds (10.0));


    Lastly, we enable PCAP tracing on all the CSMA NetDevices in the scenario.

    [sourcecode language="cpp"]

    csma.EnablePcap ("nsclick-simple-lan", csmaDevices, false);


    And to conclude the script, we specify the running time for the simulation to be 20 seconds, and call Simulator::Run(). Don't forget to call Simulator::Destroy() lest tools like Valgrind start screaming about memory leaks.

    [sourcecode language="cpp"]

    Simulator::Stop (Seconds (20.0));

    Simulator::Run ();

    Simulator::Destroy ();

    return 0;



    To see the results of running our simulation, execute the below in your terminal from the ns-3-dev top level directory once you've built Click as described in the previous article.

    $: ./waf --run nsclick-simple-lan

    Have a look at the resulting pcap traces (nsclick-simple-lan-0-[0,1].pcap) using wireshark or tcpdump to see what happened through the simulation.

    Hope you found this little walkthrough helpful. If you find any bugs with ns-3-click, please don't hesitate to file a bug report on our bugzilla. :)

  • HOWTO: Getting started with ns-3-click - Part I

    Since its development, I've been seeing a lot of requests for an ns-3-click 101. So with no further ado, here's the first in a series of tutorials to help you get an idea of how to go about using ns-3-click. In this article, I'll provide an idea of what ns-3-click is and how to install it .

    1. What is ns-3-click?

    ns-3-click, or the NS-3 Click Integration, is a feature of the ns-3 tool which allows a user to use a Click Modular Router instance to handle an ns-3 node's layer 3 functionality. Click is an architecture for designing highly flexible router configurations. Basically, it offers a large number of fine grained packet processing units called "elements", which can be put connected in various combinations into a Click graph. A Click graph defines a particular configuration for a router. The motivation for bringing Click into ns-3 is simple. Click users get to test their Click graphs in a powerful simulation environment which ns-3 offers, and ns-3 benefits from the large feature set that Click brings with it in the form of elements. In ns-3-click, the design choice was to entirely delegate ns-3's layer 3 functionality to Click. This means that an ns-3 node running a Click router will now have to use Click's implementations of ARP, routing tables and so forth.

    So before you get started with ns-3-click, I suggest going through [0] and [1]. The first paper should tell you everything you need to know about what Click is, and the second one will inform you about where Click fits into ns-3. The latter will help you understand what ns-3-click can or cannot do for you.

    2. Installation

    By now, you know what ns-3-click is, and its time to get your hands dirty. The first step is to download and build Click.

    $: git clone git://read.cs.ucla.edu/git/click DIR

    $: cd DIR

    $: ./configure --enable-nsclick --enable-userlevel

    Note: If you require additional modules, enable them as required using the --enable-<module> flag.

    Now build Click:

    $: make

    Once this is complete, we need to build ns-3 and point it to the Click source we've just compiled. So let's proceed to fetch and build ns-3. Note that ns-3-click was merged post ns-3.10, and will be released with ns-3.11. It is currently available in ns-3-dev.

    $: hg clone http://code.nsnam.org/ns-3-dev

    $: cd ns-3-dev

    $: ./waf configure --with-nsclick=<path-to-click-source> --enable-examples

    Once the last command has finished executing, you should see a list of features/modules that have been enabled. At this point, you should see the following line:

    NS-3 Click Integration        : enabled

    Now, let's build ns-3:

    $: ./waf build

    If all goes well, ns-3 should now be built successfully. To test whether things went well, try running one of the example scripts.

    $: ./waf --run nsclick-simple-lan

    This should generate some PCAP traces (nsclick-simple-lan-*.pcap). If you see packets being exchanged in there, then you're good to go!

    This concludes part I of the ns-3-click tutorial. In the next part, I'll provide a code walk through of a simple example script that uses ns-3-click. If you find any bugs with ns-3-click, please don't hesitate to file a bug report on our bugzilla. :)


    [0] Eddie Kohler, Robert Morris, Benjie Chen, John Jannotti, and M. Frans Kaashoek, "The Click Modular Router". ACM Transactions on Computer Systems 18(3), August 2000, pages 263-297. (Paper from the MIT-PDOS page)

    [1] Lalith Suresh P., Ruben Merz, ”NS-3-Click: Click Modular Router Integration for NS-3”. In Proc. of 3rd International ICST Workshop on NS-3 (WNS3), Barcelona, Spain. March, 2011. paper

  • NS-3 Summer of Code 2011: Results Announced!

    The wait is over, and we hereby announce the students selected for the NS-3 Summer of Code 2011 programme!
    • Ashwin Narayan with "Click-MAC extensions for ns-3-click". Mentored by Ruben Merz and myself.
    • Pankaj Gupta with "LTE-RRC extensions". Mentored by Giuseppe Piro and Francesco Capozzi.
    • Atishay Jain with "IPv6 Global Routing". Mentored by Tom Henderson and Mitch Watrous.
    Congratulations to all the students. Hoping to see a lot of good code come out of this effort and most importantly, some long term contributors to the ns-3 project itself. :)
  • Bavarian Vacation

    Say you're a Masters student studying distributed systems, and you just completed 3 major project checkpoints, and wrapped up some work with a research project. So what do you do during the Easter break that immediately follows the deadlines? You go on vacation of course! So I just got back from a wonderful four day holiday split across the dreamy fields of Schwangau and the Bavarian captial, Munich. Schwangau is a must see for anyone who wants to swim in a sea of breathtaking views and experience the European countryside. It is the home to the famous Neuschwanstein castle, which was the main inspiration for Disney's Sleeping Beauty's castle. The village of Schwangau itself lies under the protective watch of the Alps, running through the border between Germany and Austria. I stayed with a good friend of mine, and had a wonderful Bavarian Easter with her family, sharing stories of our cultures and lifestyles. On Easter Sunday itself, we went hiking up the Alps, and I'm still surprised that I made it alive at the end of a 3 hour ordeal. After years of not having any physical exercise of any form, this was indeed rather taxing, but I guess all the muscle strain was definitely worth the experience, the views, and most importantly, the satisfaction that followed taking a hard path up a 1.708km high mountain for your first ever hike. :) The latter half of the holiday involved a visit to old city of Munich. Culturally rich and with a Bavarian feel all over, it was a fun experience to walk through the city streets. One thing I found particularly funny was that the Maximilianeum building, which houses the Bavarian parliament, had no gates nor security guards on the outside, which is in absolute contrast to equivalent buildings back home in India, which can be aptly labelled mini fortresses. The last day of the visit was reserved for a trip around the Munich marketplace and of course, the one and only Allianz Arena, which I finally got the opportunity to visit as a long time Bayern Munich fan. :) As is always the case with my travels, cuisine formed an important component of the visit too. Thanks to Eva's help, I was able to compile a list of dishes/drinks I had: Apfelstrudel, Neuernberger Rostbratwuerste mit sauerkraut, Kaesspatzen, Leberkaes mit Kartoffelsalat und Spiegelei, Weisswuerst mit Suessem senf, Schweinsbraten mit Kartoffelknoedel, Schweinshaxen, Brezeln, Radler, Weissbier and Weizen. I particularly loved the German variety as far as beers are concerned. Radler from an altitude of 1.708km tastes heavenly by the way. I would have loved to stay a little longer, but as the saying goes, all good things must come to an end. Now back in Lisbon, I'm finding it a little hard to shake off the vacation mood and get back to my projects, but I think I'll pull that off sooner or later.