• Comic: Utility

     

    2. YANC - Utility

  • Comic: A bad networking researcher...

    I always wanted to do comics, so here's a hello world:

    1. YANC - A bad networking researcher...

  • Students aren't cattle, they're humans

    Recently, a teacher at my alma mater turned barber and forcibly cropped the hair of over 90 students who didn't follow the school's code of conduct for haircuts (we didn't even know there was one during our days). Obviously, the students, parents and locals weren't amused by this barb-er-ic act. Neither were those of us who are proud of being former students of that school.

    Indian educational institutions have always been rather conservative in their organisation. There is all too often a strict hierarchy visible in the way things work. Teachers have a commanding authority over students, and it's common for students to stand up and say "good morning/afternoon/whatever" in unison when the former enters a classroom. Failure to do so is often seen as an act of disrespect. In many schools, even at a high school level, you find that students are expected to form and walk only in a line when moving from one classroom to another. I've seen institutions with really silly dress codes, like "boys should only wear shirts/t-shirts that *have a collar*". There are many colleges where cell phones are banned on campus (and many that go too far to implement their policy). And there are also colleges where guys aren't allowed to talk to girls.

    There is an overflow of rules, and unnecessary requirements on conformance. I've had a lecturer yell at me for 15 minutes (out of a 40 minute lecture), ask me to never enter class again, and walk out of class herself in anger simply because I didn't "write down" the last few steps of a solution to a problem she handed out in class, which I managed to solve in my head to arrive at the answer.

    All the above is absolutely disgusting and is an antithesis to what is supposed to be education. We put a bunch of students into a classroom, expect all of them to conform to a strict set of rules, and punish the ones who don't with expulsion. Ultimately, all these students end up having to take charge of society in some way or the other, and we basically leave our country in the hands of those who abide by silly dogmas, are well trained in the art of superficial respect, can't think laterally, are used to hierarchies, and with regard to some of the specific cases I mentioned above, are incapable of working with the opposite sex.

    I wonder what the net gain is from putting so much effort into running a system like this?

    Our future leaders should be capable of making decisions that affect others positively. We need students to be well trained in networking with others, exchanging ideas, and communicating effectively. We need them to be open minded, embrace differences, and adapt to the pace at which the world around us is evolving. How on earth is all that supposed to happen if *this* is their education?

    Furthermore, where do teachers get the time and energy to enforce such rules when they have so many important things to attend to?

    Teachers have the responsibility of being a role model and not that of a dictator. All those teachers who'd inspired me over my life _strictly_ fall into the former category (from junior school, through high school, and upto where I am now). They were the ones who invested enough effort into figuring out how best to convey their ideas to their students, learned how to tap into our creative potential, and at the same time, stayed up to date with whatever it is that they were supposed to teach. They also served as moral and social instructors not through an iron fist, but through inspiration. They gave us enough room to develop, spared us the fury when we made mistakes and instead, taught us how to learn from them. They made us go "There is so much I can learn from that person!" as opposed to "That person will screw me over if I don't do this".

    Sure, perfect pedagogy isn't easy, but chaining students to a gratuitous set of rules is definitely not the answer. They are humans after all.

  • Magic Numbers in Distributed Systems

    I'm officially done with the first half of my masters as of now, and it's been a fun swim so far in the violent sea of distributed systems, which by the way, is a really big zoo where anything can go wrong. To quote Leslie Lamport:

    A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable"

    Thus, researchers over the last few decades have gone through great lengths to design distributed algorithms that ensure correct behaviour in the light of node failures and random activity by other processes. But if you're a student and you're going through these algorithms, it's not entirely obvious what some of the magic numbers mean at times, and what the intuition is behind them that ensures that an algorithm works correctly. This post is meant to help in that direction. As per distributed systems convention, N refers to the number of processes in the system and F refers to the number of failures that can be tolerated.

    • \(N/2\)

      This is a simple majority "quorum". There are many algorithms that exploit the fact that two quorums overlap in at least one process. Let's take an example to illustrate this. Consider a system with N processes, serving as a distributed replicated database. There is a single process writing to the database and there can be more than one reader. One approach to doing this would be to have the writing process write to at least \(\lceil N/2\rceil + 1\) replicas. If the reading process reads from at least \(\lceil N/2\rceil + 1\) nodes, there will be at least one replica which has seen the latest write. This can used in scenarios where only less than half the nodes can fail. Some examples include Majority ACK Uniform Reliable Broadcast, Majority Voting Regular Register, and Read Impose Write Majority (1,N) Atomic Registers.

    • \((N+F)/2\)

      Referred to as "Byzantine Quorum", this one's a bread and butter number for many Byzantine Fault Tolerant algorithms, which are algorithms that tolerate arbitrary failures. This means that the distributed algorithm produces a correct result in spite of having processes that either do not respond, send out garbage values and so forth. The behaviour can be either due to faulty hardware, software, or due to control by a malicious user. The essence of the number itself is that any two sets of size \(\lceil(N+F)/2\rceil + 1\) overlap in at least one correct process. This is valid only when \(N > 3F\) (you can learn the "why" behind this from the Byzantine Generals Problem). Example algorithms: Authenticated Data Byzantine Quorum based Regular Register and Byzantine Randomised Consensus.

    • \( > (N+2F)/2\)

      This is known as a Byzantine Masking Quorum. The intuition here is that any two quorums of size \(\lceil (N+2F)/2 \rceil+ 1\) will overlap in \(2F + 1\) processes, thus ensuring that you have a worst case split up of \(F+1\) correct processes and \(F\) faulty ones. Note that \(N > 4F\) is necessary for this condition to be true. This is particularly useful in the context of Byzantine Registers, and can be seen in algorithms that implement Byzantine Safe Registers. Let's take the example of our distributed replicated database that we'd discussed above, but with byzantine faults.  The writing node attempts to write to all nodes and returns from the write upon receiving ACKs from \(\lceil (N+2F)/2\rceil + 1\) replicas. A node that needs to read from the replicated servers reads from any \(\lceil (N+2F)/2\rceil + 1\) nodes, and only picks a value if it has more than \(F\) occurrences in the set that was read (the Byzantine process can return garbage values upon the read request). The \(\lceil (N+2F)/2\rceil + 1\) guarantees that there would be at least \(F+1\) correct processes in such a set so that reads that are not concurrent with a write will for sure give you \(F+1\) correct and up-to-date values.

    • \(F\)

      This one is pretty straightforward, and you begin seeing this as soon as you're dealing with algorithms wherein processes crash. If you receive a message from at least \(F+1\) processes, then you're sure that you've received the message from at least one correct process (a process that will not fail).

    • \(2F\)

      Follows from the previous one, this ensures that if you receive \(2F + 1\) messages, you're sure that more than one correct processes is part of this set. This is particularly seen in Byzantine Algorithms where making a decision based on \(> F\) is risky. For instance, consider situations where uniform agreement is required. One such example is that of Leader Election in a distributed system. For correct functioning of the system, it is entirely necessary that all nodes have the same view of who the leader is. The algorithm fails if two nodes have picked two different nodes as their respective leaders, and in the presence of byzantine faults, this is quite possible. Consider the Rotating Byzantine Leader Detection algorithm wherein nodes can choose to "complain" about the current leader under the suspicion of being byzantine. A node decides that it should shift to the next leader when it hears at least \(> 2F\) complaints. Here's why we can't use \(> F\) alone over here. Let's assume our system should tolerate one byzantine fault \((F = 1)\). Assume, that a process P incorrectly suspects the current leader (who is legitimate) of being byzantine and broadcasts a "complaint" message. Now a byzantine node in the same system can go against the protocol, and send process P a "complaint" message, at which point process P now has \(> F (2 > 1)\) complaints about the current leader, causing it to change leaders. Since the byzantine process sent the complaint message only to process P and not the other processes in the system (who didn't find anything strange about the leader in order to complain), we now have process P with a leader \(L_{new}\) and the remaining processes at leader \(L_{old}\) such that \(L_{new} != L_{old}\). Byzantine process wins. On the other hand, if you rely on the \(2F + 1\) magic number, you're sure that there are at least \(F+1\) correct entries in your set.

    • \(N-F\)

      In algorithms where faulty nodes have the liberty of "not responding" (crash-stop processes for instance), \(N-F\) is the lowest number of responses you can hope to get if you ping all processes in the system. In byzantine scenarios, a set of \(N-F\) processes contains at least \(N-2F\) correct processes (\(F\) responses out of \(N-F\) can be corrupt). This number can be seen in many algorithms which require uniform agreement, one example being Byzantine Randomized Consensus.

    That's it for now.

  • HOWTO: Getting started with ns-3-click – Part II

    We now move into part II of the ns-3-click tutorial series wherein I'll walk you through an ns-3 script and explain how to load your Click scripts onto ns-3 nodes. This tutorial goes through the following steps:

    1. Walkthrough of an example Click graph from ns-3-click.
    2. Using the Click graph in an ns-3 simulation.

     

    Walkthrough of an example Click graph from ns-3-click

    The Click script I'll be explaining below is available in current ns-3-dev and stable releases from ns-3.11 onwards. You can find it in src/click/examples/nsclick-lan-single-interface.click.

    When working with ns-3-click, you will need to handle all layer 3 functionalities expected of a networked node from Click. The Click script that we'll be talking about here provides exactly this. It handles ARP, and forwards packets up and down the stack as required. It has a single network interface (eth0) to send and receive packets, and has a kernel interface (tap0) to send/receive packets from the kernel (in ns-3, this corresponds to communicating with layer 4).

    We first describe the interface to tap0 with the following functionalities:

    1. All packets received from tap0 should be forwarded down to the stack.
    2. All packets received from below the stack, destined to us, should be sent up the stack via tap0.

    [sourcecode]

    elementclass TapSimHost {

    $dev |

    // Packets go to tap0, which sends them to the kernel

    input[0]

    -> ToDump(tokernel.pcap,2000,IP,PER_NODE 1)

    -> ToSimDevice($dev,IP);

    // Packets sent out by the kernel get pushed outside

    FromSimDevice($dev,4096)

    -> CheckIPHeader2

    -> ToDump(fromkernel.pcap,2000,IP,PER_NODE 1)

    -> GetIPAddress(16)

    -> [0]output;

    }

    kernel::TapSimHost(tap0);[/sourcecode]

     

    The above snippet does exactly what we've described so far. We create an element class which has a single input and output. Packets received on the input are plumbed to tap0 (because $dev is now tap0, as per the instantiation in the last line). Packets received from tap0 are pushed outside. We now describe a LAN host, which will handle ARP, and check the destination IP to see if we should receive the packet.

    [sourcecode]

    elementclass LanSimHost {

    $ipaddr, $hwaddr |

    cl::Classifier(12/0806 20/0001,12/0806 20/0002, -);

    forhost::IPClassifier(dst host $ipaddr,-);

    arpquerier::ARPQuerier(eth0);

    arpresponder::ARPResponder(eth0);

    ethout::Queue

    -> ToDump(out_eth0.pcap,PER_NODE 1)

    -> ToSimDevice(eth0);

    // All packets received on eth0 are silently

    // dropped if they are destined for another location

    FromSimDevice(eth0,4096)

    -> ToDump(in_eth0.pcap,PER_NODE 1,ENCAP ETHER)

    -> cl;

    // ARP queries from other nodes go to the ARP responder element

    cl[0] -> arpresponder;

    // ARP responses go to our ARP query element

    cl[1] -> [1]arpquerier;

    // All other packets get checked whether they are meant for us

    cl[2]

    -> Strip(14)

    -> CheckIPHeader2

    -> MarkIPHeader

    -> GetIPAddress(16) // Sets destination IP address annotation

    -> forhost;

    // Packets for us are pushed outside

    forhost[0]

    -> [0]output;

    // Packets for other folks or broadcast

    // packets get sent to output 1

    forhost[1]

    -> ToDump(discard.pcap,2000,PER_NODE 1,ENCAP IP)

    -> [1]output;

    // Incoming packets get pushed into the ARP query module

    input[0]

    -> arpquerier;

    // Both the ARP query and response modules send data out to

    // the simulated network device, eth0.

    arpquerier

    -> ToDump(out_arpquery.pcap,PER_NODE 1)

    -> ethout;

    arpresponder

    -> ToDump(out_arprespond.pcap,PER_NODE 1)

    -> ethout;

    }

    lan::LanSimHost(eth0:ip,eth0:eth);

    [/sourcecode]

     

    Now that we have a LanSimHost type ready, and instantiated (in the last line indicated above), we perform the final plumbing required to connect our LanSimHost to our kernel tap device:

    [sourcecode]

    // Users can do some processing between the two elements

    lan[0] -> kernel;

    kernel -> lan;

    // Packets for others or broadcasts are discarded

    lan[1] -> Discard;

    [/sourcecode]

    This concludes our description of the nsclick-lan-single-interface.click file. Let's now describe an ns-3 simulation script for our scenario.

     

    Using the Click graph in an ns-3 script

    I'll now describe a simple ns-3 script which makes use of the above described Click graph. Not surprisingly, the script is named nsclick-simple-lan.cc and can be found within src/click/examples/. The simulation scenario is a simple one: two nodes A and B connected via a CSMA channel, with A sending B a stream of packets using a TCP connection. A is Click based, whereas B is a normal ns-3 node.

    The first step would be to create the nodes.

    [sourcecode language="cpp"]

    NodeContainer csmaNodes;

    csmaNodes.Create (2);

    [/sourcecode]

    Next, we create a CSMA channel.

    [sourcecode language="cpp"]

    CsmaHelper csma;

    csma.SetChannelAttribute ("DataRate", DataRateValue (DataRate (5000000)));

    csma.SetChannelAttribute ("Delay", TimeValue (MilliSeconds (2)));

    NetDeviceContainer csmaDevices = csma.Install (csmaNodes);

    [/sourcecode]

    We then install a normal internet stack on node B.

    [sourcecode language="cpp"]

    InternetStackHelper internet;

    internet.Install (csmaNodes.Get (1));

    [/sourcecode]

    And then setup a Click based internet stack on node A. We need to specify the Click script that the particular node is supposed to use, and in the event that we require a Click based node to run a traffic generator on top, we need to specify a routing table element for the node to use. This can be seen by the name "rt" at the end of the nsclick-lan-single-interface.click file.

    [sourcecode language="cpp"]

    ClickInternetStackHelper clickinternet;

    clickinternet.SetClickFile (csmaNodes.Get (0), "src/click/examples/nsclick-lan-single-interface.click");

    clickinternet.SetRoutingTableElement (csmaNodes.Get (0), "rt");

    clickinternet.Install (csmaNodes.Get (0));

    [/sourcecode]

    Now that all the nodes have an internet stack, we assign IPv4 addresses to all the network interfaces.

    [sourcecode language="cpp"]

    Ipv4AddressHelper ipv4;

    ipv4.SetBase ("172.16.1.0", "255.255.255.0");

    ipv4.Assign (csmaDevices);

    [/sourcecode]

    And then, we setup the traffic generators for talking between node A and node B.

    [sourcecode language="cpp"]

    Address LocalAddress (InetSocketAddress (Ipv4Address::GetAny (), 50000));

    PacketSinkHelper packetSinkHelper ("ns3::TcpSocketFactory", LocalAddress);

    ApplicationContainer recvapp = packetSinkHelper.Install (csmaNodes.Get (1));

    recvapp.Start (Seconds (5.0));

    recvapp.Stop (Seconds (10.0));

    OnOffHelper onOffHelper ("ns3::TcpSocketFactory", Address ());

    onOffHelper.SetAttribute ("OnTime", RandomVariableValue (ConstantVariable (1)));

    onOffHelper.SetAttribute ("OffTime", RandomVariableValue (ConstantVariable (0)));

    ApplicationContainer appcont;

    addressValue remoteAddress (InetSocketAddress (Ipv4Address ("172.16.1.2"), 50000));

    onOffHelper.SetAttribute ("Remote", remoteAddress);

    appcont.Add (onOffHelper.Install (csmaNodes.Get (0)));

    appcont.Start (Seconds (5.0));

    appcont.Stop (Seconds (10.0));

    [/sourcecode]

    Lastly, we enable PCAP tracing on all the CSMA NetDevices in the scenario.

    [sourcecode language="cpp"]

    csma.EnablePcap ("nsclick-simple-lan", csmaDevices, false);

    [/sourcecode]

    And to conclude the script, we specify the running time for the simulation to be 20 seconds, and call Simulator::Run(). Don't forget to call Simulator::Destroy() lest tools like Valgrind start screaming about memory leaks.

    [sourcecode language="cpp"]

    Simulator::Stop (Seconds (20.0));

    Simulator::Run ();

    Simulator::Destroy ();

    return 0;

    [/sourcecode]

     

    To see the results of running our simulation, execute the below in your terminal from the ns-3-dev top level directory once you've built Click as described in the previous article.

    $: ./waf --run nsclick-simple-lan

    Have a look at the resulting pcap traces (nsclick-simple-lan-0-[0,1].pcap) using wireshark or tcpdump to see what happened through the simulation.

    Hope you found this little walkthrough helpful. If you find any bugs with ns-3-click, please don't hesitate to file a bug report on our bugzilla. :)