• Questions to ask before conducting a performance evaluation

    I've been evaluating some popular cloud systems lately and unsurprisingly enough, I'm finding it really hard to make a final call on a set of results. Have I tuned the systems I'm comparing appropriately for the workloads I'm subjecting them to? Are the workloads reasonable? Is the system itself the bottleneck or is it a set of external factors? There were so many questions piling up that it's made me rather skeptical about system evaluations I find in papers today.

    So I've picked up this book, and found this minimal checklist to go through to avoid common mistakes when conducting a performance evaluation.

    1. Is the system correctly defined and the goals clearly stated? 
    2. Are the goals stated in an unbiased manner? 
    3. Have all the steps of the analysis followed systematically? 
    4. Is the problem clearly understood before analyzing it? 
    5. Are the performance metrics relevant for this problem? 
    6. Is the workload correct for this problem? 
    7. Is the evaluation technique appropriate? 
    8. Is the list of parameters that affect performance complete? 
    9. Have all parameters that affect performance been chosen as factors to be varied? 
    10. Is the experimental design efficient in terms of time and results? 
    11. Is the level of detail proper? 
    12. Is the measured data presented with analysis and interpretation? 
    13. Is the analysis statistically correct? 
    14. Has the sensitivity analysis been done? 
    15. Would errors in the input cause an insignificant change in the results? 
    16. Have the outliers in the input or output been treated properly? 
    17. Have the future changes in the system and workload been modeled? 
    18. Has the variance of input been taken into account? 
    19. Has the variance of the results been analyzed? 
    20. Is the analysis easy to explain? 
    21. Is the presentation style suitable for its audience? 
    22. Have the results been presented graphically as much as possible? 
    23. Are the assumptions and limitations of the analysis clearly documented?
  • Afraid of code? Don't be

    A student approached me over email recently, wondering if I was proposing any master thesis projects he could work on. In his mail, he'd left a disclaimer that he wasn't a good programmer. I assumed it was one of those cases where the student was indeed capable, but decided to be modest about his abilities.

    When we met, I walked him through the problems I was working on and explained a handful of potential master thesis topics I'd sliced out of this space that he could give a shot.

    Even though he had no prior experience in distributed systems or the cloud computing arena, I felt he was able to comprehend me quite well, asking rather intelligent questions along the way. Around 15 minutes into our discussion, he asked me what it would entail to go about these projects.

    I answered with something along the lines of: "Profiling a bunch of systems, and of course, reading and modifying the source code of at least one of them."

    The student seemed rather discomforted by my response. He immediately reiterated  that he's not a very good programmer, and that if the project involved a lot of coding, he won't be able to deliver on time. At this point, I pulled out my Obi-Wan hat and explained to him that being comfortable with programming is inevitable, especially considering the fact that he will be awarded a masters degree in Computer Science once he completes his thesis. I could tell that he wanted to be good at this, but was afraid of not being able to match up to the challenge.

    This got me thinking. Haven't we all been here before? What did we do to overcome our fear of code?

    So here's my personal take on overcoming this fear.

    You need to start somewhere

    Cliché yes? Yet, it seems to be lost on a lot of students.

    See your first major programming task, be it a thesis project, a course project, your first job, or even a self-planned weekend project, as a rite of passage.

    Writing software is an art. And as with any kind of art form, it takes sweat and discipline to be good at it.

    There's only so much you can learn through reading books, going through tutorials and solving exercises that will serve as a substitute for actual experience. More often than not in your career, you'll end up having to read, modify, or maintain code that others have written. The sooner you're exposed to the innards of dealing with such a situation, the better. So start somewhere!

    The first time's the hardest

    Let's say your project involves building atop an existing code-base (open-source or otherwise). How do you tame the beast?

    It's often as simple as:

    • Acquiring the pre-requisite domain knowledge
    • Identifying the exact components you need to deal with
    • Hacking away

    Acquiring necessary domain knowledge

    Of course, before you can hack on a piece of code, you need to have some sense of what the code should be doing. It's practically impossible to dive into a large code-base without any context whatsoever and manage to understand what's going on. For instance, attempting to hack a Linux WiFi driver without having any clue of what the WiFi MAC/PHY functions are would be an exercise in masochism. If, however, you knew the concerned protocols, things will make more sense to you.

    And guess what? You already have or will have to build up this domain knowledge anyway, like when you're about to embark on a masters thesis.

    Identifying the components you need to deal with

    A mature open-source project is usually accompanied with really good developer documentation, which will have important information about the code's architecture and some hints about the the key code paths.

    Make a pass through the documentation and have a look at the source tree of the project. The directory hierarchy is usually quite revealing of the code structure. Also, grep through the source files for  keywords pertaining to the components you're looking for.

    Soon enough, you'll be able to identify the subset of source files you'll need to tamper with.

    Hacking away

    I usually begin by instrumenting the code with simple logging statements to get an idea of what's happening under the hood. Now start introducing your modifications little by little. Re-run the code to see if your modifications work. If possible, use a debugger and stack traces to understand what code paths are being taken internally. Rinse and repeat till you're done.

    With enough time and effort, you should eventually tame the code.

    Stand on the shoulders of giants

    Key to any form of learning is to make diligent use of the experience of those who've traveled the same roads before you. In the programming world, reading code that others have written is an easy way to accomplish this. Better yet, try to contribute patches and have the experts themselves review your code for you. Bear in mind that you will make mistakes, but that's part of the journey.

    In my case, my first non-trivial code patch to an open-source project was this one. Notice how many mistakes I'm making along the way, and how many iterations it took before the patch was finally committed (and how patient the maintainers were!).

    Wrapping up

    The path to become a better developer is exciting, deeply empowering, and incredibly humbling.

    Of course, to experience all of that, it's imperative that you gulp away all that self-doubt and embark on the path first. So don't fear code.

    In Soviet Russia, code fears you!

  • Divergence of Identity

    Given that I've spent many more years of my life being an expat than a local, this article struck a deep chord with me.

    ...So you look at your life, and the two countries that hold it, and realize that you are now two distinct people. As much as your countries represent and fulfill different parts of you and what you enjoy about life, as much as you have formed unbreakable bonds with people you love in both places, as much as you feel truly at home in either one, so you are divided in two. For the rest of your life, or at least it feels this way, you will spend your time in one naggingly longing for the other, and waiting until you can get back for at least a few weeks and dive back into the person you were back there. It takes so much to carve out a new life for yourself somewhere new, and it can’t die simply because you’ve moved over a few time zones. The people that took you into their country and became your new family, they aren’t going to mean any less to you when you’re far away.

    When you live abroad, you realize that, no matter where you are, you will always be an ex-pat...

  • Pareto Optimality and Computer Science

    I've been looking at resource allocation problems within data-centers lately. I think it's unlikely that someone working in this area does not come across the concept of Pareto optimality, so I thought I'd finally clear the cobwebs off this blog by writing about this.

    Pareto optimality is an economics concept developed by Vilfredo Pareto that has important applications in computer science.

    "You're welcome, computer nerds" - Vilfredo Pareto

    Simply said, in a Pareto optimal allocation of resources between multiple parties, no single party can be made better off without making at least another party worse off.

    Here's an example: if we have a pie and we divide it equally between Alice and Bob, it's impossible to improve the share for either one of them without negatively impacting the other. So if we divide the pie into two equal halves and give a half each to Alice and Bob, this is a Pareto optimal outcome.

    So is Pareto optimality always desirable? Heck no.

    This is because Pareto optimality doesn't tell you anything about fairness, which is also a very desirable property when it comes to resource allocation. In our Alice and Bob example, consider an allocation where the entire pie is given to Alice and Bob doesn't get any. Is it fair to Bob? Of course it isn't. Is it Pareto optimal? Yes, because you cannot increase Bob's share of the pie without taking an equivalent share away from Alice.

    So what exactly is optimal about a Pareto efficient allocation? To put it naively, it is an allocation where utilisation of available resources is the focus. This is quite natural considering the fact that the concept was designed within economic theory because in an economic system, it is not desirable that supply is under-utilised when there are still demands to be met.

    In most practical situations involving computer systems, resources are allocated incrementally between different individuals (as opposed to a one time allocation). As an example, in an IaaS setup like Amazon EC2, the system needs to provision resources on a continuous basis by keeping up with incoming demands by different tenants. In such a scenario, you're matching a pool of resources (like compute cores, disk, etc.) to an incoming demand vector from a tenant on the fly. Now if this tenant's demand is met such that all other tenants' allocations are at least as good as they were beforethen such an allocation is considered a Pareto improvement.

    Like the notion of Pareto optimality, Pareto improvements are also slightly weak. For instance, making a few allocations that aren't Pareto improvements as of now may allow the system to reach a much more desirable situation later.

    Going back to our pie example, assume we're in a scenario where Alice has been allocated the entire pie. Assuming the pie is big enough, it's unlikely that Alice would be able to eat the entire pie in one sitting without falling sick. Now if we take all that extra pie from Alice and give it to Bob, it isn't a Pareto improvement because Bob is now better off at the immediate cost of Alice, but Alice is eventually better off by not falling sick. This is now a much more desirable outcome, and is also Pareto optimal. Similarly, your operating system's scheduler may have to slow down your massive compilation process if you want to be able to watch a youtube video at the same time.

    The set of Pareto optimal choices in a system is called the Pareto frontier. Given a finite set of data-points, one can compute the Pareto frontier through skyline query algorithms.

    I'll stop here for now because my stomach's expressing its disgruntlement at being denied food for a longer period than usual. In a future post, I'll talk about Pareto optimality and its relation to Nash Equilibria.

  • Papers papers everywhere

    Last month, I officially began my PhD research. Not surprisingly, the difference from any prior research I've ever done is quite evident. Primarily because there is really no horizon at all when you embark on a PhD.

    The difficulty here essentially boils down to finding an answer to the following question: What problem is worth spending the next five years of your life trying to solve?

    I already have an idea of the broad area that I'd like to look into. Digging deeper and deciding which boundary to push at, however, is the tricky part.

    The workflow here is as follows:

    1) Read a paper. Creativity and optimism will immediately kick in, pointing you to potential opportunities and gaps, causing a rainbow of ideas to explode out of your brain. You shiver with excitement.

    2) Read another paper only to realise someone else has already done all of that and a little more.

    3) Repeat.

    Let's see how long it takes before I finally break from step 1.

    I'm not sure how I should end this post, so here's a picture of my desk.

    Many trees have died for the greater good here