Banishing Bufferbloat

After reading this thought-provoking article about bufferbloat, I wanted to do two things: have a better understanding of the concept, and find evidence of its occurrence within my own set-up.

The term ‘bufferbloat’ was coined by Jim Gettys in 2010 as an explanation of much of today’s internet congestion, which can lead to very poor performance over an apparently “high bandwidth” network connection.

In this article I will attempt to explain bufferbloat in a way accessible to those who are not network professionals.

Disclaimer: I am not a network professional either; I simply enjoy researching things.  This article is purely an attempt to digest what I’ve learned, and hopefully pass on something interesting to others.

I will also document how I solved one particular instance of the problem in my own network (probably only of interest to those who enjoy performing fairly invasive software surgery).

The internet – and indeed, any system of connected components – is made up of communication channels, each capable of a particular throughput. This can be visualised as a network of interconnected pipes, all of varying widths.

Any point where a “large pipe” (high bandwidth) feeds into a smaller one (low bandwidth) can become a bottleneck when traffic levels are high. To be clear, the situation of interconnected links with varying bandwidths is normal – for example where a backbone link carrying national traffic feeds into a smaller network servicing a particular set of subscribers. Usually, the subset of traffic coming through the bottleneck would not exceed what the small pipe can service; otherwise the set-up would clearly be inadequate.

However, temporary spikes in traffic during unusually busy periods can occur. At this point, one of two things can happen. Either the excess traffic is stored up in a buffer (a “holding area”) for the duration of the spike, or else the narrower link must reject the excess traffic as there’s nowhere for it to go.

In the first scenario, the excess traffic would slowly fill the buffer for the duration of the spike. The buffer would be drained into the smaller pipe as fast as can be supported. Once traffic levels return to normal, the buffer would empty back to its normal level. The upstream components would not be aware of this situation, as they would not experience any rejected traffic (dropped packets in networking parlance).

If the traffic spike is prolonged however, then the buffer becomes full, and the situation is similar to that where no buffer exists: packets are dropped. From the upstream producer’s point of view, the packet would need to be resent (as no acknowledgement was received). The re-sending process would continue whilst the bottleneck is in effect, and would appear as a slow (or stalled) data transfer.

In the early days of the internet (c. 1986), buffers were insufficiently sized. This led to heavy packet loss during times of even moderate contention, to the point where most of the traffic was retransmitted packets. This was clearly inadequate, and so the use of larger buffers was recommended. Importantly, congestion control algorithms were also brought into play at each sender. These algorithms probe the size of the downstream pipe by gradually ramping up traffic until packets start to be dropped, then backing off.

So where’s the problem? The problem surfaces when the size of buffers is set too high. A buffer is just an area of memory, and as memory has become cheap, buffers have become larger, without adequate consideration of the consequences.

A buffer which is too large gives a false indication that a bottlenecked pipe is bigger than it really is. Your data transfer simply fills the oversized buffer, which then sits permanently full – at which point it no longer even serves its original purpose of absorbing temporary spikes.

Why is this bad? If you’re doing a large upload (for example, sending a video to YouTube or backing up music to cloud storage) where an oversized transmit buffer is present, then web pages may appear to load very slowly (many seconds). The reason is that the tail-end of the large upload is sat in a large queue. A request to Google would sit at the back of the queue, and would have to wait until the buffer is emptied before it is sent on to the next link.

The solution is to tune the size of the buffer, such that it is only used to absorb temporary spikes in traffic, rather than giving false indications of high bandwidth during periods of contention.

To be fair, the real solution is fairly complex, involving Active Queue Management to signal the onset of congestion so the rate of flow can be backed off before the buffer becomes full.

In many cases, these buffers exist in network equipment (such as routers) which is controlled by ISPs and similar organisations, but there are places under your own control where you can identify and fix this phenomenon.

For my own situation, the issue was that during a large backup of files from my netbook to another computer on my network, it was virtually impossible to do anything else network-related on the netbook.

During a large file upload to another computer on my LAN, a very slow wireless connection is a permanent bottleneck, with an observed effective throughput of 400kB/s (shown by the scp command), or 3Mbps.

By default, Linux allocates a transmit buffer maximum size of about 3MB (obtained via the following command, which gives minimum, default and maximum memory for the TCP transmit buffer):

sysctl -a | grep net.ipv4.tcp_wmem
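
The output takes the following form – the three figures are the minimum, default and maximum buffer sizes in bytes, and the exact numbers will vary between kernels and distributions:

net.ipv4.tcp_wmem = 4096 16384 3145728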

If I start off a large upload and watch the size of this transmit buffer, the tx_queue settles at around 1.7MB. This value was obtained via:

cat /proc/net/tcp
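
(If you want to dig out the same figure: tx_queue is the hexadecimal number before the colon in the tx_queue:rx_queue column – the fifth field of each row – and is measured in bytes. Something crude like the following prints the local address, remote address and queue sizes for each connection:)

awk '{print $2, $3, $5}' /proc/net/tcp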

1.7MB of data was permanently sat in the buffer; this would take around 4 seconds to drain over a 400kB/s network link. So any requests for web pages whilst the transfer is going on will be sat in a 4 second queue. Not good.

This setting certainly needed to be tweaked in my case. Setting it too low however would result in small windows of data being sent per-roundtrip, which would prevent TCP from ever ramping up to full throughput.

The article quoted earlier suggests the recommended buffer size is the Bandwidth Delay Product (BDP): the bottleneck bandwidth multiplied by the delay (or latency) that packets experience on the way to their destination. Turning that around, a 1.7MB buffer would only be justified by a link carrying around 1.7MB/s (roughly 14Mbps) with a full second of delay – whereas my real bottleneck bandwidth is around 3Mbps, and the latency on my home network is closer to 1ms. However you measure it, the TCP transmit buffer was at least five times too large for my particular environment.

Setting the TCP transmit buffer maximum to the approximately correct size of around 256KB mostly fixed the problem. I settled on a figure of 128KB – on my system this is a good compromise between bandwidth for large uploads, and latency for other interactive activity such as browsing or SSHing. The setting lives in /etc/sysctl.conf (sysctl being the interface to kernel parameters which can be modified at run-time), which can be edited as follows:

sudo vi /etc/sysctl.conf
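
The line to add (or amend) inside that file would look something like this – the first two figures (minimum and default) are best left as whatever the earlier sysctl command reported, while the third caps the maximum at 128KB (131072 bytes). Pick figures appropriate to your own link:

net.ipv4.tcp_wmem = 4096 16384 131072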

Follow this with a refresh of the parameters, and you’re done:

sudo sysctl -p

Caveat: Your own mileage certainly may vary if you choose to tweak these settings. You’d be mad to do this on anything important without knowing exactly what you’re doing.

Note: There are a number of articles which suggest increasing the size of the network buffers in Linux, using a similar approach.  Based on my understanding and experiences, this is fine if raw bandwidth is your goal, and particularly if you have a healthy upstream bandwidth.  If you don’t have this bandwidth, then setting these buffers too high could harm your interactive network activity, while being unable to improve utilisation in an already saturated link.

 

An Appetite for Combinatorics

It’s common to see “find the number of possibilities” problems in Computer Science.

This kind of problem stems from Discrete Maths – an important pre-requisite for doing anything beyond the trivial, for example Cryptography or Graph Theory.

I found one of these problems on Project Euler.  Project Euler is a collection of mathematically-inclined programming problems – probably more than you could ever solve in a lifetime (some of them are still unsolved by anybody).

The particular problem which drew my attention doesn’t actually require any programming to solve.  It goes like this:

Starting in the top left corner of a 2×2 grid, there are 6 routes (without backtracking) to the bottom right corner.


How many routes are there through a 20×20 grid?

This is pretty fundamental Maths, but I find these kind of techniques are always worth re-visiting, as it seems to be a case of “use it or lose it”.

Following is my approach, so don’t read any further if you want to try it yourself first!

I started by drawing a tree structure for the 2×2 grid, where each node had two choices: ‘R’ or ‘D’ (for go Right, or go Down).  This gave me a feel for things.  Towards the end of some paths there was clearly some pruning – where the only option was to head for the goal (rather than back-tracking or going out of bounds).

It then became clear that any plan for getting to the goal simply involved two Rs and two Ds.  You clearly need to take two steps Right, and two steps Down to reach the goal, whatever your route.  So the problem can be re-stated as “how many ways are there of arranging two Rs and two Ds?”  Or more vividly: “If I have a bag containing two Kit-Kats and two Mars Bars, how many distinct ways can I eat them in sequence?”

Of course, the stated problem involves twenty each of Kit-Kats and Mars Bars.  So if I was really hungry, how many ways could I eat them all?

Suitably motivated, it’s time for some fun with combinatorics.  For the moment, let’s go back to the 2×2 grid, and ignore the repetition of Right and Down moves.  This means we must take four distinct steps to reach the goal.  So let’s assume that we have a bag of four chocolate bars – all different.  How many ways can we draw them in sequence?  Or more properly, how many permutations are there?

For the first choice, we have four options.  Once we’ve made this first selection, we have three left to choose from.  Then two, and finally there’s only one left.  This naturally leads us to the factorial function:

4! = 4 x 3 x 2 x 1 = 24

So there are 24 ways (permutations) to draw four tasty, chocolate treats.

Now let’s amend our calculation, taking into account that two of the chocolate bars are identical.  Say, two Milky Ways, one Kit-Kat, and one Mars Bar.

This is easy to work out – out of our 24 original permutations, we need to discount those which only differ by swapping the two identical items.  There are 2! (2 x 1 = 2) ways to arrange two chocolate bars, so we divide our answer by this.

4!/2! = 12 permutations

Now, it’s only one more step to re-discover the example solution, by taking into account that there are two classes of two identical ‘objects’ (Right moves and Down moves), and so we end up with:

4!/(2!*2!) = 6 permutations.
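
(The same argument generalises: crossing an n×n grid takes n Right moves and n Down moves, giving (2n)!/(n!*n!) distinct orderings – the binomial coefficient “2n choose n”.)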

Now it’s really easy to solve the stated problem – I won’t give away the solution of course!

Clueless? Just Improvise.

“Go on then, make us laugh.”

Those words would probably make anybody curl up and die.  Many of us would freeze up, go into our heads and try to think of a clever joke.  We often think being funny is a special skill reserved for ingenious stand-up comedians, or witty wordsmiths like Stephen Fry.

Not so for an Improv Comedian – they’d be more likely to do the first zany thing that comes into their head.  And hey presto, it’ll probably be funny.

Unlike stand-up, Improv Comedy involves short scenes made up on the fly, often with instructions from off-stage to change scenes or characters mid-flight.  When it’s impossible to plan ahead, spontaneity and total participation rule over being clever or witty.

Improv is often funny because of the eccentric and unexpected performances that happen in the heat of the moment.  As an audience, we can’t help but laugh in relief or recognition, as the performers first appear to be in dire straits, but then dredge up a convincing scene seemingly from nothing.

Can you learn this stuff?  Apparently you can – over the last couple of weeks I’ve been taking classes with Steve Roe of Hoopla.  His workshops attract everybody from rigid newbies (such as myself), to experienced actors and bona-fide theatre types.

As a software engineer and logic junkie, spontaneity feels like a great skill to unlock within myself.  Public speaking groups such as Toastmasters have helped me to feel comfortable speaking to a group, but speaking off-the-cuff requires a different bag of tricks.  Improv has taught me that convincing scenes often develop out of thin air, as long as the group is totally present and heading in the same direction.

The Hoopla workshops usually start with simple, fun warm-ups to build a safe, supportive atmosphere.  Next come specific skill-building exercises, where we take turns to act out scenes in small groups.

There are many techniques, but one of the most fundamental is called “Yes, and”.  If my partner tells me that “This is the best biscuit I’ve ever tasted!” and I reply with “What biscuit?”, then I’d be denying their contribution.  With the “Yes, and” mentality I might reply “Yes and that’s the last one, you greedy pig!”  When there’s an agreed reality, the scene gains traction.

Mike Myers successfully “yes ands” a mischievous James Lipton in this exchange:

JL: “Ants and caterpillars can be – in certain circumstances – delicious.”
MM: “Yes, and I had them yesterday.”
JL: “You had them yesterday?  Here’s a strange coincidence – so did I.”
MM: “Yes I know, because I was across the street watching you.”
JL: “It’s very odd because I was eating in my bathroom.”
MM: “Yes, and I was in the medicine cabinet.”

The “Yes, and” technique is a way to avoid mistakes.  But even when mistakes do occur, Improv performers take them in their stride.  In fact, “mistakes” don’t even exist in Improv – they are simply “an offer which hasn’t yet been acknowledged”.  These offers turn into a “game” between the performers which is much more fun than a straight scene.

During one workshop, we formed pairs, and acted a straight scene such as a job interview.  As soon as the first mistake happened – for example, when someone says something inconsistent – we’d stop.  We’d acknowledge the mistake by turning it into a “game” occurring within the scene.  For example, the “what biscuit?” mistake from earlier could have gone another way:

Mary: “This is the best biscuit I’ve ever tasted!”
John: “What biscuit?”
Mary: (Unperturbed) “It doesn’t go so well with this tea though.  I prefer Digestives for dunking.”
John: (Continuing the game) “Tea? Where? What are you talking about?”
Mary: “I think you should try some.  Here let me pour some into your cup – ”
John: “What cup? – whoa!” (Mimes being scalded by boiling water)

The “game” is that John denies the existence of anything Mary says.  Mary uses John’s mistake as an offer, and eventually “traps” him.  Mistakes like this are a fantastic way to generate material.  Pre-planned, logical thinking would never have arrived at the same result.

These two techniques only scratch the surface.  Each idea we learn feels like a rediscovery of Things That Already Work – in everyday life as well as on the stage.

In fact, learning improv has felt like an “unlearning” of sorts.  The creative, spontaneous part of the brain seems to work best when given space to work unimpeded.  Planning, preparation and self-criticism are thrown to the wind, and the result is fun and sometimes even hilarious.

Crouching Surcharge, Hidden Gazump

It’s hardly news that Ryanair have a “creative” approach to business.

They’re already trying to dodge the government crackdown on debit card processing fees, by relabelling their £6 surcharge as a “Processing Fee which depends on the type of card you pay with”.  To be specific, you’re only charged £6 if you use any brand of credit or debit card other than their own.

Now, imagine another scenario.  You’re at the supermarket, about to pay for your shopping.  To your surprise, a bunch of items you didn’t want have somehow appeared in your basket.  Confused, you re-trace your steps to find out where they came from.  Gradually, you discover a range of cunning tactics, requiring you to opt out of “default purchases” which the shop thinks you really ought to buy (strangely enough).

This is what happens when you buy tickets with Ryanair.  Take a look at the form below – see if you can figure out how to opt out of Travel Insurance.

To be fair, the drop-down menu has covered up the fairly small print, which in fact tells you how to opt out.  At first I didn’t spot it, because my attention was drawn to the check-box labelled “Add Travel Insurance PLUS” (which I’d diligently left unchecked).

If you *really* want to opt out, you have to expand the drop-down menu (of *countries*), and select the “Don’t Cover Me” option – handily placed between Latvia and Lithuania!

I can’t think of any logic behind this approach other than to mislead the customer, in the hope that they won’t notice the extra £7 on their bill.

The High Street Chill-Out Zone


Pssst – all you new-age vagrants out there.  Ever fancied the comfort of a cosy lounge for free, right on the High Street?

If you’re thinking “coffee shops”, then think again.  Unless you cherish hustling for a space amid used napkins and oozings of toffee-nut latte, to sit on plywood shaped like a rudimentary chair, whilst the din of industrial coffee grinders competes with the shrieks of spoiled toddlers … if you cherish that, then go right ahead.

For a more homely experience, rock up to your local Department Store.  These often have a furniture section, containing mock-ups of living rooms in various styles.  Simply turn up any time during opening hours, choose the sofa you like best, and make yourself at home.

Be sure to have everything you need before you arrive.  Newspaper, flask of coffee or soup, hot water bottle.  A pet (live or stuffed) makes a cute, cuddly addition (especially if still warm).

Check your phone battery is fully charged – this could be a good time for that long phone call abroad.  If caught short on credit or battery juice, feel free to use the in-house telephone system which staff use to call one another.  Dial ’9′ for an outside line, then reverse the charges.

Once you’ve settled into your comfy haven, cast your eyes around the shop floor.  Coolly wave strangers over to join you, particularly those you like the look of.  Put your feet up on a pouffe (if you’re so inclined).

If you like to unwind by watching television, then you’ll need to be more inventive.  Ask to try out a pair of binoculars, and ensure you have a clear line of sight to the audio-visual department.  Don’t like the programme that’s on?  You did remember to bring your “All in One” remote control, didn’t you?  Aim carefully, and zap away to your heart’s content (and turn up the volume so you can hear it).

Many of these “faux lounges” sport handy coffee tables to empty your pockets onto.  You don’t want loose change falling down the back of the sofa for another scamp to find, do you?  These low tables are also perfect for that stack of books and magazines you appropriated for the duration of your visit.

There’s no obvious, “acceptable” time limit to remain in your “virtual lounge”.  However, to make an untimely eviction less likely, consider wearing camouflage.  For those partial to trendy, black leather sofas, you’ll need to dress in similar fashion, like a “rock star”.  If camouflage is impractical, then try to sit very still like a mannequin.  This helps you to seem like “part of the furniture”.  Be sure not to fall asleep though, or you may wake up in the store-room.

Stay tuned for Part 2: The High Street Soup Kitchen, where we’ll wander over to the Kitchenware Department.

100 Strangers

I’m inspired by the 100 strangers project.  Here are my first two contributions (also on flickr).

Danka was wearing a striking orange jacket, which stood out well against the cool background.  As a fellow photographer, she was particularly interested in the project I was part of.

I saw this guy chopping wood, in advance of a very cold night.  He lives on his barge, and wasn’t surprised by my request to take a photo of him while he worked (“If you must – you get used to it living on the river”).  He told me that the local council are trying to make it harder for people to live the way he does.  Hopefully things won’t be too tough for him.

 

Men’s Haircuts

Men’s haircuts are a weird business.  Like having brain surgery once a month, except without the drugs.

Like every other time, it started off weird.  He put the funny plastic gown over my head, stood back, and spoke to my reflection.  “So! What can I do for you today?”

This throws me every time.  I thought maybe I’d walked into a doctor’s surgery by accident.  Then I saw the bottles of pastel-coloured male grooming products (which nobody buys), and knew I was in the right place.

“Use your imagination!” I wanted to say.  “Look at how my hair looks now, subtract four weeks – now make it look like that!”

Men’s haircuts ought to be pretty simple.  Unless you’re a punk.

No, I understood.  He was afraid that one day I might change my mind.  That I might say, “Actually, I’ve turned to organised crime.  Shave it all off, and give me a razor-scar while you’re at it.”

I asked for a trim.  “Ahhhh, a trim!” the barber replied.  As if that changed everything.  God forbid, if we hadn’t got that straight, he might have gone into left field and given me a tidy-up instead.

I guess it’s all part of the patter.  Having something to say to each other during this weird ritual.

My barber is pretty friendly, and asks questions about my life.  But he’d stop cutting my hair whenever I spoke.  Since I was there to get my hair cut, I didn’t answer his questions very often.

Certainly, men’s haircuts should be pretty straightforward.  But there was the other side of the coin.  It could also be a ludicrously technical affair, with millimetre tolerances at stake.  When asking for a number three on the sides and back, I’d half-expect him to haul out a computer-guided industrial lathe.

Once the negotiations were over though, things didn’t get any easier.

I was captive in the barber’s chair, with a mirror straight ahead.  I couldn’t move a muscle, for fear of losing an ear.  So where should I look?

Straight ahead was out of the question: I’d be gazing flirtatiously into my own eyes.

Behind me was the guy who was waiting.  It’d be even more weird to look at him.

Attempting to look nowhere in particular made me look all shifty.

So, I began to check out the little table in front of me.  You know, the little table with all the barbery things.  Scissors, and razors bathing in antiseptic, like something a brain surgeon might have.

How can there be so many kinds of scissors, I wondered.  There was a particularly funny-shaped pair, which looked like they could be used to make crinkle-cut crisps.

Suddenly, the barber yanked my head forward.  Now I was staring down at my crotch, while he attended to my neck stubble with a laser-guided guillotine.

I suppose I shouldn’t complain.  Sometimes they give you a free tissue on your way out.  Like a souvenir.  You only get this in the classier joints, whose coffee tables boast newspapers only two days old.

The best bit for me was the double-mirror trick at the end.  The bit where the barber holds up a second mirror, so I could admire his landscaping efforts on the back of my head.  Anything involving two mirrors is worthy of respect in my book.

I took one more look at all those scissors and scalpels, and acted impressed.  As if this number-two-fade was superior to all other number-two-fades I’ve had.

In truth, I couldn’t tell the difference.  But the risk of offending a man one step from a brain surgeon was too great, in my mind.

Testing on Autopilot

I was reminded of the power of automated testing by this talk by Rod Johnson.

http://www.infoq.com/presentations/system-integration-testing-with-spring

It is a little dated (2007), but what he says is still highly relevant.  The content mainly covers things we should already be practicing as developers, but it’s worth a reminder every now and then.

Following are the main points I took away from the presentation.

First, there are several key concepts to bear in mind.  These came up again and again in the talk:

  • Test Early, Test Often
  • Test at Multiple Levels
  • Automate Everything

Unit Testing

As developers, we know we should do lots of unit testing.  We do this by targeting classes in isolation, and mocking out collaborators.

To be clear, unit testing is looking at your class “in the lab”, not in the real world.  A unit test should not interact with Spring, or the database, or any infrastructure concerns.  Therefore, unit tests should run extremely fast: of the order of tens of thousands of tests per minute.  It shouldn’t be painful to run a suite of unit tests.
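
To make that concrete, here’s a minimal sketch of such a test (using JUnit and Mockito; DiscountService and PriceCalculator are invented classes, included inline so the example stands alone):

import static org.junit.Assert.assertEquals;
import static org.mockito.Mockito.mock;
import static org.mockito.Mockito.when;

import org.junit.Test;

// Hypothetical collaborator and class under test, for illustration only
interface DiscountService {
    double discountFor(String customerType);
}

class PriceCalculator {
    private final DiscountService discounts;

    PriceCalculator(DiscountService discounts) {
        this.discounts = discounts;
    }

    double priceFor(String customerType, double listPrice) {
        return listPrice * (1 - discounts.discountFor(customerType));
    }
}

public class PriceCalculatorTest {

    @Test
    public void appliesDiscountFromCollaborator() {
        // The collaborator is mocked out - no Spring, no database
        DiscountService discounts = mock(DiscountService.class);
        when(discounts.discountFor("gold")).thenReturn(0.10);

        assertEquals(90.0, new PriceCalculator(discounts).priceFor("gold", 100.0), 0.001);
    }
}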

Do Test-Driven Development.  Not only does this help you discover APIs organically, but it’s a way of relieving stress.  Once a defect is detected, you can write a failing test for it, then come back to fix it later on.  The failing test is a big red beacon reminding you to finish the job.

Use tools such as Clover to measure the code-coverage of your tests.  80% is a useful rule of thumb.  Any more than this, and the benefits are not worth the cost.  Any less than 70%, and the risk of defects becomes significant.

Integration Testing

We should also do integration testing – for example to ensure our application is wired up correctly, and SQL statements are correct.  But how many of us are still clicking through flows in a browser?  Ad-hoc testing by deploying to an application server and clicking around is very time-consuming and error-prone.  If it’s not automated, chances are it won’t happen.  If it doesn’t happen or occurs late in the project cycle, defects will be expensive to fix.

So instead, maintain a suite of integration tests.  It should be possible to run hundreds or thousands of these per minute and again, they should be automated so they just happen.

Use Spring’s Integration Testing support.  Among other things, this provides superclasses which can perform each test in a transaction, and roll it back upon completion to avoid side-effects across tests.  This avoids the need to re-seed the database upon each test.

Another benefit of Spring Integration Testing is that the Spring context is cached between tests.  This means that the highly expensive construction of the Hibernate SessionFactory (if you use one) only happens once.  Without this support, such caching would not normally be possible, because JUnit constructs a fresh instance of the test class for each test method.
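
By way of illustration, a test written against the Spring 2.5-era support might look roughly like this – the context file, the dataSource/transaction manager beans and the users table are all assumed for the sake of the example:

import static org.junit.Assert.assertEquals;

import org.junit.Test;
import org.springframework.test.context.ContextConfiguration;
import org.springframework.test.context.junit4.AbstractTransactionalJUnit4SpringContextTests;

@ContextConfiguration(locations = "classpath:test-application-context.xml")
public class UserRepositoryIntegrationTest extends AbstractTransactionalJUnit4SpringContextTests {

    @Test
    public void insertsAndCountsUsers() {
        int before = countRowsInTable("users");

        // Using the inherited SimpleJdbcTemplate for brevity; a real test
        // would exercise your repository/DAO bean instead
        simpleJdbcTemplate.update("insert into users (username) values (?)", "alice");

        assertEquals(before + 1, countRowsInTable("users"));
        // The transaction is rolled back after the test, so nothing persists
    }
}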

Remember to test behaviour in the database.  Stored procedures, triggers, views – regressions at the schema level should be caught early, in an automated fashion.

Integration tests should be deterministic – that is, they should not rely on the time of day, or on random side-effects from previous tests.  This should be obvious, but when testing concerns such as scheduling, it can become difficult.  One strategy is to abstract out the concept of the current time of day.  This could be done by replacing a literal call to System.currentTimeMillis() with a call to a private method.  This method would check for an override property set only during testing, the presence of which would cause a fixed Date to be returned to your application code.
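
A minimal sketch of that idea – the class and property names here are made up, and the talk frames it as a private method on the class in question, but the principle is the same:

import java.util.Date;

public class SystemClock {

    // Returns "now", unless an override property has been set
    // (only ever set by the test harness, e.g. -Dtest.fixed.time=1234567890000)
    public Date currentTime() {
        String fixed = System.getProperty("test.fixed.time");
        if (fixed != null) {
            return new Date(Long.parseLong(fixed));
        }
        return new Date(System.currentTimeMillis());
    }
}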

Performance Testing

This should begin as early as possible.  Use scriptable frameworks such as The Grinder, so performance testing is cheap to execute early and often.  This means performance regressions will be caught immediately, for example if somebody drops an index.

Many performance problems are due to a lack of understanding of ORM frameworks.  Learn to use your framework properly – fetch strategies in particular.  A common idiom is to eagerly fetch a collection of child entities up-front, rather than triggering the “N+1 Selects” problem by lazily loading each child record in a loop.  Additionally, consider evicting objects from the Session at appropriate points, to avoid memory overhead and to prevent the need for dirty-checking when the Session is flushed.
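
As an illustration (Hibernate shown here; Order, OrderLine and Customer are invented entities), the eager version boils down to a single query with a join fetch, rather than one extra SELECT per child:

import java.util.List;

import org.hibernate.Session;

public class OrderQueries {

    // One round-trip: orders and their lines are fetched together,
    // instead of a separate SELECT per order (the "N+1" pattern)
    @SuppressWarnings("unchecked")
    public List<Order> ordersWithLines(Session session, Customer customer) {
        return session.createQuery(
                "from Order o join fetch o.orderLines where o.customer = :cust")
                .setParameter("cust", customer)
                .list();
    }
}

Evicting is equally brief – session.evict(order) detaches an object you’ve finished with from the Session.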

One way to dig into database performance concerns is to enable SQL logging in your persistence framework; an excessive number of SELECT statements per use-case quickly becomes apparent.
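
With Hibernate, for instance, this amounts to setting hibernate.show_sql=true in your configuration (and hibernate.format_sql=true if you want the output to be readable).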

Conclusion

Developers should invest time in writing automated tests at multiple levels.  Even with a dedicated QA team in place, defects will only be caught early and fixed cheaply through an intelligent approach to automation.

Along with adoption of best practices such as Dependency Injection and separation of concerns, the industry has many tools on offer to make comprehensive testing cheap and easy.

References / Further Reading

JUnit: http://www.junit.org/
Spring Testing: http://static.springsource.org/spring/docs/2.5.x/reference/testing.html
The Grinder: http://grinder.sourceforge.net/
TDD (essay):  http://www.agiledata.org/essays/tdd.html
Clover (test coverage): http://www.atlassian.com/software/clover/
JWebUnit: http://jwebunit.sourceforge.net/

The Curse of the Splinternet

This week’s New Scientist featured a particularly fear-mongering article about internet security. Entitled “Age of the Splinternet”, it at first appears to be a cheer for the importance of net neutrality. But the subplot quickly becomes clear: the internet is a great place to be, but anything can and will go wrong, even fantastical, sci-fi doomsday scenarios …

The author begins with an illuminating history lesson on the structure of the internet. Dating back to the 1960s, the underlying system of routers was initially designed by the military as a fault-tolerant network, able to withstand a nuclear blast. The lack of central command, and the presence of autonomous nodes resulted in a decentralised, self-sustaining mesh, able to route traffic around a fault without human input.

The article goes on to praise the open, anonymous nature of the internet, making it difficult (though not impossible) for repressive regimes to censor the information their citizens can access.

Then the author shifts gear, and a dark side to this openness is revealed. We are warned that companies such as Apple, Google and Amazon are starting to – can you imagine it – “fragment” the web to support their own products and interests. Perhaps it should not be such a shock that business and commerce continue to operate through self-interest, even on the internet.

There is no problem with the way Apple restrict the apps users can install: if they didn’t do it, someone else would. The motivation is that a large proportion of users want things to “just work”, and are happier on the “less choice, more reliability” side of the equation. The iPhone’s closed model brought about its flip-side – Android – with its comparatively open policy. As long as business happens on the internet, it will want to manipulate things to make a profit; nothing new there. The “internet” as a whole is unblemished by that fact.

The cloud is described as a single point of failure. For example, when Amazon’s EC2 service croaked, businesses that relied on it were offline for the duration. Set aside the fact that “the cloud” is by definition the reverse of a “single point” – it is distributed and ought to be redundant (the EC2 outage was a freak occurrence, down to a network engineer running the wrong command and taking out the whole farm). Even then, there is – again – no threat here. The internet – like any social system – will grow and evolve based on the demands of its users. EC2 makes up a tiny part of the internet and, like any other service, has advantages and disadvantages; nobody has to use it.

The very next paragraph does an about-turn by admitting that the cloud actually is distributed, spreading your data among many locations, and that this too is a problem. An example of this “threat” is the hack of RSA, which led to an intrusion into Lockheed Martin’s computers. It’s not clear exactly what the problem is here, other than the fact that specialised groups of people depend on each other to get things done, and sometimes they mess up.

Beyond this point, we are taken on a ghost train ride into sheer speculation and technical folly. “Imagine being a heart patient and having your pacemaker hacked,” the author warns. Even with the “evils” of the internet, we still know how to put up a firewall on our home PCs, let alone on medical devices. We may pick up a virus by randomly browsing the web, but pacemakers will never be so general and will always be highly limited in functionality, through common sense and good engineering. If a pacemaker were somehow monitored over the internet, then it would not be hard to isolate this function from its critical control system.

If we’re not already quaking in our boots and reaching for the “off” switch on our broadband modems, a similar medical doomsday scenario is presented. What if the glucose levels of a diabetic patient are monitored and controlled over the internet? Wouldn’t it get hacked? Again, this is pure speculation out of the context of real constraints. The fault-tolerance levels of medical systems are far more stringent than those of general consumer devices. Not to mention the fact that nobody would design a system which gambles a human life on network availability.

Anonymity is blamed for the ability of online criminals to operate with impunity. I guess the humble balaclava, or simply “hiding from the police” aren’t sophisticated enough tactics to make it into New Scientist.

Are we offered any solutions to these terrors of the modern age? Actually, yes. The first is a good old “internet licence”, along with some kind of hardware identification system. Although the author recognises the technical challenges in such an idea, we must also consider the 100% likelihood of the system being circumvented by those it intends to control. DRM comes to mind as an example of an identification and control system which simply doesn’t work, and is prohibitively expensive to fix after the fact.

In summary, it would be fair to say that the article finds a balance between encouraging openness of the internet, and preventing misuse. But a prominent message prevails, which is that nonetheless, the internet is a dangerous place and therefore must be controlled.

The solution really is in basic common sense and best practices. Armed with fully patched software and treading wisely, there is no cause for concern. Businesses will continue to bend the internet to their own ends (as they do in the real world). Criminals will continue to attempt to exploit it (as they do in the real world). We can’t prevent these things from happening, but we can use our heads to drastically reduce our chances of going splat on the information superhighway.

Runtime Dependency Analysis

I was wondering: if I change class Foo, how do I determine, with complete confidence, which use-cases to include in my regression tests?

It would be useful to know with 100% certainty that I must consider the Acme login process, as well as the WidgetCo webservice authentication.  And nothing else.

Can my IDE help me with this?  Well, in some cases it’s straightforward to analyse for backward dependencies.  If I change class Foo, then static analysis tells me that webservice WSFoo and controller Bar are the only upstream entry points to the application affected by this change.  So I test those flows, and that’s about it.

But what if your application behaviour can change at run-time?

If you want to build products rapidly for your clients, you must be able to tweak functionality outside of the development cycle.  If customer Acme wants to insert a new step into their login process, you eventually want to be able to switch that on from an admin interface, not by changing the application code.  So for example, you could implement “hooks” in your generic flows, which pull dynamic behaviour in from outside, e.g. from a database.  This is nice, but it means you can no longer reason about dependencies, just by looking at the code.

So going back to the original question: how can you be sure a new code release won’t break any of your use-cases?

An interesting approach would be to develop a tool which learns about code coverage dynamically.  This tool would be configured with a list of use-cases, each of which maps to an “entry point” within your application.  For example, one entry point could be the method where the Acme login process first hits your application namespace, for example the doPost() method of a Servlet.

The tool would then be loaded into the JVM alongside your application and would wait for a “hit” on an entry point.  Once it detects a hit, it would record the call stack that follows, until the entry point exits.

The information recorded by this tool would tell you exactly which downstream components – e.g. classes or external services – are dependencies of the flow being recorded.  By reversing these mappings, you can instantly tell which flows will be impacted when you change a component.

You could just drop this tool into your UAT server, and let it “discover” this information, as long as customers are testing.  The presence of this tool would encourage a wide variety of tests – to “train” the tool as comprehensively as possible.

How would the recorded coverage information be presented?  A simple approach would be a tree of auto-generated, static HTML files – much like the output of the javadoc tool.  These pages could either be created on the filesystem, or hosted over HTTP.

How would the tool interact with the application being recorded?  Naturally your application should have minimal awareness of the tool.  Ideally, a jar file could be dropped into the classpath, and bootstrapped with a Servlet Listener, for example.

Still, the “entry points” in your application need to be defined somehow.  One idea is to annotate entry points in your application at method-level.  The tool would then need to scan for these annotations at bootstrap time.  This could probably be achieved by using a bytecode manipulation library such as Javassist, to avoid loading all the classes prematurely.  The Scannotation library provides a ready-rolled solution for this.
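
As a sketch (the annotation is entirely hypothetical, since the tool described here doesn’t exist), the marker could be as simple as:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

// Marks a method as the entry point of a named use-case, so the
// recording tool can find it when scanning the classpath at bootstrap
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
public @interface EntryPoint {
    String useCase();
}

The Acme login Servlet’s doPost() method would then carry something like @EntryPoint(useCase = "Acme login").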

So once your application is running, how would the resulting “stack trace” be captured?  Possibly, JVMTI could be used to capture this information, much as a profiler does.  Ideally, it would be possible to whitelist (or blacklist) by package name, so only relevant information is recorded.
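
To give a flavour of the whitelisting idea – this is only an in-process illustration in plain Java, not JVMTI:

import java.util.ArrayList;
import java.util.List;

public class FrameFilter {

    // Keep only the stack frames whose classes live under a whitelisted
    // package prefix - a crude stand-in for what a native agent would do
    public List<StackTraceElement> relevantFrames(String packagePrefix) {
        List<StackTraceElement> kept = new ArrayList<StackTraceElement>();
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            if (frame.getClassName().startsWith(packagePrefix)) {
                kept.add(frame);
            }
        }
        return kept;
    }
}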

One caveat is that this “profiling” could hit performance within your application.  This could be minimised by flushing the recordings (e.g. to a file) asynchronously.  For example, Chronon (a “time travelling debugger”) uses this approach, utilising background threads to periodically flush its buffers to a file.

Another downside of this approach is that the tool’s “knowledge” is only as comprehensive as the coverage which occurs during the training period.  A blunt response to this issue would be to simply deploy the tool to production.  This way, as well as building a more comprehensive picture of runtime dependencies, interesting statistics would emerge.  A “one-stop shop” would result, where Service Delivery teams could determine which functionality customers are using, as well as timing information for example.

In summary, a tool like this would raise confidence in software quality, increase visibility of coverage, and also reduce pain caused by manually hunting through code for usages.  Perhaps it could also offer an interesting birds-eye view of your application usage patterns, too.