Chinatown (Liam Gallagher Cover)

I never thought of myself as a Liam Gallagher fan, but I like his latest album “As You Were”.

Since I play guitar and even sing a little, I’ve been learning to play some songs from the album. Several things came out of this. First, Liam sings really high! He was asked about this in an interview - specifically whether men would be able to sing along - and he just said something straightforward like, well they should just stop complaining and crank it out anyway. I’m paraphrasing from memory - he probably swore a few times.

Second - in general, it’s really hard to play and sing at the same time. The rhythm for the guitar part will often be different than the vocals. So you have to kind of split yourself into two parts, each doing its own thing.

I can just about manage “Chinatown” - here’s the first verse and chorus!

JUnit and Non-Daemon Threads

Normally in Java, if the main thread starts one or more non-daemon threads, the Java process will not terminate until the last non-daemon thread terminates.

Yet, I was surprised to find that a particular JUnit test completed normally, despite never calling shutdown() on a ThreadPoolExecutor it had started. No Java process was left behind. This was the case both when running the test from within IntelliJ and also from Maven (using the surefire plugin). Replicating the test code in a vanilla main() method led to the expected behaviour: a “hanging” process.

So what was going on? Surely something fascinating and enlightening, right? Running the JUnit test from my IDE revealed the underlying java invocation in the Console pane (abbreviated):

java -classpath "some-massive-classpath" com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 MyTest,someMultiThreadedTest

So, the main method which launches the JUnit tests is in the class called JUnitStarter, which is an internal class within IntelliJ. A quick look at the code for JUnitStarter reveals the answer is very simple: an explicit call to System.exit() before main() returns. Maven Surefire’s ForkedBooter does the same thing.

As always, some strange behaviour turns out to be something entirely simple! But this is something to watch out for. Ideally, unit tests wouldn’t test multithreaded code (rather, they would test logic which is abstracted from the surrounding threaded environment). But if you must test multi-threaded production code, then be aware that your tests could give a misleading positive result in cases such as this.
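
To see the underlying behaviour outside of any test runner, here is a minimal sketch (not the original test code; the class and method names are my own invention). It confirms that the worker threads of a default thread pool are non-daemon, which is why a plain main() hangs unless the pool is shut down.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class NonDaemonDemo {

    /** Asks a pool worker whether it is running as a daemon thread. */
    public static boolean workerIsDaemon(ExecutorService pool) {
        Callable<Boolean> probe = () -> Thread.currentThread().isDaemon();
        try {
            return pool.submit(probe).get(5, TimeUnit.SECONDS);
        } catch (Exception e) {
            throw new IllegalStateException("probe task did not complete", e);
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        System.out.println("worker is daemon: " + workerIsDaemon(pool));
        // Remove this call and the idle worker keeps the JVM alive after
        // main() returns. A launcher that ends with System.exit() hides that.
        pool.shutdown();
    }
}
```

Commenting out the shutdown() call should reproduce the "hanging" process when the class is run directly.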

Banishing Bufferbloat

Bufferbloat - it’s making our internet slow. But what is it?

After reading this thought-provoking article about bufferbloat, I wanted to do two things: have a better understanding of the concept, and find evidence of its occurrence within my own set-up.

The term ‘bufferbloat’ was coined by Jim Gettys in 2010 as an explanation of much of today’s internet congestion, which can lead to very poor performance over an apparently “high bandwidth” network connection. In this article I will attempt to explain bufferbloat in a way accessible to those who are not network professionals.

Disclaimer: I am not a network professional either; I simply enjoy researching things.  This article is purely an attempt to digest what I’ve learned, and hopefully pass on something interesting to others. I will also document how I solved one particular instance of the problem in my own network.

The internet - and indeed, any system of connected components - is made up of communication channels, each capable of a particular throughput. This can be visualised as a network of interconnected pipes, all of varying widths. Any point where a “large pipe” (high bandwidth) feeds into a smaller one (low bandwidth) can become a bottleneck when traffic levels are high.

To be clear, this situation of interconnected links with varying bandwidths is normal - for example where a backbone link carrying national traffic feeds into a smaller network servicing a particular set of subscribers. The subset of traffic coming through the bottleneck would not usually exceed that which the small pipe can service, otherwise the situation would clearly be inadequate.

However, temporary spikes in traffic during unusually busy periods can occur. At this point, one of two things can happen. Either the excess traffic is stored up in a buffer (a “holding area”) for the duration of the spike, or else the narrower link must reject the excess traffic as there’s nowhere for it to go.

In the first scenario, the excess traffic would slowly fill the buffer for the duration of the spike. The buffer would be drained into the smaller pipe as fast as can be supported. Once traffic levels return to normal, the buffer would empty back to its normal level. The upstream components would not be aware of this situation, as they would not experience any rejected traffic (dropped packets).

However, if the traffic spike is prolonged, then the buffer becomes full, and the situation is similar to that where no buffer exists: packets are dropped.
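
The two scenarios above can be sketched as a toy simulation. This is only an illustration under simplifying assumptions of my own (whole packets, fixed-length ticks, a made-up buffer size and drain rate), not a model of any real router.

```java
public class TailDropBuffer {

    /**
     * Simulates a fixed-size buffer feeding a slower link, one tick at a time.
     * Returns the number of packets dropped because the buffer was full.
     */
    public static int simulate(int capacity, int drainPerTick, int[] arrivalsPerTick) {
        int queued = 0;
        int dropped = 0;
        for (int arrivals : arrivalsPerTick) {
            // The narrow link first sends what it can from the buffer.
            queued -= Math.min(queued, drainPerTick);
            // New packets join the queue; anything beyond capacity is dropped.
            int accepted = Math.min(arrivals, capacity - queued);
            dropped += arrivals - accepted;
            queued += accepted;
        }
        return dropped;
    }

    public static void main(String[] args) {
        int[] shortSpike = {2, 2, 6, 6, 2, 2, 2, 2};
        int[] longSpike  = {2, 2, 6, 6, 6, 6, 6, 6};
        // Buffer of 10 packets, link drains 3 packets per tick.
        System.out.println("short spike drops: " + simulate(10, 3, shortSpike)); // 0
        System.out.println("long spike drops:  " + simulate(10, 3, longSpike));  // 11
    }
}
```

The short spike is absorbed without loss, while the prolonged one fills the buffer and packets start to be dropped.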

From the upstream producer’s point of view, the packet would need to be re-sent (as no acknowledgement was received). The re-sending process would continue whilst the bottleneck is in effect, and would appear as a slow (or stalled) data transfer.

To be clear, these buffers are good to have. In the early days of the internet (c. 1986), buffers were insufficiently sized. This led to heavy packet loss during times of even moderate contention, to the point where most of the traffic was retransmitted packets. This was clearly inadequate, and so the use of larger buffers was recommended. Importantly, congestion control algorithms were also brought into play in each link which transmits data. These algorithms attempt to detect the size of the downstream pipe by gradually ramping up traffic until packets begin to be dropped, and then backing off.

So where’s the problem? The problem surfaces when the size of buffers is set too high. A buffer is just an area of memory, and as memory has become cheap, buffers have become larger, without adequate consideration of the consequences. A buffer which is too large gives a false indication that a bottlenecked pipe is bigger than it really is: your data transfer is simply filling the buffer, rather than flowing through the link any faster. The buffer doesn’t even serve its original purpose, as it is permanently full.

Why is this bad? If you’re doing a large upload (for example, sending a video to YouTube or backing up music to cloud storage) where an oversized transmit buffer is present, then web pages may appear to load very slowly (many seconds). The reason is that the tail-end of the large upload is sat in a large queue. A request to Google would sit at the back of the queue, and would have to wait until the buffer is emptied before it is sent on to the next link.

The solution is to tune the size of the buffer, such that it is only used to absorb temporary spikes in traffic, rather than giving false indications of high bandwidth during periods of contention. To be fair, the real solution is fairly complex, involving Active Queue Management to signal the onset of congestion so the rate of flow can be backed off before the buffer becomes full.

In many cases, these buffers exist in network equipment (such as routers) which is controlled by ISPs and similar organisations, but there are places under your own control where you can identify and fix this phenomenon. For my own situation, the issue was that during a large backup of files from my netbook to another computer on my network, it was virtually impossible to do anything else network-related on the netbook. During a large file upload to another computer on my LAN, a very slow wireless connection is a permanent bottleneck, with an observed effective throughput of 400kB/s (shown by the scp command), or 3Mbps.

By default, Linux allocates a transmit buffer maximum size of about 3MB (obtained via the following command, which gives minimum, default and maximum memory for the TCP transmit buffer):

sysctl -a | grep net.ipv4.tcp_wmem

If I start off a large upload and watch the size of this transmit buffer, the tx_queue settles at around 1.7MB. This value was obtained via:

cat /proc/net/tcp

1.7MB of data was permanently sat in the buffer; this would take around 4 seconds to drain over a 400kB/s network link. So any requests for web pages whilst the transfer is going on will be sat in a 4 second queue. Not good. This setting certainly needed to be tweaked in my case. Setting it too low however would result in small windows of data being sent per-roundtrip, which would prevent TCP from ever ramping up to full throughput.

The article quoted earlier suggests the recommended buffer size is the Bandwidth Delay Product. This is the bottleneck bandwidth, multiplied by the delay (or latency) that packets in the buffer take to reach their destination.
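
Both calculations are simple enough to sketch in code. The drain-time figures below are the ones from my own network. The Bandwidth Delay Product example uses a hypothetical link (10Mbps with a 40ms delay) purely to show the arithmetic, and the class and method names are my own.

```java
public class LinkMaths {

    /** Seconds for a queue of queuedBytes to drain over a link sending bytesPerSecond. */
    public static double drainSeconds(long queuedBytes, long bytesPerSecond) {
        return (double) queuedBytes / bytesPerSecond;
    }

    /** Bandwidth Delay Product in bytes, from bits per second and a delay in milliseconds. */
    public static long bandwidthDelayProductBytes(long bitsPerSecond, long delayMillis) {
        // Divide by 1000 (milliseconds to seconds) and by 8 (bits to bytes).
        return bitsPerSecond * delayMillis / 8_000;
    }

    public static void main(String[] args) {
        // Figures from the text: 1.7MB queued on a 400kB/s link.
        System.out.println(drainSeconds(1_700_000, 400_000) + " s"); // 4.25 s
        // Hypothetical link: 10Mbps bottleneck, 40ms delay.
        System.out.println(bandwidthDelayProductBytes(10_000_000, 40) + " bytes"); // 50000 bytes
    }
}
```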

So, my buffer size of 1.7MB with a latency of 1ms (over my home network) correlates to an imaginary bandwidth of 1.7MB/s, or around 14Mbps (in contrast to the real bottleneck bandwidth of around 3Mbps). So, the TCP transmit buffer was five times too large for my particular environment. Setting the TCP transmit buffer size to the approximately correct size of around 256KB mostly fixed the problem. I settled for a figure of 128KB - on my system this is a good compromise between bandwidth for large uploads, and latency for other interactive activity such as browsing or SSHing. This setting can be changed by editing /etc/sysctl.conf (the file from which kernel parameters are loaded).

Follow this with a refresh of the parameters, and you’re done:

sudo sysctl -p

Caveat: Your own mileage certainly may vary if you choose to tweak these settings. You’d be mad to do this on anything important without knowing exactly what you’re doing.

Note: There are a number of articles which suggest increasing the size of the network buffers in Linux, using a similar approach.  Based on my understanding and experiences, this is fine if raw bandwidth is your goal, and particularly if you have a healthy upstream bandwidth.  If you don’t have this bandwidth, then setting these buffers too high could harm your interactive network activity, while being unable to improve utilisation in an already saturated link.

An Appetite for Combinatorics

It’s common to see “find the number of possibilities” problems in Computer Science. This kind of problem stems from Discrete Maths - an important pre-requisite for doing anything beyond the trivial, for example Cryptography or Graph Theory.

I found one of these problems on Project Euler.  Project Euler is a collection of mathematically-inclined programming problems - probably more than you could ever solve in a lifetime (some of them are still unsolved by anybody). The particular problem which drew my attention doesn’t actually require any programming to solve.  

The problem is based on the idea of finding routes between two points on a grid:

Starting in the top left corner of a 2x2 grid, there are 6 routes (without backtracking) to the bottom right corner. 
How many routes are there through a 20x20 grid?

This is pretty fundamental maths, but I find these kinds of techniques are always worth re-visiting, as it seems to be a case of “use it or lose it”.

Following is my approach, so don’t read any further if you want to try it yourself first!

I started by drawing a tree structure for the 2x2 grid, where each node had two choices: ‘R’ or ‘D’ (for go Right, or Down).  This gave me a feel for things.  Towards the end of some paths, there was clearly some pruning - where the only option is to head for the goal (rather than back-tracking or going out of bounds).

It then became clear that any plan for getting to the goal simply involved two Rs and two Ds.  You clearly need to take two steps Right, and two steps Down to reach the goal, whatever your route.  So the problem can be re-stated as “how many ways are there of arranging two Rs and two Ds?”  Or more vividly: “If I have a bag containing two Kit-Kats and two Mars Bars, how many distinct ways can I eat them in sequence?”

Of course, the stated problem involves twenty each of Kit-Kats and Mars Bars.  So if I was really hungry, how many ways could I eat them all? Suitably motivated, it’s time for some fun with combinatorics.  

For the moment, let’s go back to the 2x2 grid, and ignore the repetition of Right and Down moves.  This means we must take four distinct steps to reach the goal.  So let’s assume that we have a bag of four chocolate bars - all different.  How many ways can we draw them in sequence?  Or more properly, how many permutations are there?

For the first choice, we have four options.  Once we’ve made this first selection, we have three left to choose from.  Then two, and finally there’s only one left.  This naturally leads us to the factorial function:

4! = 4 x 3 x 2 x 1 = 24

So there are 24 ways (permutations) to draw four tasty, chocolate treats. Now let’s amend our calculation, taking into account that two of the chocolate bars are identical.  Say, two Milky Ways, one Kit-Kat, and one Mars Bar. This is easy to work out - out of our 24 original permutations, we need to omit the repeated permutations of the two identical items.  There are 2! (2 x 1 = 2) ways to arrange two chocolate bars, so we adjust our answer for this.

4!/2! = 12 permutations

Now, it’s only one more step to re-discover the example solution, by taking into account that there are two classes of two identical ‘objects’ (Right moves and Down moves), and so we end up with:

4!/(2!*2!) = 6 permutations.
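
For anyone who prefers to check the arithmetic by machine, here is a small sketch of the three calculations above, plus a cross-check that counts routes cell by cell. The class and method names are my own, and I have deliberately left the 20x20 case out of main().

```java
import java.math.BigInteger;

public class Arrangements {

    public static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    /** Distinct orderings of a bag of items: n! divided by the factorial of each group of identical items. */
    public static BigInteger arrangements(int... groupSizes) {
        int total = 0;
        BigInteger denominator = BigInteger.ONE;
        for (int size : groupSizes) {
            total += size;
            denominator = denominator.multiply(factorial(size));
        }
        return factorial(total).divide(denominator);
    }

    /** Cross-check: routes to a grid point = routes to the point above + routes to the point on its left. */
    public static BigInteger routesByCounting(int size) {
        BigInteger[][] routes = new BigInteger[size + 1][size + 1];
        for (int i = 0; i <= size; i++) {
            routes[i][0] = BigInteger.ONE; // only one way along the left edge
            routes[0][i] = BigInteger.ONE; // only one way along the top edge
        }
        for (int row = 1; row <= size; row++) {
            for (int col = 1; col <= size; col++) {
                routes[row][col] = routes[row - 1][col].add(routes[row][col - 1]);
            }
        }
        return routes[size][size];
    }

    public static void main(String[] args) {
        System.out.println(arrangements(1, 1, 1, 1)); // 24: four different bars
        System.out.println(arrangements(2, 1, 1));    // 12: two Milky Ways, a Kit-Kat, a Mars Bar
        System.out.println(arrangements(2, 2));       // 6: two Rs and two Ds
        System.out.println(routesByCounting(2));      // 6: the same answer, counted directly
    }
}
```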

Now it’s really easy to solve the stated problem - I won’t give away the solution of course!

Clueless? Just Improvise.

“Go on then, make us laugh.”

Those words would probably make anybody curl up and die.  Many of us would freeze up, go into our heads and try to think of a clever joke.  We often think being funny is a special skill reserved for ingenious stand-up comedians, or witty wordsmiths like Stephen Fry.

Not so for an Improv Comedian - they’d be more likely to do the first zany thing that comes into their head.  And hey presto, it’ll probably be funny. Unlike stand-up, Improv Comedy involves short scenes made up on the fly, often with instructions from off-stage to change scenes or characters mid-flight.  When it’s impossible to plan ahead, spontaneity and total participation rule over being clever or witty.

Improv is often funny because of the eccentric and unexpected performances that happen in the heat of the moment.  As an audience, we can’t help but laugh in relief or recognition, as the performers first appear to be in dire straits, but then dredge up a convincing scene seemingly from nothing.

Can you learn this stuff?  Apparently you can - over the last couple of weeks I’ve been taking classes with Steve Roe of Hoopla.  His workshops attract everybody from rigid newbies (such as myself), to experienced actors and bona-fide theatre types. As a software engineer and logic junkie, spontaneity feels like a great skill to unlock within myself.  

Public speaking groups such as Toastmasters have helped me to feel comfortable speaking to a group, but speaking off-the-cuff requires a different bag of tricks.  Improv has taught me that convincing scenes often develop out of thin air, as long as the group is totally present and heading in the same direction.

The Hoopla workshops usually start with simple, fun warm-ups to build a safe, supportive atmosphere.  Next come specific skill-building exercises, where we take turns to act out scenes in small groups. There are many techniques, but one of the most fundamental is called “Yes, and”.  If my partner tells me that “This is the best biscuit I’ve ever tasted!” and I reply with “What biscuit?”, then I’d be denying their contribution.  With the “Yes, and” mentality I might reply “Yes and that’s the last one, you greedy pig!”  

When there’s an agreed reality, the scene gains traction. Mike Myers successfully “yes ands” a mischievous James Lipton in this exchange: JL: “Ants and caterpillars can be - in certain circumstances - delicious.” MM: “Yes, and I had them yesterday.” JL: “You had them yesterday?  Here’s a strange coincidence - so did I.” MM: “Yes I know, because I was across the street watching you.” JL: “It’s very odd because I was eating in my bathroom.” MM: “Yes, and I was in the medicine cabinet.”

The “Yes, and” technique is a way to avoid mistakes.  But even when mistakes do occur, Improv performers take them in their stride.  In fact, “mistakes” don’t even exist in Improv - they are simply “an offer which hasn’t yet been acknowledged”.  These offers turn into a “game” between the performers which is much more fun than a straight scene.

During one workshop, we formed pairs, and acted a straight scene such as a job interview.  As soon as the first mistake happened - for example, when someone says something inconsistent - we’d stop.  We’d acknowledge the mistake by turning it into a “game” occurring within the scene.  

For example, the “what biscuit?” mistake from earlier could have gone another way: Mary: “This is the best biscuit I’ve ever tasted!” John: “What biscuit?” Mary: (Unperturbed) “It doesn’t go so well with this tea though.  I prefer Digestives for dunking.” John: (Continuing the game) “Tea? Where? What are you talking about?” Mary: “I think you should try some.  Here let me pour some into your cup - ” John: “What cup? - whoa!” (Mimes being scalded by boiling water) The “game” is that John denies the existence of anything Mary says.  Mary uses John’s mistake as an offer, and eventually “traps” him.  

Mistakes like this are a fantastic way to generate material.  Pre-planned, logical thinking would never have arrived at the same result. These two techniques only scratch the surface.  Each idea we learn feels like a rediscovery of Things That Already Work - in everyday life as well as on the stage. In fact, learning improv has felt like an “unlearning” of sorts.  The creative, spontaneous part of the brain seems to work best when given space to work unimpeded.  Planning, preparation and self-criticism are thrown to the wind, and the result is fun and sometimes even hilarious.

The High Street Chill-Out Zone

Pssst - all you new-age vagrants out there.  

Ever fancied the comfort of a cosy lounge for free, right on the High Street? If you’re thinking “coffee shops”, then think again.  Unless you cherish hustling for a space amid used napkins and oozings of toffee-nut latte, to sit on plywood shaped like a rudimentary chair, whilst the din of industrial coffee grinders competes with the shrieks of spoiled toddlers … if you cherish that, then go right ahead.

For a more homely experience, rock up to your local Department Store.  These often have a furniture section, containing mock-ups of living rooms in various styles.  Simply turn up any time during opening hours, choose the sofa you like best, and make yourself at home.

Be sure to have everything you need before you arrive.  Newspaper, flask of coffee or soup, hot water bottle.  A pet (live or stuffed) makes a cute, cuddly addition (especially if still warm). Check your phone battery is fully charged - this could be a good time for that long phone call abroad.  If caught short on credit or battery juice, feel free to use the in-house telephone system which staff use to call one another.  Dial ‘9’ for an outside line, then reverse the charges.

Once you’ve settled into your comfy haven, cast your eyes around the shop floor.  Coolly wave strangers over to join you, particularly those you like the look of.  Put your feet up on a pouffe (if you’re so inclined).

If you like to unwind by watching television, then you’ll need to be more inventive.  Ask to try out a pair of binoculars, and ensure you have a clear line of sight to the audio-visual department.  Don’t like the programme that’s on?  You did remember to bring your “All in One” remote control, didn’t you?  Aim carefully, and zap away to your heart’s content (and turn up the volume so you can hear it).

Many of these “faux lounges” sport handy coffee tables to empty your pockets onto.  You don’t want loose change falling down the back of the sofa for another scamp to find, do you?  These low tables are also perfect for that stack of books and magazines you appropriated for the duration of your visit.

There’s no obvious, “acceptable” time limit to remain in your “virtual lounge”.  However, to make an untimely eviction less likely, consider wearing camouflage.  For those partial to trendy, black leather sofas, you’ll need to dress in similar fashion, like a “rock star”.  If camouflage is impractical, then try to sit very still like a mannequin.  This helps you to seem like “part of the furniture”.  

Be sure not to fall asleep though, or you may wake up in the store-room. Stay tuned for Part 2: The High Street Soup Kitchen, where we’ll wander over to the Kitchenware Department.

Men's Haircuts

Like every other time, it started off weird.  He put the funny plastic gown over my head, stood back, and spoke to my reflection.  “So! What can I do for you today?”

This throws me every time.  I thought maybe I’d walked into a doctor’s surgery by accident.  Then I saw the bottles of pastel-coloured male grooming products (which nobody buys), and knew I was in the right place. “Use your imagination!” I wanted to say.  “Look at how my hair looks now, subtract four weeks - now make it look like that!”

Men’s haircuts ought to be pretty simple.  Unless you’re a punk. No, I understood.  He was afraid that one day I might change my mind.  That I might say, “Actually, I’ve turned to organised crime.  Shave it all off, and give me a razor-scar while you’re at it.”

I asked for a trim.  The barber replied, ‘ahhhh, a trim!’  As if that changed everything.  God forbid, if we hadn’t got that straight, he might have gone into left field and given me a tidy-up instead.

I guess it’s all part of the patter.  Having something to say to each other during this weird ritual. My barber is pretty friendly.  He asks questions about my life.  But he’d stop cutting my hair while I was speaking.  I was there to get my hair cut, so I didn’t answer his questions very often.

Certainly, men’s haircuts should be pretty straightforward.  But there was the other side of the coin.  It could also be a ludicrously technical affair, with millimetre tolerances at stake.  When asking for a number three on the sides and back, I’d half-expect him to haul out a computer-guided industrial lathe.

Once the negotiations were over though, things didn’t get any easier. I was captive in the barber’s chair, with a mirror straight ahead.  I couldn’t move a muscle, for fear of losing an ear.  So where should I look? Straight ahead was out of the question: I’d be gazing weirdly into my own eyes. Behind me was the guy who was waiting.  It’d be even more weird to look at him. Attempting to look nowhere in particular made me look all shifty. So, I began to check out the little table in front of me.  You know, the little table with all the barbery things.  Scissors, and razors bathing in antiseptic, like something a brain surgeon might have. How can there be so many kinds of scissors, I wondered.  There was a particularly funny-shaped pair, which looked like they could be used to make crinkle-cut crisps.

Suddenly, the barber yanked my head forward.  Now I was staring down at my crotch, while he attended to my neck stubble with a laser-guided guillotine. I suppose I shouldn’t complain.  Sometimes they give you a free tissue on your way out.  Like a souvenir.  You only get this in the classier joints, whose coffee tables boast newspapers only two days old.

The best bit for me was the double-mirror trick at the end.  The bit where the barber holds up a second mirror, so I could admire his landscaping efforts on the back of my head.  Anything involving two mirrors is worthy of respect in my book.

I took one more look at all those scissors and scalpels, and acted impressed.  As if this number-two-fade was superior to all other number-two-fades I’ve had. In truth, I couldn’t tell the difference.  But the risk of offending a man carrying a cut-throat razor was too great, in my mind.

Stanford Artificial Intelligence MOOC

I’m proud to have completed the first ever offering of the MOOC in Artificial Intelligence, run by Sebastian Thrun and Peter Norvig through Stanford University.

It was intriguing, challenging, and ultimately fun to get a first bit of working knowledge of things like spam filters, robot localization, and computer vision.

I’ve written a little Bayes filter based on the model introduced in that course.  I’ve hooked it up to my IRC client to alert me about the most interesting messages.  As they say though, the hardest part is the training and data collection - it’s hard and time-consuming to come up with enough good data to form a workable model.
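
To give a flavour of the idea, here is a rough sketch of a naive Bayes text filter with Laplace smoothing. It is not my actual filter: the class name, the tokenising rule and the training messages are all invented for illustration, and reserving one extra vocabulary slot for unseen words is just one of several possible smoothing choices.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TinyBayesFilter {
    private final Map<String, Integer> interestingCounts = new HashMap<>();
    private final Map<String, Integer> boringCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int interestingWords, boringWords, interestingMessages, boringMessages;

    /** Adds one labelled message to the model. */
    public void train(String message, boolean interesting) {
        Map<String, Integer> counts = interesting ? interestingCounts : boringCounts;
        for (String word : tokenize(message)) {
            counts.merge(word, 1, Integer::sum);
            vocabulary.add(word);
            if (interesting) interestingWords++; else boringWords++;
        }
        if (interesting) interestingMessages++; else boringMessages++;
    }

    /** Estimated probability that a message is interesting, given the training data so far. */
    public double probabilityInteresting(String message) {
        int totalMessages = interestingMessages + boringMessages;
        // Work in log space so that long messages do not underflow to zero.
        double logInteresting = Math.log((interestingMessages + 1.0) / (totalMessages + 2.0));
        double logBoring = Math.log((boringMessages + 1.0) / (totalMessages + 2.0));
        int slots = vocabulary.size() + 1; // +1 reserves a slot for unseen words (assumption)
        for (String word : tokenize(message)) {
            logInteresting += Math.log((interestingCounts.getOrDefault(word, 0) + 1.0) / (interestingWords + slots));
            logBoring += Math.log((boringCounts.getOrDefault(word, 0) + 1.0) / (boringWords + slots));
        }
        // Normalise: e^a / (e^a + e^b) = 1 / (1 + e^(b - a)).
        return 1.0 / (1.0 + Math.exp(logBoring - logInteresting));
    }

    private static List<String> tokenize(String message) {
        List<String> words = new ArrayList<>();
        for (String word : message.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) words.add(word);
        }
        return words;
    }

    public static void main(String[] args) {
        TinyBayesFilter filter = new TinyBayesFilter();
        filter.train("build failed on master", true);
        filter.train("deploy failed again", true);
        filter.train("anyone for lunch", false);
        filter.train("lunch was great", false);
        System.out.println(filter.probabilityInteresting("build failed") > 0.5); // true
        System.out.println(filter.probabilityInteresting("lunch") > 0.5);        // false
    }
}
```

With only four training messages this is a toy, which rather proves the point about data collection being the hard part.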

The main thing I’ve gained from the course is an appreciation of the kinds of problems AI can solve, as well as an idea of what tool to use in a given situation.

Testing on Autopilot

I was reminded of the power of automated testing by this talk by Rod Johnson, the original creator of the Spring framework. It is a little dated (2007), but what he says is still highly relevant.  The content mainly covers things we should already be practicing as developers, but it’s worth a reminder every now and then. Following are the main points I took away from the presentation.

First, there are several key concepts to bear in mind.  These came up again and again in the talk:

  • Test Early, Test Often
  • Test at Multiple Levels
  • Automate Everything

Unit Testing

As developers, we know we should do lots of unit testing.  We do this by targeting classes in isolation, and mocking out collaborators.

To be clear, unit testing is looking at your class “in the lab”, not in the real world.  A unit test should not interact with Spring, or the database, or any infrastructure concerns.  Therefore, unit tests should run extremely fast: of the order of tens of thousands of tests per minute.  It shouldn’t be painful to run a suite of unit tests.
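
As a small illustration of testing “in the lab”, here is a sketch in which a class is exercised with a hand-rolled stub in place of its real collaborator. The names (ExchangeRates, PriceConverter) are hypothetical and not from the talk; in practice a mocking library would often take the place of the lambda.

```java
public class IsolatedUnitTestSketch {

    /** The collaborator: in production this might call a remote service. */
    interface ExchangeRates {
        double rate(String from, String to);
    }

    /** The class under test, which only knows about the interface. */
    static class PriceConverter {
        private final ExchangeRates rates;

        PriceConverter(ExchangeRates rates) {
            this.rates = rates;
        }

        double convert(double amount, String from, String to) {
            return amount * rates.rate(from, to);
        }
    }

    /** Runs the conversion against a stub that always returns stubRate. */
    public static double convertWithStubbedRate(double amount, double stubRate) {
        ExchangeRates stub = (from, to) -> stubRate; // no network, no database
        return new PriceConverter(stub).convert(amount, "GBP", "EUR");
    }

    public static void main(String[] args) {
        System.out.println(convertWithStubbedRate(10.0, 2.0)); // 20.0
    }
}
```

Because nothing here touches infrastructure, a check like this runs in well under a millisecond.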

Do Test-Driven Development. Not only does this help you discover APIs organically, but it’s a way of relieving stress.  Once a defect is detected, you can write a failing test for it, then come back to fix it later on.  The failing test is a big red beacon reminding you to finish the job.

Use tools such as Clover to measure the code-coverage of your tests.  80% is a useful rule of thumb.  Any more than this, and the benefits are not worth the cost.  Any less than 70%, and the risk of defects becomes significant.

Integration Testing

We should also do integration testing - for example to ensure our application is wired up correctly, and SQL statements are correct.  

But how many of us are still clicking through flows in a browser?  Ad-hoc testing by deploying to an application server and clicking around is very time-consuming and error-prone.  If it’s not automated, chances are it won’t happen.  If it doesn’t happen or occurs late in the project cycle, defects will be expensive to fix.

So instead, maintain a suite of integration tests.  It should be possible to run hundreds or thousands of these per minute and again, they should be automated so they just happen.

Use Spring’s Integration Testing support.  Among other things, this provides superclasses which can perform each test in a transaction, and roll it back upon completion to avoid side-effects across tests.  This avoids the need to re-seed the database upon each test.

Another benefit of Spring Integration Testing is that the Spring context is cached between tests.  This means that the highly expensive construction of the Hibernate SessionFactory (if you use one) only happens once.  Without this support, such caching would be awkward to achieve, because JUnit constructs a new instance of the test class for each test.

Remember to test behaviour in the database.  Stored procedures, triggers, views - regressions at the schema level should be caught early, in an automated fashion.

Integration tests should be deterministic - that is, they should not rely on the time of day, or random side-effects from previous tests.  This should be obvious, but when testing concerns such as scheduling, this can become difficult.  One strategy is to abstract out the concept of the current time of day.  This could be done by replacing a literal call to System.currentTimeMillis() with a call to a private method.  This method would check for an override property set only during testing, the existence of which would cause a static Date to be returned to your application code.
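
A variation on the same idea, available since Java 8, is to inject a java.time.Clock rather than checking an override property. The sketch below uses that swapped-in technique instead of the one described in the talk, and the OpeningHours class is a made-up example.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

public class OpeningHours {
    private final Clock clock;

    /** Production code would pass Clock.systemDefaultZone(); tests pass a fixed clock. */
    public OpeningHours(Clock clock) {
        this.clock = clock;
    }

    /** True from 09:00 (inclusive) to 17:00 (exclusive) according to the injected clock. */
    public boolean isOpen() {
        LocalTime now = LocalTime.now(clock);
        return !now.isBefore(LocalTime.of(9, 0)) && now.isBefore(LocalTime.of(17, 0));
    }

    /** Convenience factory for tests: a clock frozen at the given UTC instant. */
    public static OpeningHours fixedAt(String isoInstant) {
        return new OpeningHours(Clock.fixed(Instant.parse(isoInstant), ZoneOffset.UTC));
    }

    public static void main(String[] args) {
        System.out.println(fixedAt("2024-01-15T10:30:00Z").isOpen()); // true
        System.out.println(fixedAt("2024-01-15T22:00:00Z").isOpen()); // false
    }
}
```

The result no longer depends on when the test happens to run.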

Performance Testing

This should begin as early as possible.  Use scriptable frameworks such as The Grinder, so performance testing is cheap to execute early and often.  This means performance regressions will be caught immediately, for example if somebody drops an index.

Many performance problems are due to lack of understanding of ORM frameworks.  Learn to use your framework, for example relating to fetch strategies.  A common idiom is to eagerly fetch a collection of child entities up-front, rather than invoking the “N+1 Selects” problem by lazily loading each child record in a loop.  Additionally, consider evicting objects from the Session at appropriate points, to avoid memory overhead and to prevent the need for dirty-checking upon flushing of the Session.

One strategy for diving deeply into database performance concerns is to enable SQL logging in your persistence framework.  A large number of SELECT statements per use case will quickly become apparent.

Conclusion

Developers should invest time into writing automated tests at multiple levels.  Even with a dedicated QA team in place, defects will only be caught early and fixed cheaply through an intelligent approach to automation. Along with adoption of best practices such as Dependency Injection and separation of concerns, the industry has many tools on offer to make comprehensive testing cheap and easy.

Runtime Dependency Analysis

I was wondering: if I change class Foo, how do I determine with certainty which use-cases to include in my regression tests? It would be useful to know with 100% certainty that I must consider the Acme login process, as well as the WidgetCo webservice authentication. And nothing else. Can my IDE help me with this? Well, in some cases it’s straightforward to analyse for backward dependencies. If I change class Foo, then static analysis tells me that webservice WSFoo, and controller Bar are the only upstream entry points to my application affected by this change.
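
The static case boils down to walking a “used by” graph backwards from the changed class. Here is a minimal sketch, assuming such a graph has already been extracted by some tool. Foo, WSFoo and Bar come from the example above, while FooService, Baz and Qux are hypothetical fillers.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ReverseDependencies {

    /** Follows "is used by" edges from the changed class and collects every entry point reached. */
    public static Set<String> affectedEntryPoints(Map<String, List<String>> usedBy,
                                                  Set<String> entryPoints,
                                                  String changed) {
        Set<String> visited = new HashSet<>();
        Set<String> affected = new TreeSet<>(); // sorted for stable output
        Deque<String> toVisit = new ArrayDeque<>();
        toVisit.add(changed);
        while (!toVisit.isEmpty()) {
            String current = toVisit.remove();
            if (!visited.add(current)) {
                continue; // already seen, which also guards against cycles
            }
            if (entryPoints.contains(current)) {
                affected.add(current);
            }
            toVisit.addAll(usedBy.getOrDefault(current, List.of()));
        }
        return affected;
    }

    public static void main(String[] args) {
        Map<String, List<String>> usedBy = Map.of(
                "Foo", List.of("FooService"),
                "FooService", List.of("WSFoo", "Bar"),
                "Baz", List.of("Qux"));
        Set<String> entryPoints = Set.of("WSFoo", "Bar", "Qux");
        System.out.println(affectedEntryPoints(usedBy, entryPoints, "Foo")); // [Bar, WSFoo]
    }
}
```

This only sees dependencies that are visible in the code, so anything wired up reflectively or through configuration would be missed.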