JUnit and Non-Daemon Threads

Normally in Java, if the main thread starts one or more non-daemon threads, the Java process will not terminate until the last non-daemon thread terminates.

Yet, I was surprised to find that a particular JUnit test completed normally, despite never calling shutdown() on a ThreadPoolExecutor it had started. No Java process was left behind. This was the case both when running the test from within IntelliJ and when running it from Maven (using the Surefire plugin). Replicating the test code in a vanilla main() method led to the expected behaviour: a “hanging” process.
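To see the underlying behaviour in isolation, here is a minimal sketch (the class and method names are my own, for illustration). With the shutdown() call removed, this program never exits; with it in place, the JVM terminates normally:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ExecutorExitDemo {

    // Submits a task, shuts the pool down, and waits for termination.
    // Returns true once the pool has fully terminated.
    static boolean runAndShutDown() {
        // Executors.newFixedThreadPool creates NON-daemon worker threads.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("task done"));

        // Without this call, the idle non-daemon workers keep the JVM
        // alive after main() returns -- the "hanging process" behaviour.
        pool.shutdown();
        try {
            return pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("pool terminated: " + runAndShutDown());
    }
}
```

An explicit System.exit() at the end of main() would also force the process to die, live worker threads or not, which is exactly what the test runners below turn out to do.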

So what was going on? Surely something fascinating and enlightening, right? Running the JUnit test from my IDE revealed the underlying java invocation in the Console pane (abbreviated):

java -classpath "some-massive-classpath" com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 MyTest,someMultiThreadedTest

So the main method which launches the JUnit tests lives in JUnitStarter, an internal IntelliJ class. A quick look at the code for JUnitStarter reveals a very simple answer: an explicit call to System.exit() before main() returns. Maven Surefire’s ForkedBooter does the same thing.

As always, some strange behaviour turns out to have an entirely simple cause! But it is something to watch out for. Ideally, unit tests wouldn’t test multithreaded code (rather, they would test logic which is abstracted from the surrounding threaded environment). But if you must test multi-threaded production code, be aware that your tests could give a misleadingly positive result in cases such as this.

Banishing Bufferbloat

Bufferbloat - it’s making our internet slow. But what is it?

After reading this thought-provoking article about bufferbloat, I wanted to do two things: have a better understanding of the concept, and find evidence of its occurrence within my own set-up.

The term ‘bufferbloat’ was coined by Jim Gettys in 2010 as an explanation of much of today’s internet congestion, which can lead to very poor performance over an apparently “high bandwidth” network connection. In this article I will attempt to explain bufferbloat in a way accessible to those who are not network professionals.

Disclaimer: I am not a network professional either; I simply enjoy researching things.  This article is purely an attempt to digest what I’ve learned, and hopefully pass on something interesting to others. I will also document how I solved one particular instance of the problem in my own network.

The internet - and indeed, any system of connected components - is made up of communication channels, each capable of a particular throughput. This can be visualised as a network of interconnected pipes, all of varying widths. Any point where a “large pipe” (high bandwidth) feeds into a smaller one (low bandwidth) can become a bottleneck when traffic levels are high.

To be clear, this situation of interconnected links with varying bandwidths is normal - for example, where a backbone link carrying national traffic feeds into a smaller network servicing a particular set of subscribers. Usually, the subset of traffic coming through the bottleneck would not exceed what the smaller pipe can service; otherwise the situation would clearly be inadequate.

However, temporary spikes in traffic during unusually busy periods can occur. At this point, one of two things can happen. Either the excess traffic is stored up in a buffer (a “holding area”) for the duration of the spike, or else the narrower link must reject the excess traffic as there’s nowhere for it to go.

In the first scenario, the excess traffic would slowly fill the buffer for the duration of the spike. The buffer would be drained into the smaller pipe as fast as can be supported. Once traffic levels return to normal, the buffer would empty back to its normal level. The upstream components would not be aware of this situation, as they would not experience any rejected traffic (dropped packets).

However, if the traffic spike is prolonged, then the buffer becomes full, and the situation is similar to that where no buffer exists: packets are dropped.

From the upstream producer’s point of view, the packet would need to be re-sent (as no acknowledgement was received). The re-sending process would continue whilst the bottleneck is in effect, and would appear as a slow (or stalled) data transfer.

To be clear, these buffers are good to have. In the early days of the internet (c. 1986), buffers were insufficiently sized. This led to heavy packet loss during times of even moderate contention, to the point where most of the traffic was retransmitted packets. This was clearly inadequate, and so the use of larger buffers was recommended. Importantly, congestion control algorithms were also brought into play in each link which transmits data. These algorithms attempt to detect the size of the downstream pipe by slowly ramping up traffic to the point where no packets are dropped.

So where’s the problem? The problem surfaces when the size of buffers is set too high. A buffer is just an area of memory, and as memory has become cheap, buffers have become larger, without adequate consideration of the consequences. An oversized buffer gives a false indication that a bottlenecked pipe is bigger than it really is: your data transfer is simply filling the buffer. Worse, the buffer no longer serves its original purpose, as it is permanently full.

Why is this bad? If you’re doing a large upload (for example, sending a video to YouTube or backing up music to cloud storage) where an oversized transmit buffer is present, then web pages may appear to load very slowly (many seconds). The reason is that the tail-end of the large upload is sat in a large queue. A request to Google would sit at the back of the queue, and would have to wait until the buffer is emptied before it is sent on to the next link.

The solution is to tune the size of the buffer, such that it is only used to absorb temporary spikes in traffic, rather than giving false indications of high bandwidth during periods of contention. To be fair, the real solution is fairly complex, involving Active Queue Management to signal the onset of congestion so the rate of flow can be backed off before the buffer becomes full.

In many cases, these buffers exist in network equipment (such as routers) which is controlled by ISPs and similar organisations, but there are places under your own control where you can identify and fix this phenomenon. For my own situation, the issue was that during a large backup of files from my netbook to another computer on my network, it was virtually impossible to do anything else network-related on the netbook. In this set-up, the netbook’s very slow wireless connection is a permanent bottleneck, with an observed effective throughput of 400kB/s (shown by the scp command), or around 3Mbps.

By default, Linux allows the TCP transmit buffer to grow to a maximum of about 3MB (obtained via the following command, which gives the minimum, default and maximum sizes of the TCP transmit buffer):

sysctl -a | grep net.ipv4.tcp_wmem

If I start off a large upload and watch the size of this transmit buffer, the tx_queue settles at around 1.7MB. This value was obtained via:

cat /proc/net/tcp

1.7MB of data was permanently sat in the buffer; this would take around 4 seconds to drain over a 400kB/s network link. So any request for a web page whilst the transfer is going on will sit in a 4-second queue. Not good. This setting certainly needed tweaking in my case. Setting it too low, however, would result in small windows of data being sent per round-trip, which would prevent TCP from ever ramping up to full throughput.

The article quoted earlier suggests the recommended buffer size is the Bandwidth Delay Product: the bottleneck bandwidth, multiplied by the delay (or latency) that packets in the buffer take to reach their destination.

So, my buffer size of 1.7MB with a latency of 1ms (over my home network) correlates to an imaginary bandwidth of 1.7MB/s, or around 14Mbps (in contrast to the real bottleneck bandwidth of around 3Mbps). So the TCP transmit buffer was roughly five times too large for my particular environment. Setting it to the approximately correct size of around 256KB mostly fixed the problem. I settled on a figure of 128KB - on my system this is a good compromise between bandwidth for large uploads, and latency for other interactive activity such as browsing or SSHing. The setting can be changed by editing /etc/sysctl.conf (the interface into kernel parameters).
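For reference, the change amounts to a single line in /etc/sysctl.conf. The figures below are from my own set-up (min and default left at typical values, max capped at 128KB) - tune them for your own link:

```
# min, default and max size (bytes) of the TCP transmit buffer
net.ipv4.tcp_wmem = 4096 16384 131072
```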

Follow this with a refresh of the parameters, and you’re done:

sudo sysctl -p

Caveat: Your own mileage certainly may vary if you choose to tweak these settings. You’d be mad to do this on anything important without knowing exactly what you’re doing.

Note: There are a number of articles which suggest increasing the size of the network buffers in Linux, using a similar approach.  Based on my understanding and experiences, this is fine if raw bandwidth is your goal, and particularly if you have a healthy upstream bandwidth.  If you don’t have this bandwidth, then setting these buffers too high could harm your interactive network activity, while being unable to improve utilisation in an already saturated link.

Clueless? Just Improvise.

“Go on then, make us laugh.”

Those words would probably make anybody curl up and die.  Many of us would freeze up, go into our heads and try to think of a clever joke.  We often think being funny is a special skill reserved for ingenious stand-up comedians, or witty wordsmiths like Stephen Fry.

Not so for an Improv Comedian - they’d be more likely to do the first zany thing that comes into their head.  And hey presto, it’ll probably be funny. Unlike stand-up, Improv Comedy involves short scenes made up on the fly, often with instructions from off-stage to change scenes or characters mid-flight.  When it’s impossible to plan ahead, spontaneity and total participation rule over being clever or witty.

Improv is often funny because of the eccentric and unexpected performances that happen in the heat of the moment.  As an audience, we can’t help but laugh in relief or recognition, as the performers first appear to be in dire straits, but then dredge up a convincing scene seemingly from nothing.

Can you learn this stuff?  Apparently you can - over the last couple of weeks I’ve been taking classes with Steve Roe of Hoopla.  His workshops attract everybody from rigid newbies (such as myself), to experienced actors and bona-fide theatre types. As a software engineer and logic junkie, spontaneity feels like a great skill to unlock within myself.  

Public speaking groups such as Toastmasters have helped me to feel comfortable speaking to a group, but speaking off-the-cuff requires a different bag of tricks.  Improv has taught me that convincing scenes often develop out of thin air, as long as the group is totally present and heading in the same direction.

The Hoopla workshops usually start with simple, fun warm-ups to build a safe, supportive atmosphere.  Next come specific skill-building exercises, where we take turns to act out scenes in small groups. There are many techniques, but one of the most fundamental is called “Yes, and”.  If my partner tells me that “This is the best biscuit I’ve ever tasted!” and I reply with “What biscuit?”, then I’d be denying their contribution.  With the “Yes, and” mentality I might reply “Yes and that’s the last one, you greedy pig!”  

When there’s an agreed reality, the scene gains traction. Mike Myers successfully “yes ands” a mischievous James Lipton in this exchange:

JL: “Ants and caterpillars can be - in certain circumstances - delicious.”
MM: “Yes, and I had them yesterday.”
JL: “You had them yesterday? Here’s a strange coincidence - so did I.”
MM: “Yes I know, because I was across the street watching you.”
JL: “It’s very odd because I was eating in my bathroom.”
MM: “Yes, and I was in the medicine cabinet.”

The “Yes, and” technique is a way to avoid mistakes.  But even when mistakes do occur, Improv performers take them in their stride.  In fact, “mistakes” don’t even exist in Improv - they are simply “an offer which hasn’t yet been acknowledged”.  These offers turn into a “game” between the performers which is much more fun than a straight scene.

During one workshop, we formed pairs, and acted a straight scene such as a job interview.  As soon as the first mistake happened - for example, when someone says something inconsistent - we’d stop.  We’d acknowledge the mistake by turning it into a “game” occurring within the scene.  

For example, the “what biscuit?” mistake from earlier could have gone another way:

Mary: “This is the best biscuit I’ve ever tasted!”
John: “What biscuit?”
Mary: (Unperturbed) “It doesn’t go so well with this tea though. I prefer Digestives for dunking.”
John: (Continuing the game) “Tea? Where? What are you talking about?”
Mary: “I think you should try some. Here, let me pour some into your cup - ”
John: “What cup? - whoa!” (Mimes being scalded by boiling water)

The “game” is that John denies the existence of anything Mary says. Mary uses John’s mistake as an offer, and eventually “traps” him.

Mistakes like this are a fantastic way to generate material. Pre-planned, logical thinking would never have arrived at the same result.

These two techniques only scratch the surface. Each idea we learn feels like a rediscovery of Things That Already Work - in everyday life as well as on the stage. In fact, learning improv has felt like an “unlearning” of sorts. The creative, spontaneous part of the brain seems to work best when given space to work unimpeded. Planning, preparation and self-criticism are thrown to the wind, and the result is fun and sometimes even hilarious.

The High Street Chill-Out Zone

Pssst - all you new-age vagrants out there.  

Ever fancied the comfort of a cosy lounge for free, right on the High Street? If you’re thinking “coffee shops”, then think again. Unless you cherish hustling for a space amid used napkins and oozings of toffee-nut latte, to sit on plywood shaped like a rudimentary chair, whilst the din of industrial coffee grinders competes with the shrieks of spoiled toddlers … if you cherish that, then go right ahead.

For a more homely experience, rock up to your local Department Store.  These often have a furniture section, containing mock-ups of living rooms in various styles.  Simply turn up any time during opening hours, choose the sofa you like best, and make yourself at home.

Be sure to have everything you need before you arrive.  Newspaper, flask of coffee or soup, hot water bottle.  A pet (live or stuffed) makes a cute, cuddly addition (especially if still warm). Check your phone battery is fully charged - this could be a good time for that long phone call abroad.  If caught short on credit or battery juice, feel free to use the in-house telephone system which staff use to call one another.  Dial ‘9’ for an outside line, then reverse the charges.

Once you’ve settled into your comfy haven, cast your eyes around the shop floor.  Coolly wave strangers over to join you, particularly those you like the look of.  Put your feet up on a pouffe (if you’re so inclined).

If you like to unwind by watching television, then you’ll need to be more inventive.  Ask to try out a pair of binoculars, and ensure you have a clear line of sight to the audio-visual department.  Don’t like the programme that’s on?  You did remember to bring your “All in One” remote control, didn’t you?  Aim carefully, and zap away to your heart’s content (and turn up the volume so you can hear it).

Many of these “faux lounges” sport handy coffee tables to empty your pockets onto.  You don’t want loose change falling down the back of the sofa for another scamp to find, do you?  These low tables are also perfect for that stack of books and magazines you appropriated for the duration of your visit.

There’s no obvious, “acceptable” time limit to remain in your “virtual lounge”.  However, to make an untimely eviction less likely, consider wearing camouflage.  For those partial to trendy, black leather sofas, you’ll need to dress in similar fashion, like a “rock star”.  If camouflage is impractical, then try to sit very still like a mannequin.  This helps you to seem like “part of the furniture”.  

Be sure not to fall asleep though, or you may wake up in the store-room.

Stay tuned for Part 2: The High Street Soup Kitchen, where we’ll wander over to the Kitchenware Department.

Stanford Artificial Intelligence MOOC

I’m proud to have completed the first ever offering of the MOOC in Artificial Intelligence, run by Sebastian Thrun and Peter Norvig through Stanford University.

It was intriguing, challenging, and ultimately fun to get a first bit of working knowledge of things like spam filters, robot localization, and computer vision.

I’ve written a little Bayes filter based on the model introduced in that course.  I’ve hooked it up to my IRC client to alert me about the most interesting messages.  As they say though, the hardest part is training and data collection - it’s time-consuming to come up with enough good data to form a workable model.
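For a flavour of what such a filter involves, here is a minimal word-level naive Bayes scorer with Laplace smoothing. This is my own illustrative sketch (class, method and sample messages are all invented), not the actual filter described above:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// A toy word-level naive Bayes scorer -- an illustrative sketch only.
public class TinyBayes {
    private final Map<String, Integer> interesting = new HashMap<>();
    private final Map<String, Integer> boring = new HashMap<>();
    private int interestingTotal, boringTotal;

    // Record the words of one labelled training message.
    public void train(String message, boolean isInteresting) {
        for (String w : message.toLowerCase().split("\\s+")) {
            (isInteresting ? interesting : boring).merge(w, 1, Integer::sum);
            if (isInteresting) interestingTotal++; else boringTotal++;
        }
    }

    // Log-odds that a message is interesting, with Laplace (add-one) smoothing.
    // Positive means "probably interesting"; negative means "probably not".
    public double logOdds(String message) {
        Set<String> vocab = new HashSet<>(interesting.keySet());
        vocab.addAll(boring.keySet());
        double score = 0;
        for (String w : message.toLowerCase().split("\\s+")) {
            double pInt = (interesting.getOrDefault(w, 0) + 1.0) / (interestingTotal + vocab.size());
            double pBor = (boring.getOrDefault(w, 0) + 1.0) / (boringTotal + vocab.size());
            score += Math.log(pInt / pBor);
        }
        return score;
    }

    public static void main(String[] args) {
        TinyBayes filter = new TinyBayes();
        filter.train("release deploy build", true);   // hypothetical "interesting" IRC line
        filter.train("lunch weather gossip", false);  // hypothetical "boring" IRC line
        System.out.println(filter.logOdds("deploy build"));
    }
}
```

Even this toy version shows why training is the hard part: with only a handful of labelled messages, the smoothed probabilities are dominated by the priors rather than the data.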

The main things I’ve gained from the course are an appreciation of the kinds of problems AI can solve, as well as an idea of which tool to use in a given situation.