Software

Hairpin NATs and Martian Packets

This is going to be a bit of a dry post unfortunately. I want to document a problem I encountered with my home network, and what I learned diagnosing and solving it. It relates to WireGuard, Network Address Translation, network interfaces and packet captures. So, forewarned, let’s get into it!

Hamming Weight Trees

How do you compare two images for similarity? One way is by hashing them using something like JImageHash. Libraries such as this reduce an image to a much smaller binary hash. The idea is that when hashing two images which look similar (but aren’t identical), the two hashes will also be very similar. It’s then possible to get a ‘similarity score’ by counting how many bits differ between the two binary hashes (their Hamming distance).
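That ‘similarity score’ is essentially a Hamming distance. A minimal sketch of the idea in plain Java (using made-up 64-bit hash values, and deliberately not the JImageHash API):

class HashSimilarity {
    public static void main(String[] args) {
        // Hypothetical 64-bit perceptual hashes of two similar-looking images.
        long hashA = 0x9c3e_07c1_b55a_24f0L;
        long hashB = 0x9c3e_27c1_b15a_24f0L;

        // XOR leaves a 1 bit wherever the hashes disagree; bitCount tallies them.
        int differingBits = Long.bitCount(hashA ^ hashB);
        double similarity = 1.0 - (differingBits / 64.0);

        System.out.printf("distance=%d similarity=%.2f%n", differingBits, similarity);
    }
}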

How it started vs. How it's going - www.benrowland.net

Too long ago in 2004, this website started. I vividly recall the satisfaction when I typed www.benrowland.net into a browser, and it worked. I ran the website on Tomcat running in a DOS window on my Windows desktop, so it wasn’t quite a production-grade deployment. I had pointed my domain name to the IP address allocated to my home computer by my ISP, and there was something magical about watching the internet routing happen the way I’d hoped it would.

Spring Boot, Gradle and Java 11

When upgrading a hobby Spring Boot project to Java 11, there were a few issues which reminded me of how closely related Gradle, the Spring Boot plugin, and the Java version you’re using are. Naively trying a gradlew bootRun on the project (which was using Gradle 3.1) resulted in this:

Could not determine java version from '11.0.5'

Fortunately I’d seen that one before, so I upgraded my version of Gradle to 5.
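If the project uses the Gradle wrapper, the upgrade is a matter of pointing gradle/wrapper/gradle-wrapper.properties at a newer distribution - the exact 5.x version below is only an illustration, not necessarily the one I used:

distributionUrl=https\://services.gradle.org/distributions/gradle-5.6.4-bin.zip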

HTTPS and Java - Pitfalls and Best Practices - Part 1

As software developers, we often want to talk to services over the internet, usually using HTTP. However, it’s now very common to see online services using HTTPS – an extension of HTTP which enables secure communications over a network. This move has made life more interesting for developers who want to interact with these services. Most of the time, connecting to a host over HTTPS “just works” from Java, but sometimes things don’t quite work … and Java can be a bit cryptic in how it reports the failure.

HTTPS and Java - Pitfalls and Best Practices - Part 2

Chain of Trust: When a Java program connects to a host over HTTPS, it’s important to know you’re really communicating with who you think you are. If I write a program which connects to https://www.example.com, then I’m sending information over a secure channel. How can I be sure I’m not talking to a malicious third party who is somehow intercepting my traffic, including sensitive data such as user credentials? The identity of a server is proven with certificates.
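As a rough sketch of where that certificate chain surfaces in Java (standard HttpsURLConnection API, simplified and with no error handling - not code from the post itself):

import java.net.URL;
import java.security.cert.Certificate;
import javax.net.ssl.HttpsURLConnection;

public class ShowServerCertificates {
    public static void main(String[] args) throws Exception {
        // Connect over HTTPS and print the certificate chain the server presented.
        HttpsURLConnection conn =
                (HttpsURLConnection) new URL("https://www.example.com").openConnection();
        conn.connect();
        for (Certificate cert : conn.getServerCertificates()) {
            System.out.println(cert.getType() + " certificate: " + cert);
        }
        conn.disconnect();
    }
}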

HTTPS and Java - Pitfalls and Best Practices - Part 3

Ciphers: So far, we’ve looked at how certificates can support the Authentication property of HTTPS. The certificate also enables a second property of HTTPS – Encryption of the traffic. This encryption is possible because the certificate contains a public key which allows clients to encrypt data, so only those holding the corresponding private key (the website owner) will be able to decrypt. This encryption is achieved using ciphers. There may be a range of ciphers available on both ends of the connection, and the cipher chosen for the communication will be agreed in the initial TLS handshake.
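A quick way to see which cipher suites your own JVM would offer in that handshake is something like the following (a plain JSSE sketch of mine, not code from the post):

import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

public class ListCipherSuites {
    public static void main(String[] args) throws Exception {
        // The cipher suites this JVM would offer by default in a TLS handshake.
        SSLParameters params = SSLContext.getDefault().getDefaultSSLParameters();
        for (String suite : params.getCipherSuites()) {
            System.out.println(suite);
        }
    }
}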

HTTPS and Java - Pitfalls and Best Practices - Part 4

Perfect Forward Secrecy: At the present time, with the possibility of data breaches and digital eavesdropping, privacy is an important subject. What’s to prevent a malicious third party from collecting traffic going to or from a website? It might seem adequate to use HTTPS to encrypt traffic to a website, meaning it’s impossible to decrypt the traffic unless you also have access to the private key. But what if a malicious actor collects encrypted traffic over a long period of time?

JUnit and Non-Daemon Threads

Normally in Java, if the main thread starts one or more non-daemon threads, the Java process will not terminate until the last non-daemon thread terminates.

Yet, I was surprised to find that a particular JUnit test completed normally, despite never calling shutdown() on a ThreadPoolExecutor it had started. No Java process was left behind. This was the case both when running the test from within IntelliJ and also from Maven (using the surefire plugin). Replicating the test code in a vanilla main() method led to the expected behaviour: a “hanging” process.
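For reference, the vanilla replication looked roughly like this (a minimal sketch rather than the actual test code):

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class HangingMain {
    public static void main(String[] args) {
        // Executors.newFixedThreadPool uses the default thread factory,
        // which creates non-daemon worker threads.
        ExecutorService pool = Executors.newFixedThreadPool(2);
        pool.submit(() -> System.out.println("task ran"));
        // No pool.shutdown() here: main() returns, but the idle non-daemon
        // workers keep the JVM alive, so the process "hangs".
    }
}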

So what was going on? Surely something fascinating and enlightening, right? Running the JUnit test from my IDE revealed the underlying java invocation in the Console pane (abbreviated):

java -classpath "some-massive-classpath" com.intellij.rt.execution.junit.JUnitStarter -ideVersion5 MyTest,someMultiThreadedTest

So, the main method which launches the JUnit tests is in the class called JUnitStarter, which is an internal class within IntelliJ. A quick look at the code for JUnitStarter reveals the answer is very simple: an explicit call to System.exit() before main() returns. Maven Surefire’s ForkedBooter does the same thing.
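The effect is easy to demonstrate: System.exit() tears the JVM down regardless of any live non-daemon threads. A tiny sketch (my own illustration, not the JUnitStarter source):

import java.util.concurrent.Executors;

public class ExitAnyway {
    public static void main(String[] args) {
        // A non-daemon worker thread is started and never shut down...
        Executors.newFixedThreadPool(1).submit(() -> System.out.println("task ran"));
        // ...yet the runtime shuts down anyway; remove this line and the process hangs.
        System.exit(0);
    }
}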

As always, some strange behaviour turns out to be something entirely simple! But this is something to watch out for. Ideally, unit tests wouldn’t test multithreaded code (rather, they would test logic which is abstracted from the surrounding threaded environment). But if you must test multi-threaded production code, then be aware that your tests could give a misleading positive result in cases such as this.

Banishing Bufferbloat

Bufferbloat - it’s making our internet slow. But what is it?

After reading this thought-provoking article about bufferbloat, I wanted to do two things: have a better understanding of the concept, and find evidence of its occurrence within my own set-up.

The term ‘bufferbloat’ was coined by Jim Gettys in 2010 as an explanation of much of today’s internet congestion, which can lead to very poor performance over an apparently “high bandwidth” network connection. In this article I will attempt to explain bufferbloat in a way accessible to those who are not network professionals.

Disclaimer: I am not a network professional either; I simply enjoy researching things.  This article is purely an attempt to digest what I’ve learned, and hopefully pass on something interesting to others. I will also document how I solved one particular instance of the problem in my own network.

The internet - and indeed, any system of connected components - is made up of communication channels, each capable of a particular throughput. This can be visualised as a network of interconnected pipes, all of varying widths. Any point where a “large pipe” (high bandwidth) feeds into a smaller one (low bandwidth) can become a bottleneck when traffic levels are high.

To be clear, this situation of interconnected links with varying bandwidths is normal - for example where a backbone link carrying national traffic feeds into a smaller network servicing a particular set of subscribers. Usually, the subset of traffic coming through the bottleneck would not exceed what the smaller pipe can service; otherwise the situation would clearly be inadequate.

However, temporary spikes in traffic during unusually busy periods can occur. At this point, one of two things can happen. Either the excess traffic is stored up in a buffer (a “holding area”) for the duration of the spike, or else the narrower link must reject the excess traffic as there’s nowhere for it to go.

In the first scenario, the excess traffic would slowly fill the buffer for the duration of the spike. The buffer would be drained into the smaller pipe as fast as that pipe can accept it. Once traffic levels return to normal, the buffer would empty back to its normal level. The upstream components would not be aware of this situation, as they would not experience any rejected traffic (dropped packets).

However, if the traffic spike is prolonged, then the buffer becomes full, and the situation is similar to that where no buffer exists: packets are dropped.

From the upstream producer’s point of view, the packet would need to be re-sent (as no acknowledgement was received). The re-sending process would continue whilst the bottleneck is in effect, and would appear as a slow (or stalled) data transfer.

To be clear, these buffers are good to have. In the early days of the internet (c. 1986), buffers were insufficiently sized. This led to heavy packet loss during times of even moderate contention, to the point where most of the traffic was retransmitted packets. This was clearly inadequate, and so the use of larger buffers was recommended. Importantly, congestion control algorithms were also brought into play at each sender which transmits data. These algorithms attempt to discover the size of the downstream pipe by gradually ramping up traffic, and backing off when packets start to be dropped.

So where’s the problem? The problem surfaces when the size of buffers is set too high. A buffer is just an area of memory, and as memory has become cheap, buffers have become larger, without adequate consideration of the consequences. A buffer which is too large gives a false indication that a bottlenecked pipe is bigger than it really is: a large data transfer simply fills the buffer, and the apparent extra bandwidth is an illusion. The buffer no longer even serves its original purpose, as it is permanently full.

Why is this bad? If you’re doing a large upload (for example, sending a video to YouTube or backing up music to cloud storage) where an oversized transmit buffer is present, then web pages may appear to load very slowly (many seconds). The reason is that the tail-end of the large upload is sat in a large queue. A request to Google would sit at the back of the queue, and would have to wait until the buffer is emptied before it is sent on to the next link.

The solution is to tune the size of the buffer, such that it is only used to absorb temporary spikes in traffic, rather than giving false indications of high bandwidth during periods of contention. In truth, the full solution is more involved, requiring Active Queue Management to signal the onset of congestion so the rate of flow can be backed off before the buffer becomes full.

In many cases, these buffers exist in network equipment (such as routers) which is controlled by ISPs and similar organisations, but there are places under your own control where you can identify and fix this phenomenon. In my own case, the issue was that during a large backup of files from my netbook to another computer on my LAN, it was virtually impossible to do anything else network-related on the netbook. For such an upload, the netbook’s very slow wireless connection is a permanent bottleneck, with an observed effective throughput of 400kB/s (as reported by the scp command), or roughly 3Mbps.

By default, Linux allocates a transmit buffer maximum size of about 3MB (obtained via the following command, which gives minimum, default and maximum memory for the TCP transmit buffer):

sysctl -a | grep net.ipv4.tcp_wmem

If I start off a large upload and watch the size of this transmit buffer, the tx_queue settles at around 1.7MB. This value was obtained via:

cat /proc/net/tcp
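The tx_queue figure is buried in the fifth column of that file (tx_queue:rx_queue, both in hex), so while the upload is running, a one-liner along these lines pulls it out next to the local address:port (an incantation of my own, so treat it as illustrative):

awk 'NR > 1 { print $2, $5 }' /proc/net/tcp

The hex figure before the colon is (roughly) the number of bytes queued for transmission on that socket.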

1.7MB of data was permanently sat in the buffer; this would take around 4 seconds to drain over a 400kB/s network link. So any requests for web pages whilst the transfer is going on will be sat in a 4 second queue. Not good. This setting certainly needed to be tweaked in my case. Setting it too low, however, would result in small windows of data being sent per round-trip, which would prevent TCP from ever ramping up to full throughput.

The article quoted earlier suggests the recommended buffer size is the Bandwidth Delay Product. This is the bottleneck bandwidth, multiplied by the delay (or latency) that packets in the buffer take to reach their destination.

So, my buffer size of 1.7MB with a latency of 1ms (over my home network) correlates to an imaginary bandwidth of 1.7MB/s, or around 14Mbps (in contrast to the real bottleneck bandwidth of around 3Mbps). In other words, the TCP transmit buffer was five times too large for my particular environment. Setting the TCP transmit buffer to approximately the right size - around 256Kb - mostly fixed the problem. I settled on a figure of 128Kb; on my system this is a good compromise between bandwidth for large uploads, and latency for other interactive activity such as browsing or SSHing. This setting can be changed by editing /etc/sysctl.conf (the file of persistent kernel parameters read by sysctl).
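In /etc/sysctl.conf that boils down to a single line along these lines - the min and default figures here are just typical stock values, and only the final maximum reflects the 128Kb choice:

net.ipv4.tcp_wmem = 4096 16384 131072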

Follow this with a refresh of the parameters, and you’re done:

sudo sysctl -p

Caveat: Your mileage may very well vary if you choose to tweak these settings. You’d be mad to do this on anything important without knowing exactly what you’re doing.

Note: There are a number of articles which suggest increasing the size of the network buffers in Linux, using a similar approach. Based on my understanding and experience, this is fine if raw bandwidth is your goal, and particularly if you have a healthy upstream bandwidth. If you don’t have that bandwidth, then setting these buffers too high could harm your interactive network activity, while doing nothing to improve utilisation of an already saturated link.