Why broadband speed tests suck

Everyone is familiar with broadband ‘speed test’ applications. When you have an ISP service quality problem, it is common to grab one of these tools to see if you are getting the service you feel you are entitled to get.

Whenever you upgrade your broadband, the first thing you might do is to run a speed test. Then you can show off to your mates how superior your blazing fast service is compared to their pitiful product.

The whole broadband industry is based on hawking ‘lines’ to its ‘speed’-addicted ‘users’. The trouble with broadband is that the performance can be ‘cut’ from over-sharing the network. Buyers are naturally concerned with the ‘purity’ of the resulting product versus its claims of potency. These ‘speed test’ tools purport to fill that quality assurance role.

The resulting situation is not good for the wellbeing of users. Selling ‘speed’ is also not a particularly ethical way for ISPs to make a living. So why do speed tests suck so badly? Let me tell you…

Misrepresents the service to users

Speed test applications implicitly aim to tell users something about the fitness-for-purpose of the service on offer. However, they are only a weak proxy for the experience of any application other than speed testing itself.

By its nature, a speed test tells you some (peak) rate at which data was transferred while the network was (over-)saturated. It is nearly always expressed as a ‘per second’ measure. That means it doesn’t capture the instantaneous effects of the service, as a lot of packets can pass by in a second. Yet the user experience consists solely of the continuous passing of those instantaneous moments of service delivery.

It is like having a report on pizza home delivery service that tells you that pizzas arrived within 10 minutes of coming out of the oven, when averaged over a year. That averaging could hide an hour-long wait on Saturday evenings which makes the service unfit for purpose. Merely reporting on average quantity misses essential information about the quality of the service.
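To make the averaging problem concrete, here is a minimal sketch in Python. The trace is entirely made up for illustration (it is not a measurement), and the 10 ms slot size and function names are assumptions of ours. It compares the single ‘per second’ figure with the longest period in which nothing was delivered at all.

```python
# Hypothetical trace: bytes delivered in each 10 ms slot over one second.
# The numbers are invented purely to illustrate the point.
SLOT_MS = 10
slots = [12_500] * 35 + [0] * 30 + [12_500] * 35  # a stall in the middle


def average_mbps(slots, slot_ms=SLOT_MS):
    """The single 'per second' style figure a speed test would report."""
    total_bits = sum(slots) * 8
    duration_ms = len(slots) * slot_ms
    return total_bits * 1000 / duration_ms / 1e6


def longest_stall_ms(slots, slot_ms=SLOT_MS):
    """Longest run of consecutive slots in which nothing was delivered."""
    longest = current = 0
    for delivered in slots:
        current = current + 1 if delivered == 0 else 0
        longest = max(longest, current)
    return longest * slot_ms


print(f"average rate:  {average_mbps(slots):.1f} Mbit/s")
print(f"longest stall: {longest_stall_ms(slots)} ms")
```

On this invented trace the average comes out at 7 Mbit/s, which looks perfectly healthy, yet it conceals a 300 ms gap that an interactive application would certainly notice.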

Confuses capacity with other factors

The ‘speed’ figure reported is the result of the interaction of many different technological elements, all of which contribute to the number presented to the user. It is a long list!

These factors include: the speed tester client code (e.g. JavaScript), the browser (and its execution environment), the OS and its TCP stack implementation, any virtualisation layers of the OS and connectivity (e.g. VPNs, VLANs), the customer LAN or WiFi (and any local activity, like backups to an Apple Time Capsule), the router or home gateway, security filters and firewalls, the physical access line (which may be wholesaled), the retail ISP network, IP interconnect, and the hosted speed tester service (server hardware, OS and any virtualisation, and network stack). Apologies if I have missed anything.

What happens is that we pretend that the final ‘speed’ presented is the result of a single one of those factors, namely the ‘capacity’ of the ISP service (including the access link, where vertically integrated). Even that falsely treats all the internal complexity of that ISP as if it were a circuit-like ‘pipe’.

More subtly, the ISP ‘speed’ being measured is an emergent phenomenon as all the platforms, protocols and packets interact within the network. The ISP doesn’t even have control over the result that they are being held responsible for!

Speed tests are costly

What we are doing when we run a ‘speed test’ is a small-scale denial of service attack on the network. That’s not a big deal when individual users do it on rare occasions. When ISPs themselves deploy thousands of testing boxes running lots of tests all the time it becomes a significant proportion of the load on the network.

This has a cost in terms of the user experience of other applications running at the same time. One person’s speed test is another person’s failed Skype call. Speed testing is a pretty antisocial way of going about using a shared resource. This is especially true for physically shared access media like wireless and coax cable. For FTTx users it doesn’t take many simultaneous speed testers to create local performance problems.

The high load of speed testing applications also directly drives the need for capacity upgrades. My colleagues have seen networks where there is an uncanny correlation between where the ISP’s own speed testing boxes are placed, and where the capacity planning rules are triggered to drive capital spending!

Wrongly optimises networks

By focusing marketing claims on peak data transfer rates, speed testing encourages network operators and ISPs to optimise their services for this purpose. The difficulty is that we live in a finite world with finite resources, so this must involve trade-offs. When we optimise for long-lived data transfers, we may pessimise for real-time and interactive services.

We have seen much concern and engineering effort expended over the phenomenon called ‘bufferbloat’. This is when we have large buffers that create spikes of delay as queues build up. As ISPs have tuned their services to meet their marketing promises for speed, they have taken their eye off other matters (such as packet scheduling) that are of equal or greater importance to the overall user experience.
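The size of those delay spikes follows from simple arithmetic: a full buffer takes its size divided by the link rate to drain. The sketch below works that out in Python. The buffer sizes and the 10 Mbit/s link are illustrative assumptions, not figures from any particular network, and the function name is our own.

```python
def max_queue_delay_ms(buffer_bytes, link_mbps):
    """Time to drain a completely full buffer at the link rate, in ms."""
    return buffer_bytes * 8 * 1000 / (link_mbps * 1e6)


# Assumed buffer sizes (in kB) in front of an assumed 10 Mbit/s link.
for buffer_kb in (64, 256, 1000):
    delay = max_queue_delay_ms(buffer_kb * 1000, link_mbps=10)
    print(f"{buffer_kb:>5} kB buffer -> up to {delay:.1f} ms of queueing delay")
```

A bulk transfer barely notices an extra few hundred milliseconds of queueing, which is why a speed test will not flag it. A voice or video call sharing the same queue most certainly will.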

Doesn’t even tell the truth

You might get the impression that we’re a bit down on speed tests as a measurement method for networks. Well, it gets worse. They don’t even accurately report the peak transfer rate!

What happens is that packets can arrive out of order, for instance when they take different routes. The network protocol stack then ‘holds back’ a bunch of packets from the application until the missing one arrives, and releases them all in one go when it turns up. They all suddenly appear at once to the speed testing application, which then reports an amazing burst in speed. This reported number may even greatly exceed the maximum line rate, and so be physically impossible.

So the peak transfer rate the speed test application reports can include artefacts of the protocols and control processes involved. The number you get can be a total fib, and you have no way of knowing.
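The hold-back effect described above can be reproduced with a toy simulation. This is a sketch under stated assumptions: a 10 Mbit/s line carrying one 1250-byte packet per millisecond, a single packet delayed by 50 ms, and an application that computes its peak rate over 10 ms windows. None of these figures or names come from a real speed tester.

```python
from itertools import accumulate

PACKET_BITS = 10_000   # 1250-byte packets (assumed)
WINDOW_MS = 10         # sampling window used by the hypothetical tester

# arrival_ms[seq] is when packet `seq` reaches the receiving host.
# The wire carries one packet per millisecond (10 Mbit/s); packet 0 took a
# slower route and turns up at 50 ms instead of 0 ms.
arrival_ms = [50] + list(range(1, 50)) + list(range(51, 101))

# In-order delivery: a packet is released to the application only once
# every earlier packet has arrived, i.e. at the running maximum.
delivered_ms = list(accumulate(arrival_ms, max))


def peak_mbps(times_ms, window_ms=WINDOW_MS):
    """Highest rate seen in any single sampling window, in Mbit/s."""
    counts = {}
    for t in times_ms:
        counts[t // window_ms] = counts.get(t // window_ms, 0) + 1
    return max(counts.values()) * PACKET_BITS * 1000 / window_ms / 1e6


print(f"peak on the wire:        {peak_mbps(arrival_ms):.0f} Mbit/s")
print(f"peak seen by the tester: {peak_mbps(delivered_ms):.0f} Mbit/s")
```

In this toy case the wire never carries more than 10 Mbit/s, yet the application sees 59 packets released within a single window and would report 59 Mbit/s.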

Drives false marketing claims

As Seth Godin notes, “All marketers are liars”. The act of marketing is to present the product to the user in the most favourable light when compared to the competition. As ISPs compete over marketing claims for peak speed, they naturally like to report the peak of all peaks on their advertising hoardings.

This sets up a false expectation with users that they are entitled to achieve these best-of-all-best-case data rates. Furthermore, users come to believe that ‘speed test’ applications accurately represent both the peak data rate of their line and the performance of all applications. This then leads to a blame game in which the ISP is accused of not fulfilling its service promise. The ISP has no means of isolating the cause of the poor reported speed test results, or the poor performance of any other application. This drives huge dissatisfaction and churn.

The ISP industry is already battling technically incompetent ‘net neutrality’ regulation. Making inept claims about service capability, and encouraging the use of misleading reporting tools to measure it, doesn’t help the cause of rallying public opinion to your side.

What’s the alternative?

If unthinking pursuit of speed tests is unwise, what could be a better approach?

To answer this, we need to go back to fundamentals. The application performance on offer is a result of packet loss and delay (and nothing else). So why not measure those? The good news is that we now have non-intrusive, scalable and cheap-to-deploy methods of doing just this.
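As a rough illustration of what reporting loss and delay might look like, here is a minimal Python sketch. It is not the measurement method referred to above. It simply assumes we already hold a list of hypothetical probe records, each a pair of (sent time, received time) in milliseconds with `None` for a lost packet, taken from clocks that are adequately synchronised.

```python
import math


def summarise(probes):
    """Summarise loss and delay from (sent_ms, received_ms or None) pairs."""
    delays = sorted(rx - tx for tx, rx in probes if rx is not None)
    lost = sum(1 for _, rx in probes if rx is None)

    def percentile(p):
        # Nearest-rank percentile over the packets that did arrive.
        rank = max(1, math.ceil(p / 100 * len(delays)))
        return delays[rank - 1]

    return {
        "loss": lost / len(probes),
        "p50_ms": percentile(50),
        "p99_ms": percentile(99),
    }


# Invented sample: eight packets take 20 ms, one takes 250 ms, one is lost.
probes = [(t, t + 20) for t in range(0, 80, 10)] + [(80, 330), (90, None)]
print(summarise(probes))
```

Reporting the tail of the delay distribution alongside the loss rate keeps exactly the information that a single average rate throws away.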

Want high-fidelity and low-impact service quality measurement? Then let’s set up a time to talk!