Ten reasons why access link bonding doesn’t work

Can you take multiple network access links, like Wi-Fi and 3G, and bond them together to deliver an enhanced user experience? The common assumption is that if you add together the bandwidth of the two links, you get something more and better.

It turns out that this often delivers a worse user experience and poorer application outcomes than anticipated. The reasons are rather instructive, at least for those who are interested in network design and performance. It illustrates why the naïve “pipe” mental models people use to think about networks are deeply misleading.

The fundamental reason access link bonding doesn’t work is that “quality” is what determines application outcomes, not notional “bandwidth”. This “quality” network property is not composable in the hoped-for way. When you try, it works OK at low loads – precisely when you don’t care about the additional capacity anyway.

So what about at high loads? As you rack up the offered load to the network, you reach the capacity and/or schedulability limits of the system. (See slides 56 onwards for a reminder of what that means.) At that point, bad things happen. Those bad things quickly outweigh the benefits.

Here’s why:

  1. Bonding multiple paths does nothing to improve latency and packet service time. Indeed, if you spread packets across links with heterogeneous properties, then some of your packets from each flow now experience the worst-case link. So your basic performance constraint of round trip time is not improved, and its (undesirable) variability increases.
  2. When an application sends composite packet flows down both paths (e.g. video+audio), you get the quality of the worse of the two paths, degrading the performance of applications that might have used the faster path alone. For instance, you might lose lip sync.
  3. You introduce a new failure mode: when either path fails, even momentarily, it creates a performance hazard. Then one of two things can happen. Either you have to accept more service glitches – as some data gets lost – or you try to recover the lost data. If you try to recover, the other path becomes rammed with packets. This creates an outage that may last much longer than the transient failure.
  4. Standard network QoS techniques don’t work well on single links, let alone on bonded systems. That means you have added a whole new layer of hazards and complexity to your flow control and applications. These grow in nasty non-deterministic ways, adding an unhealthy dose of unpredictability to application performance.
  5. You have a new form of chaotic systems behaviour as one path saturates and the other does not. Cue more unpredictability and unhappy users.
  6. When packets take different paths and arrive out of order, TCP treats the reordering as a loss and backs off (the first sketch after this list shows how quickly reordering mounts up). Hence you lose the effective throughput you thought you had gained: performance goes backwards. Are you still sure this bonding thing is a good idea?
  7. Applications tested on single paths may not adapt properly when their flows go over multiple paths, causing unwanted feedback effects (e.g. adaptive-rate coding oscillating wildly).
  8. While it may seem that bonding paths together increases resilience, using this for higher speed increases the risk of failure events. Imagine sending jigsaw puzzles by post: if you send all the pieces together in one heavy bulk surface mail package, occasionally you lose a whole puzzle. Those that arrive come complete, albeit slowly. But if you post all the pieces separately by express airmail courier to gain “speed”, you often get left with an incomplete jigsaw. If what you want is a whole jigsaw, you’ve not gained anything. (A quick calculation after this list puts numbers on this.)
  9. Pre-defining which application traffic will go over which path means that each application gets an unmanaged service, with no (quality) assurance. The only benefit is to slightly reduce the self-contention with the other application traffic that is now sent on the other path. So if your application is sensitive to the effects of contention (as many are), you’ve hardly gained any benefit, but have doubled your access costs in the process.
  10. You introduce an inconsistent response to users, who may value consistency of application experience over absolute speed. Be careful what you wish for, because it may not be what creates value.
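Points 1 and 6 can be made concrete with a toy model. The sketch below is a simplified illustration, not a real TCP implementation or a measurement of any actual Wi-Fi or 3G service: the delay and jitter figures are assumptions chosen purely for demonstration. It sprays one flow’s packets round-robin across two dissimilar links and counts how often the receiver sees packets out of sequence – the events TCP’s duplicate-ACK machinery reads as loss – and how the slower link stretches the flow’s delay tail.

```python
import random

# Toy model (illustrative assumptions, not measurements): spray one flow's
# packets round-robin across two links with different one-way delays, then
# look at what the receiver sees.
random.seed(1)

N_PACKETS = 10_000
LINK_DELAY_MS = {"wifi": 20.0, "cellular": 80.0}   # assumed mean one-way delay
LINK_JITTER_MS = {"wifi": 5.0, "cellular": 30.0}   # assumed delay variability

arrivals = []
for seq in range(N_PACKETS):
    link = "wifi" if seq % 2 == 0 else "cellular"   # naive 50/50 bonding
    delay = max(1.0, random.gauss(LINK_DELAY_MS[link], LINK_JITTER_MS[link]))
    arrivals.append((seq * 1.0 + delay, seq))       # packets sent 1 ms apart

arrivals.sort()                                     # order seen by the receiver

# Count packets that arrive behind a later-sequenced packet: the receiver
# signals these with duplicate ACKs, which the sender interprets as loss.
reordered = 0
highest_seq_seen = -1
for _, seq in arrivals:
    if seq < highest_seq_seen:
        reordered += 1
    highest_seq_seen = max(highest_seq_seen, seq)

delays = sorted(arrival - seq * 1.0 for arrival, seq in arrivals)
p50 = delays[len(delays) // 2]
p99 = delays[int(len(delays) * 0.99)]

print(f"reordered packets: {100 * reordered / N_PACKETS:.1f}%")
print(f"median delay: {p50:.1f} ms, 99th percentile: {p99:.1f} ms")
```

Even this crude model shows the pattern: a large fraction of packets arrive out of sequence, and the flow’s delay distribution is dominated by the slower link rather than the faster one.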
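The jigsaw analogy in point 8 is really a statement about independent failure probabilities. A back-of-envelope calculation, with made-up delivery rates chosen only for illustration, shows how striping one object across many independent deliveries reduces the chance that the whole thing arrives intact:

```python
# Hypothetical figures purely for illustration: a 500-piece puzzle, where a
# single bulk parcel arrives intact 99% of the time, versus posting each
# piece separately with a 99.9% per-piece delivery rate.
pieces = 500
p_bulk_parcel = 0.99
p_per_piece = 0.999

p_all_pieces_arrive = p_per_piece ** pieces
print(f"one parcel, whole puzzle arrives:   {p_bulk_parcel:.2%}")
print(f"500 couriers, whole puzzle arrives: {p_all_pieces_arrive:.2%}")
# Roughly 99% vs 61%: striping for "speed" multiplies the ways to lose the outcome.
```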
There’s a meta-message here. Lots of “performance” improvements in networks have perverse effects. TCP offload engines, content delivery networks, “better” control protocols – all can go wrong in unexpected ways. Networks are complex beasts with unexpected emergent properties. Before tinkering with their performance, you are well advised to consult a professional.

To keep up to date with the latest fresh thinking on telecommunication, please sign up for the Geddes newsletter